Topics for a document are identified using the names of categories in a knowledge base. Terms are extracted from the document text and mapped to articles in the knowledge base. The number of terms mapped to each article is counted. For each category, the number of articles to which terms are mapped is also counted. The categories that include articles with mapped terms are then sorted so that the most relevant categories for the document are those that include the highest number of such articles. The most relevant categories are identified as the topics for the document.
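The steps in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the application: the toy knowledge base (`TERM_TO_ARTICLE`, `ARTICLE_TO_CATEGORIES`), the regex-based `extract_terms`, the exact-match term-to-article lookup, the `top_n` cutoff, and the alphabetical tie-break between categories with equal article counts.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy knowledge base (illustrative data only).
TERM_TO_ARTICLE = {
    "striker": "Striker",
    "goal": "Goal",
    "goalkeeper": "Goalkeeper",
    "stock": "Stock",
}
ARTICLE_TO_CATEGORIES = {
    "Striker": ["Football"],
    "Goal": ["Football", "Sports terminology"],
    "Goalkeeper": ["Football"],
    "Stock": ["Finance"],
}


def extract_terms(text):
    """Naive term extraction: lowercase alphabetic tokens (an assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def identify_topics(text, term_to_article, article_to_categories, top_n=2):
    """Return (topics, terms-per-article counts) for a document."""
    # Map extracted terms to articles and count the terms per article.
    terms_per_article = Counter()
    for term in extract_terms(text):
        article = term_to_article.get(term)
        if article is not None:
            terms_per_article[article] += 1

    # For each category, collect the distinct articles that received terms.
    articles_per_category = defaultdict(set)
    for article in terms_per_article:
        for category in article_to_categories.get(article, ()):
            articles_per_category[category].add(article)

    # Sort categories by number of mapped articles, highest first.
    # Ties are broken alphabetically (an assumption of this sketch).
    ranked = sorted(
        articles_per_category,
        key=lambda c: (-len(articles_per_category[c]), c),
    )
    return ranked[:top_n], terms_per_article


if __name__ == "__main__":
    doc = ("The striker scored a goal. The goalkeeper missed the second goal. "
           "Stock prices fell.")
    topics, counts = identify_topics(doc, TERM_TO_ARTICLE, ARTICLE_TO_CATEGORIES)
    print(topics)
    print(dict(counts))
```

In this sketch the per-article term counts are returned alongside the topics but do not affect the ranking, since the abstract ranks categories by the number of articles with mapped terms; whether and how the term counts contribute further is not stated in the abstract.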
Application publication number
WO2014204341(A1)
Application publication date
2014.12.24
Application number
WO2013RU00520
Filing date
2013.06.19
Applicant
HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.
Inventors
ULANOV, ALEXANDER VLADIMIROVICH; SIDOROV, ALEXANDER ALEXANDROVICH