Title of invention DOCUMENT TOPIC IDENTIFICATION
Abstract Topics for a document are identified using the names of categories in a knowledge base. Terms are extracted from the document text and mapped to articles in the knowledge base. The number of terms mapped to each article is counted. For each category, the number of articles to which terms are mapped is also counted. The categories that include articles with mapped terms are sorted so that the most relevant categories for the document are those that include the highest number of articles to which terms are mapped. The most relevant categories are then identified as the topics for the document.
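The counting and sorting steps in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and does not come from the application: the toy knowledge base (`TERM_TO_ARTICLE`, `ARTICLE_TO_CATEGORIES`), the single-word term extraction, the alphabetical tie-breaking, and the `top_k` cutoff are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy knowledge base (illustrative assumption, not from the source).
TERM_TO_ARTICLE = {
    "python": "Python (programming language)",
    "java": "Java (programming language)",
    "compiler": "Compiler",
    "guitar": "Guitar",
}
ARTICLE_TO_CATEGORIES = {
    "Python (programming language)": ["Programming languages"],
    "Java (programming language)": ["Programming languages"],
    "Compiler": ["Programming tools"],
    "Guitar": ["Musical instruments"],
}


def extract_terms(text):
    """Assumed term extraction: lowercase alphabetic single-word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def count_terms_per_article(text):
    """Count how many extracted terms map to each knowledge-base article."""
    terms = extract_terms(text)
    return Counter(TERM_TO_ARTICLE[t] for t in terms if t in TERM_TO_ARTICLE)


def count_articles_per_category(article_term_counts):
    """Count, per category, the distinct articles that received mapped terms."""
    counts = Counter()
    for article in article_term_counts:
        for category in ARTICLE_TO_CATEGORIES.get(article, []):
            counts[category] += 1
    return counts


def identify_topics(text, top_k=1):
    """Return the top_k categories ranked by number of mapped articles."""
    article_term_counts = count_terms_per_article(text)
    category_counts = count_articles_per_category(article_term_counts)
    # Sort by descending article count; ties are broken alphabetically (assumption).
    ranked = sorted(category_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [category for category, _ in ranked[:top_k]]


if __name__ == "__main__":
    sample = "Python and Java both need a compiler; Python is popular."
    print(identify_topics(sample, top_k=2))
```

In this sketch the per-article term counts are computed but only decide which articles count as mapped; the ranking itself uses the per-category article counts, which is the criterion the abstract states.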
Publication number WO2014204341(A1) Publication date 2014.12.24
Application number WO2013RU00520 Application date 2013.06.19
Applicant HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P. Inventors ULANOV, ALEXANDER VLADIMIROVICH;SIDOROV, ALEXANDER, ALEXANDROVICH
Classification G06F17/30 Main classification G06F17/30
Agency Agent
Main claim
Address