Title: Document processing method and system
Abstract: A method and system for filtering a candidate document in a candidate document set are provided. The method includes receiving one or more entity word-topic word pairs and identifying one or more entity words and topic words of the candidate document. The method also includes determining whether to add the candidate document to a filtered document set, using the entity words and topic words in the received entity word-topic word pairs and the entity words and topic words identified in the candidate document. The method further includes adding the candidate document to the filtered document set in response to determining that it should be added.
Publication number: US9058383(B2) Publication date: 2015.06.16
Application number: US201213608438 Filing date: 2012.09.10
Applicant: International Business Machines Corporation Inventors: Bao Sheng Hua;Cui Jie;Su Hui;Su Zhong;Zhang Li
Classification: G06F7/00;G06F17/30 Main classification: G06F7/00
Agency: Cantor Colburn LLP Agent: Cantor Colburn LLP
Principal claim: 1. A method for filtering a candidate document in a candidate document set, wherein the candidate document set comprises at least one candidate document, the method comprising: receiving one or more entity word-topic word pairs; identifying one or more entity words of the candidate document by a processor, wherein the one or more entity words are words indicating focused entities of the candidate document; identifying, based on each identified entity word, one or more topic words related to the identified entity word in the candidate document where the identified entity word is located; determining, by the processor, whether to add the candidate document into a filtered document set using the entity words and topic words in the given entity word-topic word pairs and the identified entity words and topic words in the candidate document; and adding the candidate document into the filtered document set in response to determining that the candidate document should be added into the filtered document set, wherein: each of the entity word-topic word pairs comprises an entity word and a topic word; all entity words in the entity word-topic word pairs form an entity word set; and all topic words in the entity word-topic word pairs where each entity word is located form a topic word set corresponding to the entity word.
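The steps of the claim can be illustrated with a minimal Python sketch. The claim does not say how entity words and topic words are identified or what the decision rule is, so the choices below are assumptions made only for illustration: an entity word counts as identified if it appears in the document and belongs to the entity word set, a topic word counts as related to it if both occur in the same sentence, and a document is added when at least one identified entity word shares a topic word with its given topic word set. The function names `build_topic_sets`, `identify_in_document`, and `filter_documents` are hypothetical and do not come from the patent.

```python
import re
from collections import defaultdict


def build_topic_sets(pairs):
    """Group the given pairs into {entity word: topic word set}.

    The keys form the entity word set; each value is the topic word set
    corresponding to that entity word.
    """
    topic_sets = defaultdict(set)
    for entity, topic in pairs:
        topic_sets[entity.lower()].add(topic.lower())
    return dict(topic_sets)


def identify_in_document(text, entity_set):
    """Return {identified entity word: words co-occurring with it}.

    Assumption (not from the patent): only single-word entities are handled,
    and a word is treated as related to an entity word when both appear in
    the same sentence.
    """
    found = defaultdict(set)
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = set(re.findall(r"[a-z0-9]+", sentence))
        for entity in words & entity_set:
            found[entity] |= words - {entity}
    return dict(found)


def filter_documents(candidates, pairs):
    """Return the filtered document set for the given candidates and pairs.

    Assumed decision rule: add a document if any identified entity word has
    at least one co-occurring word in its given topic word set.
    """
    topic_sets = build_topic_sets(pairs)
    entity_set = set(topic_sets)
    filtered = []
    for doc in candidates:
        found = identify_in_document(doc, entity_set)
        if any(found[entity] & topic_sets[entity] for entity in found):
            filtered.append(doc)
    return filtered
```

Under these assumptions, given the pairs `("IBM", "acquisition")` and `("IBM", "patent")`, a document containing the sentence "IBM announced a new acquisition today." would be added to the filtered set, while a document that mentions IBM and "patent" only in separate sentences would not.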
Address: Armonk NY US