Title: Document processing method and system
Abstract: A method and system for expanding a document set used as a search data source in the field of business-related search. The present invention provides a method of expanding a seed document in a seed document set. The method includes identifying one or more entity words of the seed document; identifying, based on each entity word, one or more topic words related to that entity word in the seed document where the entity word is located; forming an entity word-topic word pair from each identified topic word and the entity word on the basis of which that topic word was identified; and obtaining one or more expanded documents by taking the entity word and the topic word in each entity word-topic word pair as key words for web searching at the same time. A system for executing the above method is also provided.
Publication number: US9043356(B2) Publication date: 2015.05.26
Application number: US201213608309 Filing date: 2012.09.10
Applicant: International Business Machines Corporation Inventors: Bao Sheng Hua; Cui Jie; Su Hui; Su Zhong; Zhang Li
Classification: G06F7/00; G06F17/30 Main classification: G06F7/00
Agency: Cantor Colburn LLP Agent: Cantor Colburn LLP
Principal claim: 1. A method for expanding a seed document in a seed document set, wherein the seed document set comprises at least one seed document, the method comprising: identifying one or more entity words of the seed document in memory by a processor, wherein the one or more identified entity words are words indicating focused entities of the seed document; identifying by the processor, based on each of the one or more identified entity words of the seed document, one or more topic words related to each of the one or more identified entity words, the one or more identified topic words located in the seed document; forming, by the processor, an entity word-topic word pair from each of the one or more identified topic words and each of the one or more identified entity words upon which each of the one or more identified topic words is identified; and obtaining one or more expanded documents by the processor by taking the entity word and topic word in each entity word-topic word pair as key words for web searching at the same time, wherein the expanded documents comprise not only the entity word in the each entity word-topic word pair but also the topic word in the each entity word-topic word pair.
Address: Armonk NY US
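
Illustrative sketch: the four steps recited in the claim (identify entity words, identify topic words related to each entity word within the same seed document, form entity word-topic word pairs, search with both words at once) can be pictured roughly as follows. This is only a minimal sketch under stated assumptions, not the patented implementation. The record does not say how entity words or topic words are recognized, so the capitalized-token frequency heuristic and the sentence co-occurrence count used here are stand-ins. All function names, the `STOPWORDS` list, and the caller-supplied `search` callable are hypothetical.

```python
import re
from collections import Counter
from typing import Callable, Iterable, List, Tuple

# Hypothetical stop list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on", "with", "by"}


def split_sentences(text: str) -> List[str]:
    """Split text into sentences on ., ! or ? (a deliberately crude rule)."""
    return [s for s in re.split(r"[.!?]+\s*", text) if s]


def identify_entity_words(doc: str, top_n: int = 1) -> List[str]:
    """Assumed heuristic: the most frequent capitalized, non-stopword tokens."""
    counts = Counter(
        tok for tok in re.findall(r"[A-Za-z]+", doc)
        if tok[0].isupper() and tok.lower() not in STOPWORDS
    )
    return [word for word, _ in counts.most_common(top_n)]


def identify_topic_words(doc: str, entity: str, top_k: int = 1) -> List[str]:
    """Assumed heuristic: words that most often share a sentence with the entity."""
    counts = Counter()
    for sentence in split_sentences(doc):
        tokens = [t.lower() for t in re.findall(r"[A-Za-z]+", sentence)]
        if entity.lower() in tokens:
            counts.update(
                t for t in tokens if t != entity.lower() and t not in STOPWORDS
            )
    return [word for word, _ in counts.most_common(top_k)]


def form_pairs(doc: str) -> List[Tuple[str, str]]:
    """Pair each topic word with the entity word it was identified from."""
    return [
        (entity, topic)
        for entity in identify_entity_words(doc)
        for topic in identify_topic_words(doc, entity)
    ]


def expand(doc: str, search: Callable[[str], Iterable[str]]) -> List[str]:
    """Search with both words of each pair; keep results containing both."""
    expanded = []
    for entity, topic in form_pairs(doc):
        query = f'"{entity}" "{topic}"'  # both key words in a single query
        for result in search(query):
            text = result.lower()
            if entity.lower() in text and topic in text:
                expanded.append(result)
    return expanded


SEED = (
    "Acme launched a cloud service. "
    "Analysts say Acme will expand cloud capacity. "
    "Acme cloud pricing is low."
)


def fake_search(query: str) -> List[str]:
    """Stand-in for a web search backend; returns canned documents."""
    return [
        "Acme cloud revenue grows",
        "Acme opens new office",
        "cloud computing trends",
    ]
```

Calling `expand(SEED, fake_search)` keeps only the canned result that mentions both "Acme" and "cloud", mirroring the claim's requirement that each expanded document contain the entity word and the topic word of the pair. Passing the search backend in as a callable keeps the sketch independent of any particular search engine.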