Title Extracting terms from document data including text segment
Abstract A computer system, method, and article of manufacture for extracting a term from electronic document data that includes a text segment. The system includes: a first extraction unit that uses a first text processing information to extract a noun word from the document data; a second extraction unit that uses a second text processing information to extract a term candidate in relation to the noun word from the document data or from a corpus that includes text data described in the same language used in the document data; a weight assignment unit that uses a third text processing information to select, from a plurality of types, which type is to be assigned a weight and assigns the weight to the selected type for each noun word and term candidate; a determination unit that determines the type to which the noun word and term candidate belong; and an output unit to output the noun word and term candidate.
Publication number US9043339(B2) Publication date 2015.05.26
Application number US201313899020 Filing date 2013.05.21
Applicant International Business Machines Corporation Inventors Ikawa Yohei; Negishi Shiho; Takeuchi Hironori
Classification G06F17/30; G06F17/28; G06F17/27 Main classification G06F17/30
Agency Cantor Colburn LLP Agent Cantor Colburn LLP; Zarick Gail
Main claim 1. A computer-implemented system including a memory and a processor communicatively coupled to the memory for extracting terms from electronic document data that includes a text segment, the computer system comprising: a first extraction unit that uses a first text processing information to extract a noun word from the document data; a second extraction unit that uses a second text processing information to extract a term candidate in relation to the extracted noun word from the document data or from a corpus that includes text data described in the same language used in the document data; a weight assignment unit that, in order to determine which one of a plurality of noun word types the extracted noun word and the extracted term candidate each belong to, uses a third text processing information to select which type to assign a weight from the plurality of types and assigns the weight to the selected type for each of the extracted noun word and the extracted term candidate; a determination unit that determines the type to which the extracted noun word and the extracted term candidate each belong, based on the assigned weight; and an output unit which follows the determination to output the extracted noun word and the extracted term candidate each in association with the determined type, wherein the weight assignment unit uses the third text processing information to select which type to assign the weight from the plurality of types and assigns the weight to the selected type for each of the extracted noun word and the extracted term candidate by: obtaining a number of times a genitive case word modifies the extracted noun word and a number of times a genitive case word modifies the extracted term candidate, in the document data or in the corpus including the text data described in the same language used in the document data; and selecting the type to be assigned a weight according to whether or not the obtained number of times is in a predetermined threshold value range.
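The weight assignment and determination steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The type labels `TYPE_A` and `TYPE_B`, the threshold range, the fixed weight of 1.0, and all function names are hypothetical. The claim does not say how a genitive case word is detected, so the sketch assumes English text split on whitespace and treats a possessive token immediately before the term as a genitive modifier.

```python
from collections import defaultdict

# Hypothetical type labels and threshold range; the claim names neither.
TYPES = ("TYPE_A", "TYPE_B")
THRESHOLD_RANGE = (2, 10)  # inclusive bounds, assumed for illustration


def is_genitive(token):
    """Crude English stand-in for a genitive case word: a possessive form."""
    return token.endswith("'s") or token.endswith("s'")


def count_genitive_modifications(tokens, term):
    """Count how often a genitive token directly precedes `term` in `tokens`."""
    term_tokens = term.split()
    n = len(term_tokens)
    count = 0
    for i in range(1, len(tokens) - n + 1):
        if tokens[i:i + n] == term_tokens and is_genitive(tokens[i - 1]):
            count += 1
    return count


def assign_weights(tokens, terms, weight=1.0):
    """Select one type per term from the genitive count and add a weight to it."""
    low, high = THRESHOLD_RANGE
    weights = {term: defaultdict(float) for term in terms}
    for term in terms:
        count = count_genitive_modifications(tokens, term)
        selected = TYPES[0] if low <= count <= high else TYPES[1]
        weights[term][selected] += weight
    return weights


def determine_types(weights):
    """Pick, for each term, the type that accumulated the largest weight."""
    return {term: max(by_type, key=by_type.get)
            for term, by_type in weights.items()}


if __name__ == "__main__":
    text = ("the customer's account and the bank's account "
            "were closed by the manager")
    tokens = text.split()
    weights = assign_weights(tokens, ["account", "manager"])
    for term, term_type in determine_types(weights).items():
        print(term, term_type)
```

In this sketch a single rule contributes one weight per term, so the determination step is trivial. Keeping the weights in a per-type mapping leaves room for further rules to add weights before the type with the largest total is chosen.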
Address Armonk NY US