Title of invention TECHNICAL TERM EXTRACTION APPARATUS, TECHNICAL TERM EXTRACTION METHOD AND TECHNICAL TERM EXTRACTION PROGRAM
Abstract PROBLEM TO BE SOLVED: To extract technical terms with high accuracy even from a document set in which a plurality of categories are assigned to a single document. SOLUTION: A category-based frequency calculation means 12 obtains the per-category appearance frequency of each candidate word string (a technical term candidate) under the condition that only one of the categories is used even when a plurality of categories are assigned to one document. An entropy calculation means 13 calculates the entropy of each candidate word string from the per-category appearance frequencies obtained by the category-based frequency calculation means 12, and decides on the basis of the calculated entropy whether each candidate word string is a technical term. Alternatively, a chi-squared value, the minimum value of the number of categories, and the maximum value of the appearance frequency can be obtained from the per-category appearance frequencies, and the decision on whether a candidate is a technical term can be made on the basis of those values. COPYRIGHT: (C)2007,JPO&INPIT
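The entropy-based decision described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names, the substring-based counting, the choice of the first listed category as the single category used per document, the threshold `max_entropy`, and the rule that low entropy indicates a technical term are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def category_frequencies(documents, candidates):
    """Count how often each candidate word string appears per category.

    documents: list of (text, categories) pairs.
    Assumption: when a document has several categories, only the first
    listed one is used, so each occurrence is counted exactly once.
    """
    freq = defaultdict(Counter)
    for text, categories in documents:
        if not categories:
            continue
        category = categories[0]  # assumed selection rule
        for cand in candidates:
            n = text.count(cand)  # simple substring count (assumption)
            if n:
                freq[cand][category] += n
    return freq


def entropy(counter):
    """Shannon entropy (bits) of a candidate's distribution over categories."""
    total = sum(counter.values())
    return -sum((c / total) * math.log2(c / total) for c in counter.values())


def extract_terms(documents, candidates, max_entropy=0.5):
    """Keep candidates concentrated in few categories.

    Assumption: low entropy (<= max_entropy) marks a technical term.
    """
    freq = category_frequencies(documents, candidates)
    return [c for c in candidates
            if c in freq and entropy(freq[c]) <= max_entropy]


if __name__ == "__main__":
    docs = [
        ("neural network training", ["G06N", "G06F"]),
        ("neural network pruning", ["G06N"]),
        ("the method comprises a network", ["H04L"]),
        ("the method of training", ["G06N"]),
    ]
    print(extract_terms(docs, ["neural network", "the method"]))
```

In this toy data, "neural network" occurs only under one category and so has zero entropy, while "the method" is spread evenly over two categories and has an entropy of one bit, so only the former passes the assumed threshold.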
Publication No. JP2007079948(A) Publication date 2007.03.29
Application No. JP20050267079 Filing date 2005.09.14
Applicant NEC CORP Inventor(s) TATEISHI KENJI;KUSUI MASARU
Classification G06F17/30 Main classification G06F17/30
Agency Agent
Principal claim
Address