Abstract
PROBLEM TO BE SOLVED: To extract technical terms with high accuracy even from a document set in which a single document may be assigned multiple categories.

SOLUTION: A category-based frequency calculation means 12 obtains, for each candidate word string (a technical-term candidate), its occurrence frequency in each category. When a document is assigned multiple categories, only one of them is used for counting. An entropy calculation means 13 calculates the entropy of each candidate word string from the category-based occurrence frequencies obtained by the category-based frequency calculation means 12, and decides on the basis of that entropy whether the candidate word string is a technical term. Alternatively, a chi-squared value, a minimum value of the number of categories, and a maximum value of the occurrence frequency can be obtained from the category-based occurrence frequencies, and the technical-term decision can be made on the basis of those values.

COPYRIGHT: (C)2007,JPO&INPIT
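As a rough illustration of the entropy-based decision, the following Python sketch counts each candidate's occurrences per category, using only one category per document, and flags candidates whose category distribution has low entropy. Several details are assumptions not stated in the abstract: which category is kept (here, the first listed), that occurrences are counted as whitespace-token sequence matches, that low entropy (occurrences concentrated in few categories) indicates a technical term, and the threshold `max_entropy=1.0`. All function names are hypothetical.

```python
import math
from collections import Counter


def count_occurrences(tokens, cand_tokens):
    """Count (possibly overlapping) occurrences of a token sequence in a token list."""
    k = len(cand_tokens)
    return sum(1 for i in range(len(tokens) - k + 1) if tokens[i:i + k] == cand_tokens)


def category_frequencies(docs, candidates):
    """Return {candidate: Counter({category: count})}.

    docs is a list of (text, categories) pairs. Even if a document carries
    several categories, only one is used; this sketch ASSUMES the first-listed one.
    """
    freq = {cand: Counter() for cand in candidates}
    for text, categories in docs:
        if not categories:
            continue  # uncategorized documents contribute nothing
        category = categories[0]  # assumption: keep only the first category
        tokens = text.lower().split()
        for cand in candidates:
            n = count_occurrences(tokens, cand.lower().split())
            if n:
                freq[cand][category] += n
    return freq


def entropy(counter):
    """Shannon entropy (bits) of a candidate's distribution over categories."""
    total = sum(counter.values())
    # Zero counts are skipped, so an empty Counter yields 0 without dividing by zero.
    return sum((n / total) * math.log2(total / n) for n in counter.values() if n)


def is_technical_term(counter, max_entropy=1.0):
    """Hypothetical rule: low entropy, i.e. occurrences concentrated in few
    categories, marks a technical term. The threshold is illustrative only."""
    return sum(counter.values()) > 0 and entropy(counter) <= max_entropy


if __name__ == "__main__":
    docs = [
        ("the kernel scheduler preempts the thread", ["OS", "Hardware"]),
        ("a kernel scheduler patch", ["OS"]),
        ("the cache line is evicted", ["Hardware"]),
        ("the result of the test", ["Biology"]),
    ]
    freq = category_frequencies(docs, ["kernel scheduler", "the"])
    for cand, counts in freq.items():
        print(cand, dict(counts), round(entropy(counts), 2), is_technical_term(counts))
```

With this sample data, "kernel scheduler" is counted only under `OS` (the second category of the first document is ignored), so its entropy is 0 and it is kept; "the" is spread over three categories (about 1.52 bits) and is rejected. The chi-squared alternative mentioned in the abstract could be scored from the same per-category counts, but how the expected counts and thresholds are defined is not specified there.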