Abstract |
<P>PROBLEM TO BE SOLVED: To extract various kinds of words from a document set targeted for word extraction without depending on chance occurrences, even for low-frequency words. <P>SOLUTION: A partial character string statistic calculation part 330 reads data on partial character strings from a work area 600, calculates their statistics, and stores the statistics in the work area 600. A word candidate statistic calculation part 340 reads the partial character string statistics and the word candidates from the work area 600, and reads from an other-document statistic DB 700 the partial character string statistics calculated in advance from a document set different from the word extraction target documents. It adds the partial character string statistics of the two document sets to calculate a statistic for each word candidate, and stores the result in the work area 600. A word candidate selection part 350 reads the word candidate statistics from the work area 600, selects among the word candidates on the basis of those statistics to decide the words, and stores data on the decided words in the work area 600. <P>COPYRIGHT: (C)2004,JPO&NCIPI |
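
The flow described in the SOLUTION can be illustrated with a minimal Python sketch. The abstract does not say which statistic is used or how candidates are selected, so this sketch assumes plain occurrence counts and a fixed threshold. The names `substring_counts`, `select_words`, and `min_count` are hypothetical, and the `other_counts` argument merely stands in for the other-document statistic DB 700.

```python
from collections import Counter


def substring_counts(docs, max_len):
    """Count every substring of length 1..max_len in the given documents.

    Assumption: the "statistic" of a partial character string is simply
    its occurrence count; the abstract does not specify the statistic.
    """
    counts = Counter()
    for text in docs:
        for start in range(len(text)):
            for length in range(1, max_len + 1):
                if start + length <= len(text):
                    counts[text[start:start + length]] += 1
    return counts


def select_words(target_docs, other_counts, candidates, min_count=3):
    """Pick candidates whose combined count reaches min_count.

    other_counts plays the role of statistics computed in advance from a
    different document set. min_count is a hypothetical selection rule.
    """
    max_len = max((len(c) for c in candidates), default=1)
    target_counts = substring_counts(target_docs, max_len)
    # Add the statistics of both document sets, as the abstract describes.
    combined = target_counts + other_counts
    return [c for c in candidates if combined[c] >= min_count]


# Statistics prepared in advance from a separate document set.
other = substring_counts(["a parser and another parser"], max_len=6)

# "parser" occurs only once in the target text, but the added counts
# from the other document set lift it over the threshold.
words = select_words(["the parser reads tokens"], other, ["parser", "tokens"])
print(words)  # ['parser']
```

In this example "parser" appears once in the target document and twice in the other document set, so its combined count of 3 meets the threshold, while "tokens" stays at 1 and is rejected. This shows how a word that is low-frequency in the target set can still be selected.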