Abstract |
<P>PROBLEM TO BE SOLVED: To provide a new-word collection system that automatically collects new words from documents containing informal expressions and automatically categorizes them. <P>SOLUTION: A method for collecting new words includes: (a) a step of performing morphological analysis on an input document; (b) a step of setting undefined words found in the result of the morphological analysis as new-word candidates; (c) a step of creating a substitute document in which the new-word candidates are replaced by predetermined general words; (d) a step of performing morphological analysis on the substitute document; (e) a step of setting, as new-word candidates, words formed by combining the existing candidates with their adjacent morphemes, and repeating steps (c) and (d), until no undefined word remains in the result of the morphological analysis in step (d); (f) a step of parsing the substitute document; (g) a step of setting, as new-word candidates, words formed by combining the existing candidates with their adjacent morphemes, and repeating steps (e) and (f), until the parsing result of step (f) is determined to be valid; and (h) a step of setting the new-word candidates as new words when the parsing result is determined to be valid. <P>COPYRIGHT: (C)2008,JPO&INPIT |
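
The flow (a) through (h) can be illustrated with a minimal Python sketch. It is not the claimed implementation. Everything named here is a hypothetical stand-in: `LEXICON` plays the role of the morphological dictionary, `GENERAL_WORD` is the predetermined general word, and `VALID_PATTERNS` replaces a real parser with a lookup of acceptable part-of-speech sequences. The sketch also assumes the input is already split into morphemes, tracks only one candidate per sentence, grows the candidate to the right before the left, and folds the two repetition loops of (e) and (g) into a single loop.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy dictionary: surface form -> part of speech.
LEXICON: Dict[str, str] = {
    "i": "PRON",
    "drink": "VERB",
    "tea": "NOUN",
    "thing": "NOUN",
}

# Hypothetical "predetermined general word" used for substitution.
GENERAL_WORD = "thing"

# Stand-in for a parser: POS sequences accepted as a valid sentence.
VALID_PATTERNS: Set[Tuple[str, ...]] = {("PRON", "VERB", "NOUN")}


def analyze(morphemes: List[str]) -> List[Tuple[str, str]]:
    """Toy morphological analysis: tag each morpheme, 'UNK' if undefined."""
    return [(m, LEXICON.get(m, "UNK")) for m in morphemes]


def is_valid_parse(tagged: List[Tuple[str, str]]) -> bool:
    """Toy parsing: valid only if the POS sequence is a known pattern."""
    return tuple(pos for _, pos in tagged) in VALID_PATTERNS


def substitute(morphemes: List[str], start: int, end: int) -> List[str]:
    """Replace the candidate span [start, end) with the general word."""
    return morphemes[:start] + [GENERAL_WORD] + morphemes[end:]


def collect_new_word(morphemes: List[str]) -> Optional[str]:
    """Return one new word found in the sentence, or None."""
    tagged = analyze(morphemes)                      # (a)
    unknown = [i for i, (_, pos) in enumerate(tagged) if pos == "UNK"]
    if not unknown:
        return None
    start, end = unknown[0], unknown[0] + 1          # (b) initial candidate

    while True:
        sub = analyze(substitute(morphemes, start, end))   # (c), (d)
        has_undefined = any(pos == "UNK" for _, pos in sub)
        if not has_undefined and is_valid_parse(sub):      # (f), (h)
            return " ".join(morphemes[start:end])
        # (e), (g): merge the candidate with an adjacent morpheme and retry.
        if end < len(morphemes):
            end += 1
        elif start > 0:
            start -= 1
        else:
            return None  # candidate covers the whole sentence; give up


print(collect_new_word(["i", "drink", "boba", "tea"]))
```

In this toy run, replacing only `boba` leaves the sequence PRON VERB NOUN NOUN, which the stand-in parser rejects, so the candidate absorbs the adjacent morpheme `tea` and the second pass is accepted. A real system would use an actual morphological analyzer and parser in place of the two lookup tables.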