Title of invention: NEW WORD COLLECTION DEVICE, METHOD AND PROGRAM
Abstract: PROBLEM TO BE SOLVED: To provide a new word collection system that automatically collects new words from documents containing informal expressions and automatically categorizes them.
SOLUTION: A method for collecting new words includes: (a) a step of subjecting an input document to morphological analysis; (b) a step of setting the undefined words contained in the result of the morphological analysis as new-word candidates; (c) a step of creating a substitute document in which the new-word candidates are replaced by predetermined general words; (d) a step of subjecting the substitute document to morphological analysis; (e) a step of repeating steps (c) and (d), each time setting as new-word candidates the words obtained by combining the current candidates with their adjacent morphemes, until the result of the morphological analysis in step (d) contains no undefined word; (f) a step of parsing the substitute document; (g) a step of repeating steps (e) and (f), each time setting as new-word candidates the words obtained by combining the current candidates with their adjacent morphemes, until the result of the parsing in step (f) is determined to be valid; and (h) a step of setting the new-word candidates as new words when the result of the parsing is determined to be valid.
COPYRIGHT: (C)2008,JPO&INPIT
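The loop in steps (a) to (h) can be sketched as follows. This is only a toy illustration under several assumptions that are not in the record: whitespace tokenization stands in for morphological analysis, the lexicon `LEXICON`, the general word `GENERIC_WORD`, and the regular-expression grammar `VALID_SENTENCE` are invented for the example, and candidates are grown to the right before the left (the abstract does not say which adjacent morpheme is combined first). All function names are hypothetical.

```python
import re

# Hypothetical toy lexicon: surface form -> part of speech.
LEXICON = {"i": "PRON", "ate": "VERB", "a": "DET",
           "tasty": "ADJ", "burger": "NOUN", "thing": "NOUN"}
GENERIC_WORD = "thing"  # assumed "predetermined general word"
# Assumed stand-in for a parser: one fixed sentence pattern over POS tags.
VALID_SENTENCE = re.compile(r"PRON VERB( DET)?( ADJ)* NOUN")


def analyze(tokens):
    """Toy morphological analysis: tag each token, 'UNK' if undefined."""
    return [(t, LEXICON.get(t.lower(), "UNK")) for t in tokens]


def merge_spans(spans):
    """Merge overlapping [start, end) candidate spans."""
    merged = []
    for start, end in sorted(spans):
        if merged and start < merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def substitute(tokens, spans):
    """Step (c): replace every candidate span with the general word."""
    out, pos = [], 0
    for start, end in spans:
        out.extend(tokens[pos:start])
        out.append(GENERIC_WORD)
        pos = end
    out.extend(tokens[pos:])
    return out


def grow(spans, n_tokens):
    """Steps (e)/(g): combine one candidate with an adjacent token.

    Assumption: try the right neighbour first, then the left one.
    Returns None when no candidate can grow any further.
    """
    for span in spans:
        if span[1] < n_tokens:
            span[1] += 1
            return merge_spans(spans)
    for span in spans:
        if span[0] > 0:
            span[0] -= 1
            return merge_spans(spans)
    return None


def collect_new_words(text):
    tokens = text.split()                                    # (a)
    spans = [[i, i + 1] for i, (_, tag) in enumerate(analyze(tokens))
             if tag == "UNK"]                                # (b)
    while spans:
        replaced = substitute(tokens, spans)                 # (c)
        tags = [tag for _, tag in analyze(replaced)]         # (d)
        # In this toy analyzer the substituted text never contains UNK;
        # a real analyzer may re-segment the text, so the check is kept.
        if "UNK" not in tags and VALID_SENTENCE.fullmatch(" ".join(tags)):  # (f)
            return [" ".join(tokens[s:e]) for s, e in spans]  # (h)
        spans = grow(spans, len(tokens))                     # (e), (g)
    return []  # no undefined words, or no valid parse was reached


print(collect_new_words("I ate a tasty mochi burger"))  # ['mochi burger']
```

In the example, replacing only "mochi" yields the tag sequence PRON VERB DET ADJ NOUN NOUN, which the toy grammar rejects, so the candidate is combined with the adjacent "burger"; the substitute document then parses and "mochi burger" is reported as the new word.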
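Application publication number: JP2008176392(A)  Application publication date: 2008.07.31
Application number: JP20070007056  Filing date: 2007.01.16
Applicant: NEC CORP  Inventor: SASAKI TAKUO
Classification: G06F17/28  Main classification: G06F17/28
Agency:  Agent:
Principal claim:
Address: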