Title of invention TEXT MINING APPARATUS, METHOD, PROGRAM, AND RECORDING MEDIUM
Abstract PROBLEM TO BE SOLVED: To appropriately remove the influence of fixed-form sections from word ranking in text mining. SOLUTION: Each of a plurality of analysis object documents is divided into words. For each word obtained by the division, its appearance frequency in the analysis object document (hereinafter, the word appearance frequency) is obtained. A fixed-form section of an analysis object document is defined as a fixed-form portion unrelated to the theme of that document. The fixed-form section average word appearance frequency of a word is defined as the estimated average frequency with which the word appears in the fixed-form sections of the plurality of analysis object documents. This average is subtracted from the word appearance frequency obtained for each divided word, yielding the word appearance frequency after removal of the influence of the fixed-form section. With the document frequency of a word defined as the number of analysis object documents that contain the word, the document frequency is then obtained for each word whose word appearance frequency after removal of the influence of the fixed-form section is equal to or higher than a predefined frequency. COPYRIGHT: (C)2010,JPO&INPIT
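The steps in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `adjusted_document_frequency`, the whitespace tokenizer, the sample documents, the per-word fixed-form averages, and the threshold value are all assumptions made for the example. The abstract does not say how the average is estimated, so it is passed in as a precomputed dictionary.

```python
from collections import Counter


def adjusted_document_frequency(documents, fixed_form_avg, threshold=1.0):
    """Count, per word, the documents in which the word still appears
    often enough after the fixed-form contribution is subtracted.

    documents      -- list of token lists (one list per document)
    fixed_form_avg -- dict mapping a word to its estimated average count
                      in fixed-form sections (hypothetical input)
    threshold      -- predefined minimum adjusted frequency (assumed value)
    """
    doc_freq = Counter()
    for tokens in documents:
        term_freq = Counter(tokens)  # word appearance frequency
        for word, count in term_freq.items():
            # Subtract the fixed-form section average for this word.
            adjusted = count - fixed_form_avg.get(word, 0.0)
            if adjusted >= threshold:
                doc_freq[word] += 1  # the document counts toward this word
    return doc_freq


# Assumed sample data: each call transcript opens with a fixed greeting.
raw_docs = [
    "thank you for calling the printer is broken",
    "thank you for calling my invoice is wrong invoice",
    "thank you for calling thank you",
]
docs = [text.lower().split() for text in raw_docs]  # naive tokenizer
fixed_avg = {"thank": 1.0, "you": 1.0, "for": 1.0, "calling": 1.0}

df = adjusted_document_frequency(docs, fixed_avg, threshold=1.0)
```

In this sample, the greeting words drop out of the first two documents because their adjusted frequency falls to zero, while "thank" is still counted for the third document, where it appears once beyond the greeting. For Japanese text, a morphological analyzer would replace the whitespace split used here.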
Publication number JP2010039671(A) Publication date 2010.02.18
Application number JP20080200574 Filing date 2008.08.04
Applicant NIPPON TELEGR & TELEPH CORP <NTT> Inventor NOMOTO NARIHISA;NODA YOSHIAKI;AMAKASU TETSUO
Classification G06F17/30 Main classification G06F17/30
Agency Agent
Principal claim
Address