Title of invention
Abstract PURPOSE: To form a thesaurus for processing natural language at high speed by classifying words through repeated division into clusters, using the cooccurrence frequency vectors of the words to be classified and an information-theoretic criterion. CONSTITUTION: A statistical processing part 1 extracts words from an input document, totals the cooccurrence frequency between each extracted word and a specified context of that word, and prepares a cooccurrence frequency vector for the word. An automatic word classification part 2 classifies the words using the cooccurrence frequency vectors prepared by the statistical processing part 1 and outputs a thesaurus of the classified words. To classify the words, the automatic word classification part 2 first divides the word group to be classified into two clusters, computes the relation between the two clusters (the full description length), and exchanges words between the two clusters so that this relation is minimized according to the prescribed information-theoretic criterion. Each of the two resulting clusters is then clustered again, and division is repeated until the clusters cannot be divided any further.
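The procedure in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not define the context, the form of the description length, or the exchange rule, so the sketch assumes that the context of a word is the word that immediately follows it, that the description length is a two-part code (data cost plus a parameter cost), and that words are moved one at a time between the two clusters. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(tokens):
    """Count, for each word, how often each following word occurs.

    Assumption: the 'specified context' is the immediately following word.
    """
    vectors = defaultdict(Counter)
    for word, nxt in zip(tokens, tokens[1:]):
        vectors[word][nxt] += 1
    return dict(vectors)


def description_length(cluster, vectors, n_contexts):
    """Two-part code length in bits for one cluster (assumed form)."""
    total = Counter()
    for w in cluster:
        total.update(vectors[w])
    n = sum(total.values())
    if n == 0:
        return 0.0
    # Data cost: contexts encoded with the cluster's pooled distribution.
    data = -sum(c * math.log2(c / n) for c in total.values())
    # Model cost: (number of free parameters / 2) * log2(sample size).
    model = 0.5 * (n_contexts - 1) * math.log2(n)
    return data + model


def split(cluster, vectors, n_contexts):
    """Divide a cluster in two, moving words while the total length drops."""
    words = sorted(cluster)
    left, right = set(words[::2]), set(words[1::2])  # arbitrary initial division

    def cost():
        return (description_length(left, vectors, n_contexts)
                + description_length(right, vectors, n_contexts))

    best = cost()
    improved = True
    while improved:
        improved = False
        for w in words:
            src, dst = (left, right) if w in left else (right, left)
            if len(src) == 1:
                continue  # never empty a cluster
            src.remove(w)
            dst.add(w)
            new = cost()
            if new < best - 1e-12:
                best, improved = new, True
            else:  # undo a move that does not help
                dst.remove(w)
                src.add(w)
    return left, right, best


def build_thesaurus(cluster, vectors, n_contexts):
    """Return a binary tree: tuples are divisions, lists are undivided clusters."""
    if len(cluster) < 2:
        return sorted(cluster)
    left, right, best = split(cluster, vectors, n_contexts)
    # Assumed stopping rule: stop when dividing no longer shortens the code.
    if best >= description_length(cluster, vectors, n_contexts):
        return sorted(cluster)
    return (build_thesaurus(left, vectors, n_contexts),
            build_thesaurus(right, vectors, n_contexts))
```

Under these assumptions, a tree is obtained by calling `cooccurrence_vectors` on a token list, taking `n_contexts` as the number of distinct context words, and passing `set(vectors)` to `build_thesaurus`. A word that occurs only as the last token has no following word and therefore gets no vector in this sketch.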
Publication number JP3304670(B2) Publication date 2002.07.22
Application number JP19950065716 Filing date 1995.03.24
Applicant Inventor
Classification G06F17/28 Main classification G06F17/28
Agency Agent
Main claim
Address