Title of invention: SIMILAR DOCUMENT RETRIEVAL DEVICE AND METHOD AND RECORDING MEDIUM RECORDING SIMILAR DOCUMENT RETRIEVAL PROGRAM
Abstract: PROBLEM TO BE SOLVED: To improve both the accuracy of inter-document similarity calculation and the accuracy of similar document retrieval by optimizing the list of unnecessary words: some of the extracted words are determined to be unnecessary words, the unnecessary words are deleted from a retrieval key document and a retrieval object document, and the similarity between the two documents is then calculated. SOLUTION: Some of the words extracted by a word extraction means are determined to be unnecessary words based on the occurrence frequency of each designated unnecessary word. The unnecessary words are then deleted from the retrieval key document and the retrieval object document, and the similarity between the two documents is calculated. An unnecessary word deletion part 28 of this similar document retrieval device deletes the words matching the unnecessary words stored in an unnecessary word buffer 45 from a retrieval keyword information storing buffer 47 and a retrieval object word information storing buffer 42. A similarity calculation part 29 calculates the similarity between the retrieval key document and the retrieval object document based on the information stored in the buffer 47, the buffer 42 and a common word information storing buffer 48.
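The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the tokenizer, the document-ratio threshold `min_doc_ratio`, the use of cosine similarity, and all function names are assumptions introduced here for illustration, since the abstract does not specify the frequency criterion or the similarity formula.

```python
from collections import Counter
import math
import re


def extract_words(text):
    """Hypothetical word extraction: lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def decide_unnecessary_words(candidates, documents, min_doc_ratio=0.5):
    """Keep a designated candidate as an unnecessary word only if it occurs
    in at least min_doc_ratio of the documents (assumed criterion)."""
    doc_word_sets = [set(extract_words(d)) for d in documents]
    unnecessary = set()
    for word in candidates:
        doc_count = sum(1 for words in doc_word_sets if word in words)
        if doc_word_sets and doc_count / len(doc_word_sets) >= min_doc_ratio:
            unnecessary.add(word)
    return unnecessary


def similarity(key_text, target_text, unnecessary):
    """Delete unnecessary words from both documents, then compare them.
    Cosine similarity over word counts is an assumption of this sketch."""
    key = Counter(w for w in extract_words(key_text) if w not in unnecessary)
    target = Counter(w for w in extract_words(target_text) if w not in unnecessary)
    common = set(key) & set(target)  # words shared by both documents
    dot = sum(key[w] * target[w] for w in common)
    norm = (math.sqrt(sum(c * c for c in key.values()))
            * math.sqrt(sum(c * c for c in target.values())))
    return dot / norm if norm else 0.0


# Hypothetical sample data, not taken from the patent.
documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "a system for document retrieval",
]
candidates = {"the", "on", "system"}

unnecessary = decide_unnecessary_words(candidates, documents)
score_raw = similarity(documents[0], documents[1], set())
score_filtered = similarity(documents[0], documents[1], unnecessary)
print(sorted(unnecessary), score_raw, score_filtered)
```

In this sample, frequent function words inflate the raw score, and deleting them leaves the score to be driven by the content words the two documents actually share.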
Publication number: JPH11259515(A)  Publication date: 1999.09.24
Application number: JP19980061726  Application date: 1998.03.12
Applicant: TOSHIBA CORP;TOSHIBA COMPUT ENG CORP  Inventor: TANOSAKI YASUO;NAKAMOTO YUKIO;NISHINA TAKUYA;KUBOTA NAOHIDE
Classification: G06F17/21;G06F17/27;G06F17/30  Main classification: G06F17/21
Agency:  Agent:
Principal claim:
Address: