Title: SIMILAR DOCUMENT RETRIEVAL SYSTEM, SIMILAR DOCUMENT RETRIEVAL METHOD AND PROGRAM
Abstract: PROBLEM TO BE SOLVED: To provide a technology that accurately detects, in a document database, documents similar to a given document, even when the documents contain words that do not accurately reflect their characteristic content. SOLUTION: When similar documents are retrieved, the IDF value of each word, which indicates how widely that word appears across the documents in a learning database, is compared with a predetermined threshold to detect frequently appearing words. No TFIDF value is calculated for these frequent words; instead, a feature vector of the reference document is calculated whose components are the TFIDF values of the remaining words appearing in the reference document. The similarity between the reference document and each document in the document DB 162 is then calculated from the feature vector of the reference document and the feature vector of that document, and the documents similar to the reference document are detected and output on the basis of this similarity. COPYRIGHT: (C)2006,JPO&NCIPI
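The retrieval flow in the abstract (IDF thresholding to drop frequent words, TFIDF feature vectors, similarity ranking) can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: documents are assumed to be pre-tokenized word lists, IDF is assumed to be log(N / df) over the learning database, words with IDF below the threshold are treated as "frequent", words missing from the learning database are skipped, and cosine similarity stands in for the unspecified similarity measure. All function names (`idf_table`, `frequent_words`, `tfidf_vector`, `cosine`, `find_similar`) are hypothetical.

```python
import math
from collections import Counter


def idf_table(learning_docs):
    """IDF of every word in the learning database, assumed here to be log(N / df)."""
    n_docs = len(learning_docs)
    doc_freq = Counter()
    for tokens in learning_docs:
        doc_freq.update(set(tokens))  # count each word at most once per document
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def frequent_words(idf, threshold):
    """Words whose IDF falls below the threshold, i.e. words found in many documents."""
    return {word for word, value in idf.items() if value < threshold}


def tfidf_vector(tokens, idf, excluded):
    """Sparse TFIDF feature vector; excluded (frequent) words get no TFIDF value.

    Assumption: words absent from the learning database have no IDF and are skipped.
    """
    tf = Counter(tokens)
    return {w: n * idf[w] for w, n in tf.items() if w in idf and w not in excluded}


def cosine(u, v):
    """Cosine similarity of two sparse vectors (assumed similarity measure)."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_similar(reference, document_db, learning_docs, threshold, top_k=3):
    """Rank the documents in document_db by similarity to the reference document."""
    idf = idf_table(learning_docs)
    excluded = frequent_words(idf, threshold)
    ref_vec = tfidf_vector(reference, idf, excluded)
    scores = [
        (doc_id, cosine(ref_vec, tfidf_vector(tokens, idf, excluded)))
        for doc_id, tokens in document_db.items()
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scores[:top_k]


if __name__ == "__main__":
    learning = [
        ["the", "printer", "toner"],
        ["the", "scanner", "driver"],
        ["the", "printer", "paper"],
        ["the", "network", "driver"],
    ]
    db = {
        "A": ["the", "printer", "toner", "jam"],
        "B": ["the", "network", "driver"],
        "C": ["the", "printer", "paper"],
    }
    for doc_id, score in find_similar(["the", "printer", "toner"], db, learning, threshold=0.5):
        print(doc_id, round(score, 3))
```

With this toy data, "the" occurs in every learning document, so its IDF falls below the threshold and it contributes nothing to any feature vector; document A, which shares the reference's distinctive words, ranks first, followed by C and then B. Keeping the IDF table separate from the searched collection mirrors the abstract's distinction between the learning database and the document DB 162.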
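Application publication number: JP2006201926(A); Publication date: 2006.08.03
Application number: JP20050011440; Filing date: 2005.01.19
Applicant: KONICA MINOLTA HOLDINGS INC; Inventor: YASUNAGA SUSUMU
Classification: G06F17/30; Main classification: G06F17/30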