Abstract |
PROBLEM TO BE SOLVED: To provide a similar document retrieval device that improves the precision of similarity calculation by preventing the loss of precision caused by unnecessary words. SOLUTION: A category is designated for the retrieval key document. A word extraction means extracts the words of the retrieval key document. Category information, which identifies unnecessary words word by word, is stored in a buffer. Any extracted word that is an unnecessary word is deleted, and only the remaining words are stored in the buffer. The similarity between the retrieval key document and a document to be retrieved is then calculated from the words of both, excluding the words determined to be unnecessary. COPYRIGHT: (C)2004,JPO
|
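The flow described in the abstract (designate a category, extract words, delete that category's unnecessary words, then compare the remaining words) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `CATEGORY_UNNECESSARY_WORDS` table and its contents, the regex tokenizer standing in for the word extraction means, and the use of cosine similarity over word counts. The abstract does not specify which similarity measure the device uses.

```python
import math
import re
from collections import Counter

# Hypothetical category information: for each category, the set of words
# treated as unnecessary. The categories and words are invented examples.
CATEGORY_UNNECESSARY_WORDS = {
    "patent": {"device", "means", "said", "the", "a", "of", "and"},
}


def extract_words(text):
    """Stand-in for the word extraction means: lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def filter_words(words, category):
    """Delete words listed as unnecessary for the designated category."""
    unnecessary = CATEGORY_UNNECESSARY_WORDS.get(category, set())
    return [w for w in words if w not in unnecessary]


def cosine_similarity(words_a, words_b):
    """Cosine similarity of word-count vectors (an assumed measure)."""
    freq_a, freq_b = Counter(words_a), Counter(words_b)
    dot = sum(freq_a[w] * freq_b[w] for w in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity(key_doc, target_doc, category):
    """Compare two documents using only words that are not unnecessary."""
    key_words = filter_words(extract_words(key_doc), category)
    target_words = filter_words(extract_words(target_doc), category)
    return cosine_similarity(key_words, target_words)


if __name__ == "__main__":
    key = "the retrieval device"
    target = "a printing device"
    # With the "patent" category, the shared word "device" is removed
    # before comparison, so it no longer inflates the score.
    print(similarity(key, target, "patent"))
    # An unknown category has no unnecessary words, so nothing is removed.
    print(similarity(key, target, "unknown"))
```

In this toy input, the two documents share only the word "device". Once the designated category marks it as unnecessary, it stops contributing to the score, which is the precision effect the abstract aims for.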