Abstract |
PROBLEM TO BE SOLVED: To perform clustering and document extraction efficiently by computing document similarity as an absolute value, with high accuracy and independently of document size. SOLUTION: This document similarity computing device comprises an input part 11 for inputting a document set, and a normalization part 14 that computes a similarity, as a relative value, for each of a plurality of document pairs in the inputted document set by a tf-idf method using document vectors and the importance of the words contained in the documents, and then converts each similarity into an absolute value by normalization. COPYRIGHT: (C)2003,JPO
|
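The abstract does not give the normalization formula, so the following is only a minimal sketch under stated assumptions. It builds tf-idf document vectors, takes their inner product as the size-dependent relative similarity, and then divides by the vector norms (cosine scaling) as one possible way to obtain a value in [0, 1] that does not depend on document size. The function names, the smoothed idf, and the choice of cosine scaling are all illustrative assumptions, not the method of the device described above.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(docs):
    """Build one sparse tf-idf vector (dict: word -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumption: smoothed idf, so a word found in every document keeps a positive weight.
        vectors.append({w: c * (math.log(n / df[w]) + 1.0) for w, c in tf.items()})
    return vectors


def raw_similarity(a, b):
    """Inner product of two sparse vectors: a relative value that grows with document length."""
    if len(a) > len(b):
        a, b = b, a
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


def normalized_similarity(a, b):
    """Assumed normalization: divide by both vector norms so the score lies in [0, 1]."""
    norm = math.sqrt(raw_similarity(a, a)) * math.sqrt(raw_similarity(b, b))
    return raw_similarity(a, b) / norm if norm else 0.0


def pairwise_similarities(docs):
    """Normalized similarity for every pair of documents, keyed by (i, j) with i < j."""
    vectors = tfidf_vectors(docs)
    return {
        (i, j): normalized_similarity(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }
```

Called as `pairwise_similarities([["a", "b"], ["a", "b", "a", "b"], ["c"]])`, the first two documents differ only in length: their raw inner product changes with the length, while the normalized score stays the same, which is the size independence the abstract aims for.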