Abstract |
PROBLEM TO BE SOLVED: To provide a document similarity calculation device capable of reducing the processing load. SOLUTION: A document similarity calculation device 100 calculates a similarity indicating how similar a plurality of documents are to one another. The document similarity calculation device 100 includes: a related word group storage part 101, which stores a related word group consisting of mutually related words; a word document frequency matrix generation part 102, which generates a word document frequency matrix whose elements are, for each combination of document and word, the frequency at which the word appears in the document; a word document frequency matrix conversion part 103, which converts the generated word document frequency matrix on the basis of the stored related word group so as to reduce its number of dimensions; and a similarity calculation part 104, which calculates the similarity on the basis of the converted word document frequency matrix. COPYRIGHT: (C) 2013, JPO & INPIT |
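
The four parts described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the conversion part reduces the dimensions or which similarity measure is used, so two assumptions are made here: rows of words in the same related word group are summed into a single row, and similarity is measured as the cosine between document columns. The function names (`build_matrix`, `merge_related`, `cosine`) and the sample documents are hypothetical and do not come from the patent.

```python
import math
from collections import Counter


def build_matrix(docs):
    """Generation part (102): matrix[i][j] is the count of vocab[i] in docs[j]."""
    counts = [Counter(d.lower().split()) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[w] for c in counts] for w in vocab]


def merge_related(vocab, matrix, groups):
    """Conversion part (103), ASSUMED behavior: sum the rows of words that
    share a related word group, so each group occupies one dimension."""
    # Map every grouped word to the first word of its group.
    rep = {w: g[0] for g in groups for w in g}
    merged = {}
    for w, row in zip(vocab, matrix):
        key = rep.get(w, w)
        if key in merged:
            merged[key] = [a + b for a, b in zip(merged[key], row)]
        else:
            merged[key] = list(row)
    new_vocab = sorted(merged)
    return new_vocab, [merged[w] for w in new_vocab]


def cosine(matrix, j, k):
    """Similarity part (104), ASSUMED measure: cosine of document columns j and k."""
    dot = sum(r[j] * r[k] for r in matrix)
    nj = math.sqrt(sum(r[j] ** 2 for r in matrix))
    nk = math.sqrt(sum(r[k] ** 2 for r in matrix))
    return dot / (nj * nk) if nj and nk else 0.0


# Hypothetical sample data; the groups play the role of storage part 101.
docs = ["car engine repair", "automobile motor repair", "banana bread recipe"]
groups = [["car", "automobile"], ["engine", "motor"]]

vocab, m = build_matrix(docs)
small_vocab, small = merge_related(vocab, m, groups)

print(len(vocab), "->", len(small_vocab), "word dimensions")
print("similarity of docs 0 and 1 before:", round(cosine(m, 0, 1), 3))
print("similarity of docs 0 and 1 after: ", round(cosine(small, 0, 1), 3))
```

With this sample data the matrix shrinks from 8 word rows to 6, so the similarity step iterates over fewer dimensions. Under the row-summing assumption, documents that use different words from the same group also end up with overlapping vectors.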