Abstract |
PROBLEM TO BE SOLVED: To provide a document classifying device that outputs, as classified results, only those partial document sets whose contents are worth a user's actual evaluation. SOLUTION: This document classifying device has a document inputting part 101 for inputting a document set; a document analyzing part 102 for applying morphological analysis to each inputted document and extracting the words of each document together with part-of-speech information; a document vector space generating part 103 for representing each document in a multidimensional vector space according to the extracted word information; a document classifying part 104 for generating a plurality of partial document sets by a statistical method that measures similarity from the word information and classifying each document into the respective partial document sets; a classified results validity deciding part 105 for calculating a validity evaluation value for each partial document set according to the word information of the documents belonging to it and allocating an identifier showing whether or not the set satisfies a designated condition; and a classified results outputting part 106 for outputting, as classified results, only the partial document sets to which the identifier showing that the condition is satisfied is attached.
|
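The flow through parts 101 to 106 can be illustrated with a minimal Python sketch. It is not the method of the device itself: the abstract does not specify the statistical method, the validity evaluation value, or the designated condition, so everything here is an assumption made for illustration. Whitespace tokenization stands in for morphological analysis (no part-of-speech information is produced), greedy cosine-similarity grouping stands in for the statistical classification, mean pairwise cosine similarity stands in for the validity evaluation value, and a fixed threshold stands in for the designated condition. All function names and parameters (`analyze`, `classify`, `validity`, `classify_documents`, `sim_threshold`, `validity_threshold`) are hypothetical.

```python
import math
from collections import Counter


def analyze(doc):
    # Part 102 stand-in (assumption): lowercase whitespace tokens, no POS tags.
    # The returned word counts also serve as the document vector (part 103).
    return Counter(doc.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(vectors, sim_threshold=0.3):
    # Part 104 stand-in (assumption): greedy single-pass grouping. A document
    # joins the first set whose first member is similar enough to it.
    clusters = []
    for i, v in enumerate(vectors):
        for c in clusters:
            if cosine(vectors[c[0]], v) >= sim_threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def validity(cluster, vectors):
    # Part 105 stand-in (assumption): mean pairwise cosine similarity.
    # A set with a single document scores 0.0.
    pairs = [(i, j) for k, i in enumerate(cluster) for j in cluster[k + 1:]]
    if not pairs:
        return 0.0
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)


def classify_documents(docs, sim_threshold=0.3, validity_threshold=0.4):
    vectors = [analyze(d) for d in docs]  # parts 101 to 103
    labelled = []
    for cluster in classify(vectors, sim_threshold):
        score = validity(cluster, vectors)
        # The boolean plays the role of the identifier allocated by part 105.
        labelled.append((cluster, score >= validity_threshold))
    # Part 106: output only the sets whose identifier marks the condition as met.
    return [[docs[i] for i in cluster] for cluster, ok in labelled if ok]
```

With the three sample inputs `"apple banana fruit"`, `"apple fruit juice"`, and `"quantum physics lecture"`, the first two share two of three words and form one set, while the third forms a singleton set that fails the assumed validity condition and is therefore withheld from the output.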