Abstract |
PURPOSE: To classify documents automatically on the basis of semantic differences, by automatically extracting feature vectors from each document and classifying the documents according to those feature vectors. CONSTITUTION: The following are provided: a storage part 101 that stores document data; a document analysis part 102 that analyzes the document data; a word vector generating part 103 that uses co-occurrence relations between words in the documents to automatically generate a feature vector expressing the features of each word; a word vector storage part 104 that stores the word feature vectors; a document vector generating part 105 that generates the feature vector of a document from the feature vectors of the words included in it; a document vector storage part 106 that stores the document feature vectors; a classifying part 107 that classifies documents using the similarity between their feature vectors; a result storage part 108 that stores the classification result; and a feature vector generating dictionary 109 in which the words to be used for feature vector generation are registered. |
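
The flow through parts 103, 105, and 107 can be illustrated with a minimal Python sketch. The abstract does not specify the computations, so everything concrete here is an assumption: co-occurrence is counted per document, the dictionary words (109) serve as the vector dimensions, a document vector is the sum of its word vectors, and classification picks the most cosine-similar labeled reference vector. The function names are hypothetical and are not taken from the patent.

```python
import math


def word_vectors(docs, dictionary):
    """Part 103 (assumed form): count, for each word, how many documents
    it shares with each dictionary word. One dimension per dictionary word."""
    index = {w: i for i, w in enumerate(dictionary)}
    vectors = {}
    for doc in docs:
        present = set(doc)  # assumption: co-occurrence means "same document"
        for word in present:
            vec = vectors.setdefault(word, [0.0] * len(dictionary))
            for other in present:
                if other != word and other in index:
                    vec[index[other]] += 1.0
    return vectors


def document_vector(doc, vectors, dim):
    """Part 105 (assumed form): sum the feature vectors of the words
    in the document. Words without a stored vector are skipped."""
    total = [0.0] * dim
    for word in doc:
        for i, value in enumerate(vectors.get(word, ())):
            total[i] += value
    return total


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(doc_vec, references):
    """Part 107 (assumed form): return the label of the reference
    vector most similar to doc_vec."""
    return max(references, key=lambda label: cosine(doc_vec, references[label]))
```

In use, one would build word vectors from a tokenized corpus, derive a reference vector per class from representative documents, and then call `classify` on the vector of each new document. A clustering step over the document vectors would fit the abstract equally well; nearest-reference matching is chosen here only because it is short.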