Abstract |
PROBLEM TO BE SOLVED: To properly classify a large number of documents that are linked in complex ways, as hypertext documents are, by generating an initial document cluster on the basis of the link relations and the document distances, and then performing a cluster analysis based on the document distances. SOLUTION: A document storage part 11 stores electronic documents, and a link relation storage part 12 stores the link relations among the documents stored in the document storage part 11. A distance calculation processing part 13 calculates the document distances from the frequencies of occurrence of the words contained in each document stored in the document storage part 11. A document classification processing part 14 then generates the initial document cluster on the basis of the stored link relations and the calculated document distances, and performs a cluster analysis based on the document distances to classify the documents stored in the document storage part 11. An output processing part 15 outputs the classification result produced by the document classification processing part 14. |
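
The flow through parts 13 and 14 can be illustrated with a minimal sketch. The abstract does not specify the distance formula, the rule for forming the initial cluster, or the clustering method, so everything concrete here is an assumption made for illustration: the distance is taken as one minus the cosine similarity of word-frequency vectors, initial clusters are formed by joining linked documents whose distance is at or below a hypothetical `link_threshold`, and the cluster analysis is average-linkage agglomerative merging down to a hypothetical `num_clusters`. All function and parameter names are invented for this example.

```python
import math
import re
from collections import Counter
from itertools import combinations


def word_frequencies(text):
    """Count lowercase word occurrences in one document."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def document_distance(freq_a, freq_b):
    """Assumed metric: 1 - cosine similarity of word-frequency vectors."""
    dot = sum(count * freq_b[word] for word, count in freq_a.items())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def initial_clusters(doc_ids, links, dist, link_threshold):
    """Assumed rule: join linked documents whose distance is small (union-find)."""
    parent = {d: d for d in doc_ids}

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving
            d = parent[d]
        return d

    for a, b in links:
        if a != b and dist[frozenset((a, b))] <= link_threshold:
            parent[find(a)] = find(b)

    groups = {}
    for d in doc_ids:
        groups.setdefault(find(d), []).append(d)
    return list(groups.values())


def average_distance(c1, c2, dist):
    """Mean pairwise document distance between two clusters."""
    total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def classify_documents(documents, links, link_threshold=0.5, num_clusters=2):
    """Distances -> link-based initial clusters -> agglomerative merging."""
    freqs = {doc_id: word_frequencies(text) for doc_id, text in documents.items()}
    dist = {
        frozenset((a, b)): document_distance(freqs[a], freqs[b])
        for a, b in combinations(documents, 2)
    }
    clusters = initial_clusters(list(documents), links, dist, link_threshold)
    # Merge the two closest clusters until the requested number remains.
    while len(clusters) > num_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_distance(clusters[p[0]], clusters[p[1]], dist),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    docs = {
        "a": "cat dog cat pet",
        "b": "dog cat pet animal",
        "c": "stock market price trade",
        "d": "market price stock bond",
    }
    links = [("a", "b"), ("c", "d"), ("a", "c")]
    print(classify_documents(docs, links))
```

In this toy input the link between "a" and "c" is ignored when forming the initial clusters because the two documents share no words, which is one plausible reading of why the link relations and the document distances are used together.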