Abstract |
PROBLEM TO BE SOLVED: To greatly reduce variation in the size of extracted clusters when extracting document clusters. SOLUTION: A user designates the target document group that is input through a target document input part 1. A word extraction part 2 performs morphological analysis on the text data of each input document and calculates word appearance frequencies and document appearance frequencies. An inter-document relevance calculation part 3 calculates the relevance between documents by using the word vectors of the documents. A hierarchical cluster analysis part 4, configured according to general hierarchical cluster analysis techniques, builds cluster hierarchies of the documents by using the relevance. A cluster extraction part 5 evaluates and selects a cluster hierarchy by using a predetermined rule, and extracts the desired number of clusters from the selected cluster hierarchy. COPYRIGHT: (C)2005,JPO&NCIPI
|
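The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions made only for illustration: whitespace tokenization stands in for morphological analysis, the word vectors use a TF-IDF style weighting, relevance is cosine similarity, the candidate hierarchies come from three common linkage rules, and the "predetermined rule" is taken to be "pick the hierarchy whose k clusters have the smallest standard deviation in size". All function names (`word_vectors`, `cosine`, `agglomerate`, `extract_clusters`) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import pstdev

# Assumed linkage rules: each reduces the pairwise similarities
# between two clusters to a single merge score.
LINKAGES = {
    "single": max,
    "complete": min,
    "average": lambda sims: sum(sims) / len(sims),
}


def word_vectors(docs):
    """Word extraction: term frequency weighted by document frequency."""
    # Whitespace splitting is a stand-in for morphological analysis.
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter(w for tokens in tokenized for w in set(tokens))
    n = len(docs)
    return [
        {w: tf * (math.log(n / doc_freq[w]) + 1.0)
         for w, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    """Inter-document relevance as cosine similarity of sparse vectors."""
    dot = sum(weight * b.get(w, 0.0) for w, weight in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def agglomerate(sim, k, linkage):
    """Merge the most similar pair of clusters until k remain (k >= 1)."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        best_pair, best_score = None, -1.0
        for a, b in combinations(range(len(clusters)), 2):
            score = linkage([sim[i][j]
                             for i in clusters[a] for j in clusters[b]])
            if score > best_score:
                best_pair, best_score = (a, b), score
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def extract_clusters(docs, k):
    """Build one hierarchy per linkage rule, cut each at k clusters,
    and keep the cut with the smallest deviation in cluster size
    (an assumed selection rule)."""
    vectors = word_vectors(docs)
    sim = [[cosine(u, v) for v in vectors] for u in vectors]
    candidates = {name: agglomerate(sim, k, fn)
                  for name, fn in LINKAGES.items()}
    best = min(candidates,
               key=lambda name: pstdev([len(c) for c in candidates[name]]))
    return best, candidates[best]


if __name__ == "__main__":
    docs = [
        "cluster analysis of document text",
        "document cluster extraction text",
        "word frequency in document",
        "stock market price report",
        "market price of stock",
        "weather forecast rain",
    ]
    name, clusters = extract_clusters(docs, 3)
    print(name, clusters)
```

Comparing several hierarchies and keeping the most balanced cut is only one way to read "evaluates and selects the cluster hierarchy by using a predetermined rule"; the abstract does not state the rule itself.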