Abstract |
PROBLEM TO BE SOLVED: To cluster a corpus subset in nearly constant time by repeating selection and expansion steps until the number of resulting documents equals a prescribed maximum number, and then clustering those documents. SOLUTION: The root node is first replaced by its k children, which form the focus set T. The k nodes of T are examined and the 'worst' node is selected. It is then determined whether the number of nodes in T is equal to or greater than M, the maximum number of nodes to be collected. Once the number of nodes in T reaches M, T is clustered to obtain a set of clusters P. Each node n in the set of clusters P is then replaced by its documents of interest Is (n), so that documents not present in the set S of clusters are eliminated. The M collected nodes are clustered using a linear-time clustering method.
|
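The selection and expansion loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Node` class, the function names, and the choice of "worst" as the node covering the most documents are all assumptions, and S is treated here simply as a set of document identifiers. The linear-time clustering step itself is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    """Hypothetical node of a precomputed cluster hierarchy."""
    name: str
    children: list[Node] = field(default_factory=list)
    docs: frozenset = frozenset()  # documents stored at a leaf node

    def leaves(self) -> frozenset:
        """Return every document found under this node."""
        if not self.children:
            return self.docs
        result = frozenset()
        for child in self.children:
            result |= child.leaves()
        return result


def expand_focus_set(root: Node, max_nodes: int) -> list[Node]:
    """Grow the focus set T until it holds at least max_nodes (M) nodes."""
    # The root is immediately replaced by its children.
    focus = list(root.children) or [root]
    while len(focus) < max_nodes:
        expandable = [n for n in focus if n.children]
        if not expandable:
            break  # only leaves remain, nothing left to expand
        # Assumption: the "worst" node is the one covering the most documents.
        worst = max(expandable, key=lambda n: len(n.leaves()))
        focus.remove(worst)
        focus.extend(worst.children)
    return focus


def restrict_to_subset(node: Node, subset: frozenset) -> frozenset:
    """Keep only the documents under a node that also belong to S."""
    return node.leaves() & subset
```

Because each expansion replaces one node by all of its children, the focus set can end up slightly larger than M, which matches the "equal to or larger" check in the abstract.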