Title of invention: METHOD AND DEVICE FOR CLUSTERING OPTIONAL CORPUS SUBSET IN ALMOST FIXED TIME
Abstract: PROBLEM TO BE SOLVED: To cluster a corpus subset in almost fixed time by repeating selection and extension steps until the number of next documents equals a prescribed maximum number, and then clustering the next documents. SOLUTION: The root node is immediately replaced by its k children. The k nodes of a focus set T are checked and the 'worst' node is picked. It is then decided whether the number of nodes in T is equal to or greater than M, the maximum number of nodes to be collected. Once the number of nodes in T finally equals M, T is clustered to obtain a set of clusters P. Next, each node in the set of clusters P is replaced by the documents of concern Is (n), in order to eliminate documents that do not exist in a set S of clusters. The M detected nodes are clustered using a linear-time clustering method.
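The select-and-extend loop in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the record: the hierarchy is a tree of hypothetical `Node` objects, S is treated as the set of documents of interest, the 'worst' node is taken to be the expandable node covering the most documents of S, and the final grouping is a round-robin placeholder standing in for the unspecified linear-time clustering method.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare nodes by identity, not by content
class Node:
    docs: frozenset                      # documents covered by this node
    children: list = field(default_factory=list)


def worst_node(focus, subset):
    """Pick the node to expand next.

    Assumption: 'worst' means the expandable node that covers the most
    documents of the subset. The record does not define the criterion.
    """
    expandable = [n for n in focus if n.children]
    if not expandable:
        return None
    return max(expandable, key=lambda n: len(n.docs & subset))


def build_focus_set(root, subset, max_nodes):
    """Replace the root by its children, then expand the 'worst' node
    repeatedly until the focus set T holds at least max_nodes (M) nodes."""
    focus = list(root.children) or [root]
    while len(focus) < max_nodes:
        worst = worst_node(focus, subset)
        if worst is None:                # only leaves remain
            break
        focus.remove(worst)
        focus.extend(worst.children)
    return focus


def cluster_subset(root, subset, max_nodes, k):
    """Group the focus set into k clusters, then keep only documents in S."""
    focus = build_focus_set(root, subset, max_nodes)
    # Placeholder grouping (assumption): deal nodes round-robin into k groups.
    groups = [[] for _ in range(k)]
    for i, node in enumerate(focus):
        groups[i % k].append(node)
    # Restrict every node to the documents of the subset.
    return [set().union(*(n.docs & subset for n in g)) for g in groups]
```

Because the loop touches at most about M nodes regardless of corpus size, the cost depends on M and k rather than on the number of documents, which is the sense in which the running time is almost fixed.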
Publication number: JPH11316768(A) Publication date: 1999.11.16
Application number: JP19990017644 Filing date: 1999.01.26
Applicant: XEROX CORP Inventor: CRAIG D SILVERSTEIN
Classification: G06F17/30;(IPC1-7):G06F17/30 Main classification: G06F17/30
Agency: Agent:
Main claim:
Address: