Title of invention: Method and apparatus for almost-constant-time clustering of arbitrary corpus subsets
Abstract: A method and apparatus for almost-constant-time reclustering of corpus subsets, with a customizable time/precision tradeoff, is usable in a basic browsing method, such as Scatter/Gather, to partition a large document collection into clusters of related documents. The user is first presented with a clustering of the entire corpus into metadocuments, from which the worst metadocument is selected and replaced with its "children". Children containing no documents of interest are pruned, and the remaining metadocuments are further expanded until a predetermined number of child metadocuments is obtained. The resulting metadocuments are then reclustered. The process is repeated until the user obtains the desired degree of specificity.
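The expand-and-prune loop described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Meta` class, the `expand_and_prune` function, and the use of document count as the measure of the "worst" metadocument are all hypothetical, since the abstract does not define that measure. The final reclustering step is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Meta:
    """Hypothetical metadocument: a node in a precomputed cluster hierarchy."""
    name: str
    docs: set                                   # ids of the documents it covers
    children: list = field(default_factory=list)


def expand_and_prune(frontier, interest, target, badness=lambda m: len(m.docs)):
    """Expand metadocuments until `target` of them remain or none can be expanded.

    `badness` is an assumed scoring function; the abstract only says that the
    "worst" metadocument is replaced by its children.
    """
    # Keep only metadocuments that contain at least one document of interest.
    frontier = [m for m in frontier if m.docs & interest]
    while len(frontier) < target:
        expandable = [m for m in frontier if m.children]
        if not expandable:
            break                               # only leaves remain
        worst = max(expandable, key=badness)
        frontier.remove(worst)
        # Replace the worst metadocument with its children, pruning children
        # that contain no documents of interest.
        frontier.extend(c for c in worst.children if c.docs & interest)
    return frontier


# Toy hierarchy over six documents (illustrative data only).
a, b = Meta("a", {1, 2}), Meta("b", {3})
c, d = Meta("c", {4, 5}), Meta("d", {6})
ab, cd = Meta("ab", {1, 2, 3}, [a, b]), Meta("cd", {4, 5, 6}, [c, d])
root = Meta("root", {1, 2, 3, 4, 5, 6}, [ab, cd])

interest = {1, 3, 4}                            # documents the user cares about
selected = expand_and_prune([root], interest, target=3)
print([m.name for m in selected])
```

In a full system the returned metadocuments would then be passed to a clustering routine. Because their number is bounded by `target` regardless of corpus size, that reclustering step can run in roughly constant time.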
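Publication number: US6038557(A)  Publication date: 2000.03.14
Application number: US19980013668  Filing date: 1998.01.26
Applicant: XEROX CORPORATION  Inventor: SILVERSTEIN, CRAIG D.
Classification: G06F17/30;(IPC1-7):G06F17/30  Main classification: G06F17/30
Agency:  Agent:
Independent claim:
Address: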