Abstract |
A method and apparatus for almost-constant-time re-clustering of corpus subsets, with a customizable time/precision tradeoff, is usable in a basic browsing method such as Scatter/Gather to partition a large document collection into clusters of related documents. The user is first presented with a clustering of the entire corpus into metadocuments, from which the worst metadocument is selected and replaced with its "children". Children containing no documents of interest are pruned, and the remaining metadocuments are further expanded until a predetermined number of children metadocuments are obtained. The resulting metadocuments are then reclustered. The process is repeated until the user obtains the desired degree of specificity.
|
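
The expand-and-prune step described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `MetaDoc` structure, the `expand` function, and the `badness` score (here simply the number of documents under a metadocument) are hypothetical names and assumptions, since the abstract does not say how the "worst" metadocument is chosen. The final reclustering of the resulting metadocuments is left out.

```python
from dataclasses import dataclass, field


@dataclass
class MetaDoc:
    """A node in a precomputed cluster hierarchy (hypothetical structure)."""
    name: str
    docs: frozenset                      # ids of all documents under this node
    children: list = field(default_factory=list)


def expand(roots, interesting, target, badness=lambda m: len(m.docs)):
    """Expand metadocuments until at least `target` remain or none can be split.

    `badness` is an assumed scoring function; the highest-scoring
    metadocument is treated as the "worst" and is expanded first.
    """
    # Start from the metadocuments that hold at least one document of interest.
    frontier = [m for m in roots if m.docs & interesting]
    while len(frontier) < target:
        splittable = [m for m in frontier if m.children]
        if not splittable:
            break                        # only leaves remain
        worst = max(splittable, key=badness)
        frontier.remove(worst)
        # Replace the worst metadocument with its children, pruning any
        # child that contains no documents of interest.
        frontier.extend(c for c in worst.children if c.docs & interesting)
    return frontier


# Small hand-built hierarchy used only for illustration.
a = MetaDoc("a", frozenset({1, 2}))
b = MetaDoc("b", frozenset({3}))
c = MetaDoc("c", frozenset({4, 5, 6}))
d = MetaDoc("d", frozenset({7}))
ab = MetaDoc("ab", a.docs | b.docs, [a, b])
cd = MetaDoc("cd", c.docs | d.docs, [c, d])
root = MetaDoc("root", ab.docs | cd.docs, [ab, cd])

result = expand([root], interesting=frozenset({1, 3, 4}), target=3)
print([m.name for m in result])
```

In this sketch the metadocuments returned by `expand` would then be handed to whatever clustering routine is in use. Because the number of metadocuments is bounded by `target` rather than by the size of the subset, the cost of that reclustering stays roughly constant.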