Abstract |
A system and method for clustering unstructured documents are provided. Documents having terms with frequencies of occurrence that satisfy upper and lower edge conditions are selected. Concepts are generated for the selected documents. The selected documents are grouped into clusters of the documents. A weight for each of the clusters is evaluated. A similarity value is determined from the frequencies of occurrence for at least one of the terms from the concepts and the cluster weights for each selected document. Each selected document is assigned to one such cluster based on the similarity value of the selected document.
|
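The steps in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the claimed method. The abstract does not say how concepts, cluster weights, or the similarity value are computed, so the sketch makes these assumptions: concepts are the terms whose frequency falls between the lower and upper bounds, clusters are given as hypothetical seed term sets, weights are supplied by the caller, and similarity is the cluster weight multiplied by the summed frequency of shared terms. All function and parameter names are hypothetical.

```python
from collections import Counter


def select_documents(docs, lower, upper):
    """Keep documents that have at least one term whose frequency lies in
    [lower, upper]; those terms are treated as the document's concepts.
    (Assumption: the abstract does not define the edge conditions.)"""
    selected = {}
    for doc_id, text in docs.items():
        freqs = Counter(text.lower().split())
        concepts = {t: f for t, f in freqs.items() if lower <= f <= upper}
        if concepts:
            selected[doc_id] = concepts
    return selected


def similarity(concepts, cluster_terms, weight):
    """Hypothetical similarity: cluster weight times the summed frequency
    of the concept terms that also appear in the cluster."""
    overlap = sum(f for t, f in concepts.items() if t in cluster_terms)
    return weight * overlap


def assign_documents(selected, clusters, weights):
    """Assign each selected document to the cluster with the highest
    similarity value (ties go to the first cluster listed)."""
    assignment = {}
    for doc_id, concepts in selected.items():
        scores = {
            name: similarity(concepts, terms, weights[name])
            for name, terms in clusters.items()
        }
        assignment[doc_id] = max(scores, key=scores.get)
    return assignment


if __name__ == "__main__":
    docs = {
        "a": "cat cat dog",
        "b": "stock stock bond bond",
        "c": "the",  # no term reaches the lower bound, so it is dropped
    }
    clusters = {"animals": {"cat", "dog"}, "finance": {"stock", "bond"}}
    weights = {"animals": 1.0, "finance": 1.0}
    selected = select_documents(docs, lower=2, upper=3)
    print(assign_documents(selected, clusters, weights))
```

Seeding the clusters with term sets keeps the example short. A fuller version would derive the initial grouping and the cluster weights from the selected documents themselves, as the abstract describes.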