Abstract
<p>A method of identifying a principal document is provided. An exemplary method includes obtaining a document set comprising a plurality of documents and grouping the plurality of documents into a plurality of clusters based, at least in part, on a textual similarity among the plurality of documents. The method also includes obtaining one or more descriptive terms corresponding to the plurality of documents, wherein the descriptive terms are terms within the plurality of documents that have been identified as being useful for discriminating between the clusters. The method further includes, for each cluster, identifying a subset of descriptive terms based, at least in part, on a prevalence of the descriptive terms within the documents of the cluster, and identifying the principal documents in the cluster based, at least in part, on a prevalence of the subset of descriptive terms within each of the documents in the cluster.</p>
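The per-cluster step described above can be illustrated with a minimal sketch. It assumes the clustering and the selection of discriminating descriptive terms have already been done (both are passed in as inputs), measures a term's prevalence as the number of cluster documents containing it, and uses a hypothetical parameter `k` for the size of each cluster's term subset. The function name `principal_documents`, the whitespace tokenization, and the counting-based score are illustrative assumptions, not the claimed method.

```python
from collections import Counter


def principal_documents(clusters, descriptive_terms, k=3):
    """Return {cluster_id: principal document} for pre-computed clusters.

    clusters: dict mapping a cluster id to a list of document strings.
    descriptive_terms: set of terms assumed to be already selected as
        discriminating between clusters (that selection is not shown here).
    k: hypothetical size of each cluster's descriptive-term subset.
    """
    result = {}
    for cid, docs in clusters.items():
        if not docs:
            continue  # nothing to rank in an empty cluster
        # Naive tokenization (assumption): lowercase and split on whitespace.
        token_sets = [set(doc.lower().split()) for doc in docs]
        # Prevalence of a descriptive term = number of documents in this
        # cluster that contain it.
        prevalence = Counter(
            t for tokens in token_sets for t in tokens if t in descriptive_terms
        )
        # Cluster-specific subset: the k most prevalent descriptive terms.
        subset = {t for t, _ in prevalence.most_common(k)}
        # Score each document by how many subset terms it contains; the
        # highest-scoring document (first one on ties) is treated as principal.
        scores = [len(tokens & subset) for tokens in token_sets]
        best = max(range(len(docs)), key=scores.__getitem__)
        result[cid] = docs[best]
    return result


# Hypothetical toy data: two clusters and a pre-selected descriptive-term set.
clusters = {
    "printers": [
        "printer ink smudges",
        "printer toner low",
        "printer ink and toner low",
        "ink cartridge leaking",
    ],
    "laptops": [
        "laptop battery drains fast",
        "laptop screen flickers",
        "laptop battery and screen both failing",
    ],
}
descriptive_terms = {"printer", "ink", "toner", "cartridge",
                     "laptop", "battery", "screen"}

result = principal_documents(clusters, descriptive_terms, k=3)
for cid, doc in result.items():
    print(cid, "->", doc)
```

In the "printers" cluster the three most prevalent descriptive terms are "printer", "ink" and "toner", and only "printer ink and toner low" contains all three, so it is selected. A real system could instead weight terms (for example by in-cluster frequency) rather than counting presence.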
Applicants
HEWLETT-PACKARD DEVELOPMENT COMPANY, LP; DEOLALIKAR, VINAY; LAFFITTE, HERNAN
Inventors
DEOLALIKAR, VINAY; LAFFITTE, HERNAN