Title of invention SYSTEM AND METHOD FOR IDENTIFYING THE PRINCIPAL DOCUMENTS IN A DOCUMENT SET
Abstract <p>A method of identifying a principal document is provided. An exemplary method includes obtaining a document set comprising a plurality of documents and grouping the plurality of documents into a plurality of clusters based, at least in part, on a textual similarity between each of the plurality of documents. The method also includes obtaining one or more descriptive terms corresponding to the plurality of documents, wherein the descriptive terms are terms within the plurality of documents that have been identified as being useful for discriminating between the clusters. The method also includes, for each cluster, identifying a subset of descriptive terms based, at least in part, on a prevalence of the descriptive terms within the documents of the cluster and identifying the principal documents in the cluster based, at least in part, on a prevalence of the subset of descriptive terms within each of the documents in the cluster.</p>
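The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the method claimed in the application: the abstract does not say how textual similarity is measured or how clusters are formed, so the Jaccard word overlap, the greedy single-pass clustering, the `threshold`, `k` and `n` parameters, all function names, and the sample documents are assumptions made purely for illustration. The descriptive terms are taken as given input, as in the abstract.

```python
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def jaccard(a, b):
    # Assumed similarity measure: word-set overlap between two token lists.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_documents(docs, threshold=0.2):
    # Assumed greedy clustering: a document joins the first cluster whose
    # seed (first member) is similar enough, otherwise it starts a new one.
    clusters = []
    for i, doc in enumerate(docs):
        for cluster in clusters:
            if jaccard(tokenize(doc), tokenize(docs[cluster[0]])) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def top_terms_for_cluster(docs, cluster, descriptive_terms, k=2):
    # Keep the k descriptive terms that occur in the most documents of the
    # cluster (ties broken alphabetically so the result is deterministic).
    doc_freq = Counter()
    for i in cluster:
        doc_freq.update(set(tokenize(docs[i])) & set(descriptive_terms))
    return sorted(doc_freq, key=lambda t: (-doc_freq[t], t))[:k]


def principal_documents(docs, cluster, subset, n=1):
    # Rank documents by how often the cluster's term subset occurs in them.
    def score(i):
        counts = Counter(tokenize(docs[i]))
        return sum(counts[t] for t in subset)

    return sorted(cluster, key=lambda i: (-score(i), i))[:n]


# Illustrative data only; not taken from the application.
docs = [
    "solar panel efficiency solar cell panel",
    "solar panel cost",
    "solar cell panel efficiency report",
    "database index query database index",
    "database query latency",
]
descriptive_terms = ["solar", "panel", "database", "query", "index"]

clusters = cluster_documents(docs)
results = []
for cluster in clusters:
    subset = top_terms_for_cluster(docs, cluster, descriptive_terms)
    results.append((subset, principal_documents(docs, cluster, subset)))
```

In this sketch `results` holds, for each cluster, the chosen term subset and the indices of the documents ranked as principal. A real system would likely swap in a stronger similarity measure and clustering algorithm; only the overall sequence of steps follows the abstract.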
Publication number WO2011099982(A1) Publication date 2011.08.18
Application number WO2010US24200 Filing date 2010.02.13
Applicant HEWLETT-PACKARD DEVELOPMENT COMPANY, LP;DEOLALIKAR, VINAY;LAFFITTE, HERNAN Inventor DEOLALIKAR, VINAY;LAFFITTE, HERNAN
Classification G06F17/21 Main classification G06F17/21
Agency Agent
Principal claim
Address