Title Document clustering based on cohesive terms
Abstract A method for document categorization is presented, together with a storage medium that includes instructions for causing a computer to implement the method. The method includes identifying terms occurring in a collection of documents and determining a cohesion score for each of the terms. The cohesion score is a function of a cosine difference between each of the documents containing the term and a centroid of all the documents containing the term. The method further includes sorting the terms based on the cohesion scores and creating categories based on those scores, wherein each category includes only documents that (i) contain a selected one of the terms and (ii) have not already been assigned to a category. Finally, the method includes moving each of the documents to the category of its nearest centroid, thereby refining the categories.
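The steps named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the whitespace tokenizer, the use of mean cosine similarity as the cohesion function, the `min_docs` filter, the single refinement pass, and the "miscellaneous" fallback label are all assumptions made for the example, and every function name is hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (assumed tokenizer: lowercase, split on whitespace)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Mean vector of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}


def cohesion(term, vectors):
    """Assumed cohesion function: mean cosine similarity between each
    document containing `term` and the centroid of those documents."""
    members = [v for v in vectors if term in v]
    c = centroid(members)
    return sum(cosine(v, c) for v in members) / len(members)


def cluster(texts, min_docs=2):
    """Return {document index: category label}."""
    vectors = [vectorize(t) for t in texts]
    doc_freq = Counter(t for v in vectors for t in v)
    # Assumption: ignore terms that occur in fewer than `min_docs` documents.
    candidates = [t for t in doc_freq if doc_freq[t] >= min_docs]
    # Sort terms from most to least cohesive (ties broken alphabetically).
    ranked = sorted(candidates, key=lambda t: (-cohesion(t, vectors), t))

    # Create categories: each term claims only documents that contain it
    # and that have not already been assigned to a category.
    assignment = {}
    for term in ranked:
        for i, v in enumerate(vectors):
            if i not in assignment and term in v:
                assignment[i] = term
    for i in range(len(vectors)):
        assignment.setdefault(i, "miscellaneous")  # assumed fallback label

    # Refine: move each document to the category with the nearest centroid
    # (a single pass here; repeating until stable is another option).
    groups = {}
    for i, label in assignment.items():
        groups.setdefault(label, []).append(vectors[i])
    centroids = {label: centroid(vs) for label, vs in groups.items()}
    return {
        i: max(sorted(centroids), key=lambda lab: cosine(v, centroids[lab]))
        for i, v in enumerate(vectors)
    }


if __name__ == "__main__":
    sample = [
        "printer jam paper",
        "printer jam tray",
        "password reset login",
        "password reset email",
    ]
    print(cluster(sample))
```

Ranking by cohesion first means that terms whose documents resemble one another seed categories before broad, scattered terms do; the nearest-centroid pass then corrects documents that were claimed early by a term that fits them poorly.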
Publication No. US7930282(B2) Publication Date 2011.04.19
Application No. US20080058295 Filing Date 2008.03.28
Applicant INTERNATIONAL BUSINESS MACHINES CORPORATION Inventor SPANGLER WILLIAM S.
Classification G06F7/00 Main Classification G06F7/00
Agency Agent
Main Claim
Address