Title: Computer-Implemented System And Method For Clustering Documents Based On Scored Concepts
Abstract: A computer-implemented system and method for clustering documents based on scored concepts is provided. A set of documents is obtained, and concepts are extracted from the documents. A score is calculated for each concept as a function of a summation of the frequency of occurrence, concept weight, structural weight, and corpus weight. The documents in the set are then clustered based on the scores. A vector is formed for each document from the concepts in that document and the scores associated with those concepts. A similarity is determined between each document and each of the other documents based on the formed vectors. Documents that are sufficiently distinct from the other documents are identified as seed documents for separate document clusters. Each remaining document is grouped into the cluster most similar to it.
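The abstract names the pipeline steps but not their internals. The following is a minimal Python sketch under stated assumptions. It is not the patented method, and every name in it (`score_concepts`, `cosine`, `cluster`, `seed_threshold`) is hypothetical.

The sketch makes these assumptions:
- Concept extraction is already done, and each document is a list of `(concept, structural_weight)` occurrences.
- Concept and corpus weights are per-concept dictionaries that default to 1.0.
- A concept's score is the sum, over its occurrences, of the product of the three weights. This is one reading of "a function of a summation", and it reduces to raw frequency when every weight is 1.0.
- Similarity is cosine similarity.
- Documents are visited in descending order of total score. A document becomes a seed when its similarity to every seed chosen so far falls below `seed_threshold`.

```python
import math
from collections import defaultdict


def score_concepts(occurrences, concept_weight, corpus_weight):
    """Score each concept in one document.

    occurrences: list of (concept, structural_weight) pairs (hypothetical input).
    Assumed scoring: sum over occurrences of
    concept_weight * structural_weight * corpus_weight, with missing
    concept/corpus weights defaulting to 1.0.
    """
    scores = defaultdict(float)
    for concept, structural_weight in occurrences:
        scores[concept] += (concept_weight.get(concept, 1.0)
                            * structural_weight
                            * corpus_weight.get(concept, 1.0))
    return dict(scores)


def cosine(u, v):
    """Cosine similarity between two sparse {concept: score} vectors."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(vectors, seed_threshold=0.5):
    """Group documents; returns {seed_id: [doc_ids]}."""
    ids = list(vectors)
    # Similarity between each document and each of the other documents.
    sim = {(a, b): cosine(vectors[a], vectors[b])
           for a in ids for b in ids if a != b}
    # Assumed visiting order: largest total concept score first.
    order = sorted(ids, key=lambda d: -sum(vectors[d].values()))
    seeds = []
    for doc in order:
        # "Sufficiently distinct" (assumed): below the threshold against every seed so far.
        if all(sim[doc, s] < seed_threshold for s in seeds):
            seeds.append(doc)
    clusters = {s: [s] for s in seeds}
    for doc in order:
        if doc not in clusters:
            # Join the cluster whose seed is most similar to this document.
            best = max(seeds, key=lambda s: sim[doc, s])
            clusters[best].append(doc)
    return clusters


docs = {  # toy data, all concept/corpus weights left at the 1.0 default
    "d1": [("merger", 2.0), ("acquisition", 1.0), ("merger", 1.0)],
    "d2": [("merger", 1.0), ("acquisition", 1.0)],
    "d3": [("patent", 2.0), ("claim", 1.0)],
    "d4": [("patent", 1.0), ("claim", 1.0), ("claim", 1.0)],
}
vectors = {d: score_concepts(occ, {}, {}) for d, occ in docs.items()}
print(cluster(vectors))  # {'d1': ['d1', 'd2'], 'd3': ['d3', 'd4']}
```

Each remaining document is compared only with the cluster seeds, so assignment does not depend on the order in which documents are visited. A centroid-based variant would instead update each cluster's vector as documents join. The abstract does not say which of these is intended.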
Publication number: US2014122495(A1)
Publication date: 2014.05.01
Application number: US201414148686
Filing date: 2014.01.06
Applicant: FTI TECHNOLOGY LLC
Inventors: KAWAI KENJI; EVANS LYNNE MARIE
Classification: G06F17/30; G06F17/00
Main classification: G06F17/30