A method for identifying clusters of similar documents within a set of documents is described. A particular document is selected from among the available documents of the set, and a probe comprising one or more features is generated from that document. The probe is used to find, among the available documents, those that satisfy a similarity condition. Some or all of the documents that satisfy the similarity condition are associated with a particular cluster of documents. The process can be repeated to generate further clusters. The method can be implemented on a computer, and the associated programming instructions can be contained within a computer-readable carrier.
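The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation. The abstract does not specify how the seed document is chosen, what the features are, or what the similarity condition is, so the following are all assumptions made for illustration: the first available document is taken as the seed, the probe is the seed's most frequent terms weighted by count, and the similarity condition is a cosine score at or above a threshold. The names `make_probe`, `score`, `cluster_by_probe`, `max_features`, and `threshold` are hypothetical.

```python
import math
from collections import Counter


def make_probe(tokens, max_features=10):
    """Build a probe: the seed's most frequent terms, weighted by count (assumed)."""
    return dict(Counter(tokens).most_common(max_features))


def score(probe, tokens):
    """Cosine similarity between the probe and a document's term counts (assumed)."""
    counts = Counter(tokens)
    dot = sum(weight * counts[term] for term, weight in probe.items())
    norm_probe = math.sqrt(sum(w * w for w in probe.values()))
    norm_doc = math.sqrt(sum(c * c for c in counts.values()))
    if norm_probe == 0 or norm_doc == 0:
        return 0.0
    return dot / (norm_probe * norm_doc)


def cluster_by_probe(documents, threshold=0.5, max_features=10):
    """Repeatedly pick a seed, probe the available documents, and form a cluster."""
    tokens = [doc.lower().split() for doc in documents]
    available = list(range(len(documents)))
    clusters = []
    while available:
        seed = available[0]  # assumption: take the first available document
        probe = make_probe(tokens[seed], max_features)
        # Documents that satisfy the (assumed) similarity condition.
        members = [i for i in available if score(probe, tokens[i]) >= threshold]
        if seed not in members:
            members.insert(0, seed)  # guarantees progress, e.g. for empty documents
        clusters.append(members)
        member_set = set(members)
        # Clustered documents are no longer available for later rounds.
        available = [i for i in available if i not in member_set]
    return clusters


docs = [
    "apple banana apple",
    "banana apple fruit",
    "car engine wheel",
    "engine car road",
]
print(cluster_by_probe(docs))
```

This sketch assigns every matching document to the cluster, whereas the abstract allows "some or all" of them to be assigned; a stricter variant could keep only the top-scoring matches.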
Publication number
WO2007059232(A3)
Publication date
2009.04.30
Application number
WO2006US44385
Filing date
2006.11.15
Applicant(s)
JUSTSYSTEMS EVANS RESEARCH, INC.;EVANS, DAVID, A.;SHEFTEL, VICTOR, M.;BENNETT, JEFFREY, K.
Inventor(s)
EVANS, DAVID, A.;SHEFTEL, VICTOR, M.;BENNETT, JEFFREY, K.