Title of invention: Document clustering that applies a locality sensitive hashing function to a feature vector to obtain a limited set of candidate clusters
Abstract: Documents from a data stream are clustered by first generating a feature vector for each document. A set of candidate cluster centroids (e.g., the feature vectors of their corresponding clusters) is retrieved from a memory based on the feature vector of the document using a locality sensitive hashing function. The centroids may be retrieved by first retrieving a set of cluster identifiers from a cluster table, each cluster identifier indicating a respective cluster centroid, and then retrieving the cluster centroids corresponding to those identifiers from a memory. Documents may then be clustered into one or more of the candidate clusters using distance measures from the feature vector of the document to the cluster centroids.
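The retrieval flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The hash family (sign random projections), the cosine distance, the distance threshold, and every name in it (`LshClusterIndex`, `add_cluster`, `candidates`, `assign`) are assumptions chosen for illustration, since the abstract does not specify them. The sketch also assigns each document to a single best cluster, whereas the abstract allows assignment to one or more candidate clusters.

```python
import math
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed):
    """Draw num_bits random hyperplanes (assumed hash family: sign random projection)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_key(vector, hyperplanes):
    """Hash a vector to a bit tuple: one bit per hyperplane, set by the side it falls on."""
    return tuple(
        1 if sum(h * x for h, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


class LshClusterIndex:
    """Hypothetical cluster table: LSH key -> cluster ids, plus a centroid store."""

    def __init__(self, dim, num_tables=4, num_bits=8, seed=0):
        # Each table pairs its own hyperplanes with a key -> {cluster id} map.
        self.tables = [
            (make_hyperplanes(num_bits, dim, seed + t), defaultdict(set))
            for t in range(num_tables)
        ]
        self.centroids = {}  # cluster id -> centroid feature vector

    def add_cluster(self, cluster_id, centroid):
        """Store a centroid and register its id under its key in every table."""
        self.centroids[cluster_id] = list(centroid)
        for planes, table in self.tables:
            table[lsh_key(centroid, planes)].add(cluster_id)

    def candidates(self, vector):
        """Return ids of clusters sharing a hash bucket with the vector in any table."""
        ids = set()
        for planes, table in self.tables:
            ids |= table.get(lsh_key(vector, planes), set())
        return ids

    def assign(self, vector, max_distance=0.3):
        """Return (cluster id, distance) of the nearest candidate, or None."""
        best = None
        for cluster_id in self.candidates(vector):
            d = cosine_distance(vector, self.centroids[cluster_id])
            if d <= max_distance and (best is None or d < best[1]):
                best = (cluster_id, d)
        return best
```

In use, a caller would compute a feature vector for each incoming document, call `assign`, and open a new cluster with `add_cluster` when the result is `None`. Only the centroids returned by `candidates` are compared against the document, which is what limits the search to a small set of clusters. The sketch does not re-hash a centroid after it changes; a fuller version would need to handle that.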
Publication number: US7797265(B2) Publication date: 2010.09.14
Application number: US20080072179 Filing date: 2008.02.25
Applicant: SIEMENS CORPORATION Inventors: BRINKER KLAUS; MOERCHEN FABIAN; GLOMANN BERNHARD; NEUBAUER CLAUS
Classification: G06N5/00 Main classification: G06N5/00
Agency: Agent:
Main claim:
Address: