Title of invention UNSUPERVISED DOCUMENT CLUSTERING USING LATENT SEMANTIC DENSITY ANALYSIS
Abstract According to one embodiment, a latent semantic mapping (LSM) space is generated from a collection of documents, where the LSM space includes a plurality of document vectors, each representing one of the documents in the collection. Each document vector is considered in turn as a centroid document vector, and for each one a group of document vectors is identified in the LSM space that lie within a predetermined hypersphere diameter from that centroid. As a result, multiple groups of document vectors are formed. The predetermined hypersphere diameter represents a predetermined closeness measure among the document vectors in the LSM space. Thereafter, the group that contains the maximum number of document vectors among the plurality of groups is designated as a cluster of document vectors.
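The group-selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the document vectors have already been projected into the LSM space, it uses cosine distance as the closeness measure (the abstract does not name one), and the names `cosine_distance`, `densest_group`, and `max_distance` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity. Assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def densest_group(vectors: Sequence[Sequence[float]],
                  max_distance: float) -> List[int]:
    """Treat every vector as a candidate centroid and return the indices of
    the largest group of vectors lying within max_distance of that centroid.

    max_distance stands in for the abstract's "predetermined hypersphere
    diameter" (an assumption about how the threshold is applied).
    Ties are resolved in favor of the first centroid encountered.
    """
    best: List[int] = []
    for centroid in vectors:
        # Collect every vector (including the centroid itself) that falls
        # inside the hypersphere around this candidate centroid.
        group = [i for i, v in enumerate(vectors)
                 if cosine_distance(centroid, v) <= max_distance]
        if len(group) > len(best):
            best = group
    return best


if __name__ == "__main__":
    # Toy 2-D "document vectors": three point roughly along one axis,
    # two roughly along the other.
    docs = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0], [0.1, 0.9]]
    print(densest_group(docs, max_distance=0.05))
```

The sketch compares every vector against every candidate centroid, so its cost grows quadratically with the number of documents; that is acceptable for illustration but would need indexing or approximation for a large collection.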
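Application publication number US2012011124(A1) Publication date 2012.01.12
Application number US20100831909 Filing date 2010.07.07
Applicant BELLEGARDA JEROME R.;APPLE INC. Inventor BELLEGARDA JEROME R.
Classification G06F17/30 Main classification G06F17/30
Agency Agent
Principal claim
Address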