Title of invention: HYBRID TENSOR-BASED CLUSTER ANALYSIS
Abstract: What is disclosed is a novel system and method for analyzing multi-dimensional cluster data sets to identify clusters of related documents in an electronic document storage system. Digital documents, for which multi-dimensional probabilistic relationships are to be determined, are received and then parsed to identify multi-dimensional count data with at least three dimensions. Multi-dimensional tensors representing the count data and the estimated cluster membership probabilities are created. The tensors are then iteratively processed, alternating between a first tensor factorization model and a complementary second tensor factorization model, to refine the cluster definition matrices until a convergence criterion has been satisfied. Likely cluster memberships for the count data are determined from the refinements made to the cluster definition matrices by the alternating tensor factorization models. The present method advantageously extends to the field of tensor analysis a combination of Non-negative Matrix Factorization and Probabilistic Latent Semantic Analysis to decompose non-negative data.
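Publication number: US2010312797(A1)  Publication date: 2010.12.09
Application number: US20090479392  Filing date: 2009.06.05
Applicant: XEROX CORPORATION  Inventor: PENG WEI
Classification: G06F7/06;G06F15/18;G06F17/30;G06N5/02  Main classification: G06F7/06
Agency:  Agent:
Main claim:
Address: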
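
The alternating scheme described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent text: the count data is a 3-way NumPy array, the first model is a non-negative CP factorization fitted with multiplicative updates for the generalized KL divergence, the second model is a tensor form of PLSA fitted with one EM step, and the names `ntf_step`, `plsa_step`, and `hybrid_cluster` are hypothetical. The three factor matrices play the role of the cluster definition matrices.

```python
import numpy as np

def reconstruct(A, B, C):
    # Rank-R non-negative CP model: Xhat[i,j,k] = sum_r A[i,r] * B[j,r] * C[k,r]
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def kl_divergence(X, Xhat, eps=1e-12):
    # Generalized KL divergence between the counts and the model
    return float(np.sum(X * np.log((X + eps) / (Xhat + eps)) - X + Xhat))

def ntf_step(X, A, B, C, eps=1e-12):
    # Assumed first model: one sweep of multiplicative KL updates, one factor at a time
    ratio = X / (reconstruct(A, B, C) + eps)
    A = A * np.einsum("ijk,jr,kr->ir", ratio, B, C) / (B.sum(0) * C.sum(0) + eps)
    ratio = X / (reconstruct(A, B, C) + eps)
    B = B * np.einsum("ijk,ir,kr->jr", ratio, A, C) / (A.sum(0) * C.sum(0) + eps)
    ratio = X / (reconstruct(A, B, C) + eps)
    C = C * np.einsum("ijk,ir,jr->kr", ratio, A, B) / (A.sum(0) * B.sum(0) + eps)
    return A, B, C

def plsa_step(X, A, B, C, eps=1e-12):
    # Assumed second model: one EM step of tensor PLSA,
    # P(i,j,k) = sum_r p(r) p(i|r) p(j|r) p(k|r)
    sa, sb, sc = A.sum(0), B.sum(0), C.sum(0)
    pz = sa * sb * sc
    pz = pz / (pz.sum() + eps)
    pa, pb, pc = A / (sa + eps), B / (sb + eps), C / (sc + eps)
    # E-step: posterior cluster probability for every cell of the tensor
    joint = np.einsum("r,ir,jr,kr->ijkr", pz, pa, pb, pc)
    post = joint / (joint.sum(-1, keepdims=True) + eps)
    weighted = X[..., None] * post  # expected counts per cluster
    # M-step: re-estimate the conditionals from the expected counts
    nz = weighted.sum((0, 1, 2))
    pa = weighted.sum((1, 2)) / (nz + eps)
    pb = weighted.sum((0, 2)) / (nz + eps)
    pc = weighted.sum((0, 1)) / (nz + eps)
    # Map back to factor form; the first factor carries the cluster mass
    return pa * nz, pb, pc

def hybrid_cluster(X, n_clusters, max_iter=200, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    A, B, C = (rng.random((n, n_clusters)) + 0.1 for n in X.shape)
    prev = np.inf
    for it in range(max_iter):
        # Alternate between the two factorization models
        step = ntf_step if it % 2 == 0 else plsa_step
        A, B, C = step(X, A, B, C)
        cur = kl_divergence(X, reconstruct(A, B, C))
        # Convergence criterion (assumed): small relative change in divergence
        if abs(prev - cur) < tol * max(1.0, abs(cur)):
            break
        prev = cur
    return A, B, C
```

Under these assumptions, a likely cluster for each index along the first dimension can be read off as `A.argmax(axis=1)`, and similarly for `B` and `C`. The E-step materializes an array of size I x J x K x R, so this sketch only suits small dense tensors; sparse count data would need a different layout.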