Title of invention: INFORMATION RETRIEVAL AND TEXT MINING USING DISTRIBUTED LATENT SEMANTIC INDEXING
Abstract: Latent semantic indexing (LSI) for information retrieval and text mining is adapted to large heterogeneous data sets by first partitioning the data set into a number of smaller partitions having similar concept domains. A similarity graph network is generated to expose links between concept domains; these links are then exploited both in determining which domains to query and in expanding the query vector. LSI is performed on those partitioned data sets most likely to contain information related to the user query or text mining operation. In this manner, LSI can be applied to data sets that previously presented scalability problems. Additionally, the singular value decomposition of the term-by-document matrix can be computed on separate distributed computers, increasing the robustness of the retrieval and text mining system while decreasing search times.
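The flow described in the abstract (partition, link partitions by similarity, run LSI only where the query is likely to match) can be illustrated with a minimal Python sketch. This is not the patented method: the function names (`lsi_fit`, `lsi_query`, `select_partitions`, `distributed_lsi_search`), the use of partition centroids with cosine similarity as a stand-in for the similarity graph, and the `link_threshold` parameter are all assumptions made for illustration. Query expansion is omitted, and the per-partition SVDs run in a simple loop rather than on separate machines.

```python
import numpy as np

def lsi_fit(A, k):
    """Truncated SVD of a term-by-document matrix A (rows: terms, columns: documents)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep at most k factors and drop numerically zero singular values.
    k = min(k, int(np.sum(s > 1e-10)))
    return U[:, :k], s[:k], Vt[:k, :]

def lsi_query(U, s, Vt, q):
    """Fold query q into the k-dimensional space; return cosine score per document."""
    q_hat = (q @ U) / s              # q^T U_k S_k^{-1}
    docs = Vt.T                      # one row per document in the reduced space
    num = docs @ q_hat
    den = np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat)
    return num / np.where(den == 0, 1.0, den)

def cosine(a, b):
    d = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / d) if d > 0 else 0.0

def select_partitions(partitions, q, link_threshold=0.5):
    """Pick the partition closest to the query, plus partitions linked to it.

    Assumption: a link exists when two partition centroids have cosine
    similarity >= link_threshold (a hypothetical simplification of the
    similarity graph network mentioned in the abstract).
    """
    centroids = [A.mean(axis=1) for A in partitions]
    best = max(range(len(partitions)), key=lambda i: cosine(centroids[i], q))
    chosen = {best}
    for j, c in enumerate(centroids):
        if j != best and cosine(centroids[best], c) >= link_threshold:
            chosen.add(j)
    return sorted(chosen)

def distributed_lsi_search(partitions, q, k=2):
    """Run LSI only on the selected partitions; each SVD is independent,
    so in principle each could be computed on a different machine."""
    results = []
    for p in select_partitions(partitions, q):
        U, s, Vt = lsi_fit(partitions[p], k)
        scores = lsi_query(U, s, Vt, q)
        results.extend((float(sc), p, d) for d, sc in enumerate(scores))
    # Highest score first; each entry is (score, partition index, document index).
    return sorted(results, reverse=True)
```

A caller would pass a list of term-by-document matrices that share one vocabulary, together with a query vector over the same terms, and read the ranked `(score, partition, document)` tuples from the result.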
Publication number: CA2523128(C)  Publication date: 2011.09.27
Application number: CA20042523128  Filing date: 2004.04.23
Applicant: TELCORDIA TECHNOLOGIES, INC.  Inventors: BEHRENS, CLIFFORD A.; BASSU, DEVASIS
Classification: G06F17/30; G06F7/00; G11B  Main classification: G06F17/30
Agency:  Agent:
Main claim:
Address: