Title of invention: Method and system for optimally searching a document database using a representative semantic space
Abstract: A term-by-document matrix, recording the frequency of occurrence of each term in each document, is compiled from a corpus of documents representative of a particular subject matter. A weighted term dictionary is created using a global weighting algorithm and is then applied to the term-by-document matrix to form a weighted term-by-document matrix. A term vector matrix and a singular value concept matrix are computed by singular value decomposition of the weighted term-by-document matrix. The k largest singular concept values are kept and all others are set to zero, thereby reducing the term vector matrix and the singular value concept matrix to k concept dimensions. The reduced term vector matrix, the reduced singular value concept matrix, and the weighted term dictionary can then be used to project pseudo-document vectors, representing documents that do not appear in the original corpus, into a representative semantic space. The similarity of those documents can be ascertained from the positions of their respective pseudo-document vectors in the representative semantic space.
Publication number: US7483892(B1)  Publication date: 2009.01.27
Application number: US20050041799  Filing date: 2005.01.24
Applicant: KROLL ONTRACK, INC.  Inventors: SOMMER MATTHEW S.; THOMPSON KEVIN B.
Classification: G06F17/30  Main classification: G06F17/30
Agency:  Agent:
Main claim:
Address:
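
The pipeline summarized in the abstract (term-by-document counts, global weighting, singular value decomposition, truncation to k concepts, and projection of unseen documents) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the patented implementation: the record does not say which global weighting algorithm is used, so an IDF-style weight is assumed here, and the toy corpus, the whitespace tokenizer, the choice k = 2, and all function names are hypothetical.

```python
import numpy as np


def build_term_document_matrix(docs, vocab=None):
    """Count raw term frequencies; rows are terms, columns are documents."""
    if vocab is None:
        vocab = sorted({t for d in docs for t in d.lower().split()})
    index = {t: i for i, t in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for term in doc.lower().split():
            if term in index:  # terms outside the vocabulary are ignored
                counts[index[term], j] += 1
    return counts, vocab


def global_weights(counts):
    """ASSUMPTION: IDF-style global weight, log(n_docs / doc_freq) + 1."""
    n_docs = counts.shape[1]
    doc_freq = np.count_nonzero(counts, axis=1)
    return np.log(n_docs / np.maximum(doc_freq, 1)) + 1.0


def reduce_space(weighted, k):
    """SVD of the weighted matrix, keeping only the k largest singular values."""
    U, s, Vt = np.linalg.svd(weighted, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T


def project(term_counts, weights, U_k, s_k):
    """Fold a term-count vector into the reduced space: (w * q)^T U_k S_k^-1."""
    return (weights * term_counts) @ U_k / s_k


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# Hypothetical toy corpus standing in for the representative document corpus.
corpus = [
    "disk recovery data recovery",
    "data backup disk",
    "court filing legal discovery",
    "legal discovery document review",
]
counts, vocab = build_term_document_matrix(corpus)
weights = global_weights(counts)           # one weight per term
weighted = weights[:, None] * counts       # weighted term-by-document matrix
U_k, s_k, V_k = reduce_space(weighted, k=2)

# A document that was not part of the original corpus.
new_counts, _ = build_term_document_matrix(["disk data backup"], vocab)
pseudo_doc = project(new_counts[:, 0], weights, U_k, s_k)

# Compare the pseudo-document with each corpus document in the reduced space.
similarities = [cosine(pseudo_doc, V_k[j]) for j in range(len(corpus))]
```

In this sketch, folding an original corpus document through `project` reproduces its row of `V_k`, which is what makes positions of pseudo-document vectors comparable with those of the corpus documents. A production system would likely add a local weighting step, stop-word handling, and a sparse SVD routine for large corpora.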