Title of invention |
Method and system for optimally searching a document database using a representative semantic space |
Abstract |
A term-by-document matrix that represents the frequency of occurrence of each term per document is compiled from a corpus of documents representative of a particular subject matter. A weighted term dictionary is created using a global weighting algorithm and is then applied to the term-by-document matrix, forming a weighted term-by-document matrix. A term vector matrix and a singular value concept matrix are computed by singular value decomposition of the weighted term-by-document matrix. The k largest singular concept values are kept and all others are set to zero, thereby reducing the term vector matrix and the singular value concept matrix to k concept dimensions. The reduced term vector matrix, the reduced singular value concept matrix, and the weighted term dictionary can be used to project pseudo-document vectors, representing documents that do not appear in the original document corpus, into a representative semantic space. The similarity of those documents can be ascertained from the positions of their respective pseudo-document vectors in the representative semantic space. |
Publication number |
US6847966(B1) |
Publication date |
2005.01.25 |
Application number |
US20020131888 |
Filing date |
2002.04.24 |
Applicant |
ENGENIUM CORPORATION |
Inventors |
SOMMER MATTHEW S.;THOMPSON KEVIN B. |
Classification |
G06F17/30;(IPC1-7):G06F17/30 |
Main classification |
G06F17/30 |
Agency |
|
Agent |
|
Principal claim |
|
Address |
|
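The steps named in the abstract (term-by-document counts, global weighting, singular value decomposition, truncation to k concept dimensions, and projection of unseen documents) can be illustrated with a minimal NumPy sketch. This is not the patented implementation. The abstract does not name the global weighting algorithm, so a smoothed inverse document frequency is assumed here. The function names, the toy corpus, whitespace tokenization, and the choice k = 2 are all hypothetical, and the projection uses the standard "folding-in" formula from latent semantic indexing.

```python
import numpy as np

def term_document_matrix(docs, vocab):
    """Count how often each vocabulary term occurs in each document."""
    index = {term: i for i, term in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for token in doc.lower().split():  # assumption: whitespace tokens
            if token in index:
                counts[index[token], j] += 1
    return counts

def global_weights(counts):
    """Assumed global weighting: smoothed inverse document frequency."""
    n_docs = counts.shape[1]
    doc_freq = np.count_nonzero(counts, axis=1)
    return np.log((1 + n_docs) / (1 + doc_freq)) + 1.0

def semantic_space(counts, weights, k):
    """Weight the matrix, decompose it, and keep the k largest singular values."""
    weighted = counts * weights[:, None]
    # np.linalg.svd returns singular values in descending order.
    U, s, _ = np.linalg.svd(weighted, full_matrices=False)
    return U[:, :k], s[:k]

def project(doc, vocab, weights, U_k, s_k):
    """Fold an unseen document into the space as a pseudo-document vector."""
    q = term_document_matrix([doc], vocab)[:, 0] * weights
    # Folding-in: q^T U_k Sigma_k^{-1}; k must not exceed the matrix rank,
    # otherwise s_k contains zeros and this division fails.
    return (q @ U_k) / s_k

def cosine(a, b):
    """Cosine similarity, returning 0.0 for a zero vector."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Hypothetical toy corpus covering two unrelated topics.
corpus = [
    "cat dog pet",
    "dog pet food",
    "cat pet food",
    "stock market trade",
    "market trade price",
]
vocab = sorted({token for doc in corpus for token in doc.split()})
counts = term_document_matrix(corpus, vocab)
weights = global_weights(counts)
U_k, s_k = semantic_space(counts, weights, k=2)

# Documents that were not part of the original corpus.
a = project("cat pet", vocab, weights, U_k, s_k)
b = project("dog food", vocab, weights, U_k, s_k)
c = project("stock price", vocab, weights, U_k, s_k)

related = cosine(a, b)
unrelated = cosine(a, c)
print(related, unrelated)
```

In this toy setup the two pet-related queries should land close together in the reduced space, while the finance query should be nearly orthogonal to them, even though "cat pet" and "dog food" share no terms. A real corpus would need proper tokenization, a sparse matrix, and a much larger k.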