Title of invention A method and apparatus for retrieving relevant documents from a corpus of documents
Abstract A method and apparatus access relevant documents based on a query (230). A thesaurus of word vectors (242) is formed for the words in the corpus of documents (240). The word vectors represent global lexical co-occurrence patterns and relationships between neighboring words. Document vectors (246), formed by combining word vectors, lie in the same multi-dimensional space as the word vectors. A singular value decomposition is used to reduce the dimensionality of the document vectors. A query vector (232) is formed by combining the word vectors associated with the words in the query. The query vector is compared with the document vectors to determine the relevant documents. The query vector can be divided into several factor clusters to form factor vectors. The factor vectors are then compared to the document vectors to determine the ranking (252) of the documents within each factor cluster.
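The retrieval flow summarized in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names (`build_word_vectors`, `combine`, `cosine`, `rank`), the whitespace tokenizer, the window size, and the use of raw co-occurrence counts are all assumptions made for illustration, and the singular value decomposition step is omitted so the example needs only the standard library.

```python
import math
from collections import defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_word_vectors(docs, window=2):
    """Map each word to a sparse vector of co-occurrence counts
    with the words found within `window` positions of it."""
    vectors = defaultdict(lambda: defaultdict(float))
    for doc in docs:
        tokens = tokenize(doc)
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1.0
    return vectors


def combine(words, word_vectors):
    """Sum the word vectors of `words`; words without a vector are skipped."""
    total = defaultdict(float)
    for w in words:
        if w in word_vectors:
            for key, value in word_vectors[w].items():
                total[key] += value
    return total


def cosine(a, b):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs, window=2):
    """Return document indices ordered from most to least similar to the query."""
    word_vectors = build_word_vectors(docs, window)
    query_vector = combine(tokenize(query), word_vectors)
    scored = [
        (cosine(query_vector, combine(tokenize(doc), word_vectors)), index)
        for index, doc in enumerate(docs)
    ]
    # Sort by descending score; ties are broken by document index.
    return [index for _, index in sorted(scored, key=lambda t: (-t[0], t[1]))]
```

Because document and query vectors are both sums of word vectors, they share one space and can be compared directly. In a fuller version, a singular value decomposition would reduce the dimensionality of these vectors before the comparison, as the abstract describes.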
Publication number EP0687987(A1) Publication date 1995.12.20
Application number EP19950304116 Filing date 1995.06.14
Applicant XEROX CORPORATION Inventor SCHUETZE, HINRICH
Classification G06F17/27;G06F17/30;(IPC1-7):G06F17/30 Main classification G06F17/27
Agency Agent
Main claim
Address