Title: System and method for identifying query-relevant keywords in documents with latent semantic analysis
Abstract: A system and method for identifying query-related keywords in documents found in a search using latent semantic analysis. The documents are represented as a document-term matrix M containing one or more document term-weight vectors d, which may be term-frequency (tf) vectors or term-frequency inverse-document-frequency (tf-idf) vectors. This matrix is subjected to a truncated singular value decomposition. The resulting transform matrix U can be used to project a query term-weight vector q into the reduced N-dimensional space, after which it is expanded back into the full vector space using the inverse of U, yielding q_expanded. To perform a search, the similarity of q_expanded is measured relative to each candidate document vector in this space. Exemplary similarity functions are dot product and cosine similarity. The keywords selected are the terms with the highest values in q_expanded that also occur in at least one document. Matching keywords from the query may be highlighted in the search results.
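The steps described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the patented implementation. It makes several assumptions: M is laid out with terms as rows and documents as columns; the "inverse" of the truncated transform matrix is taken to be its transpose-based pseudo-inverse, which holds because the columns are orthonormal; and the function names (expand_query, cosine_similarity, select_keywords) and the toy vocabulary are hypothetical.

```python
import numpy as np


def expand_query(M, q, n_dims):
    """Project q into the reduced space and expand it back to term space.

    Assumption: M has shape (terms, documents) and q has shape (terms,).
    """
    # Full SVD, then keep only the n_dims largest singular directions.
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    U_n = U[:, :n_dims]              # truncated transform matrix (terms x N)
    q_reduced = U_n.T @ q            # projection into the N-dimensional space
    # U_n has orthonormal columns, so expanding back is a product with U_n.
    return U_n @ q_reduced


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def select_keywords(q_expanded, doc_vector, vocab, k):
    """Return up to k terms with the highest expanded weight that occur in the document."""
    order = np.argsort(-q_expanded)  # term indices, highest weight first
    return [vocab[i] for i in order if doc_vector[i] > 0][:k]


# Hypothetical toy corpus: three documents over five terms (tf weights).
vocab = ["car", "automobile", "engine", "flower", "petal"]
M = np.array([
    [1, 0, 0],   # car
    [0, 1, 0],   # automobile
    [1, 1, 0],   # engine
    [0, 0, 1],   # flower
    [0, 0, 1],   # petal
], dtype=float)

q = np.array([1, 0, 0, 0, 0], dtype=float)   # query containing only "car"
q_expanded = expand_query(M, q, n_dims=2)

for j in range(M.shape[1]):
    score = cosine_similarity(q_expanded, M[:, j])
    keywords = select_keywords(q_expanded, M[:, j], vocab, k=2)
    print(j, round(score, 3), keywords)
```

In this toy corpus the second document never mentions "car", so the raw query has zero dot product with it. After expansion the query also carries weight on "automobile" and "engine", so the document is matched and those terms can be reported as its query-relevant keywords.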
Publication number: US7440947(B2) Publication date: 2008.10.21
Application number: US20040987377 Filing date: 2004.11.12
Applicant: FUJI XEROX CO., LTD. Inventors: ADCOCK JOHN E.; COOPER MATTHEW; GIRGENSOHN ANDREAS; WILCOX LYNN D.
Classification: G06F7/00 Main classification: G06F7/00
Agency: Agent:
Main claim:
Address: