Title of invention: System and method for context-based document retrieval
Abstract: A system and method for document retrieval are disclosed. The invention addresses a major problem in text-based document retrieval: rapidly finding a small subset of documents in a large document collection (e.g., Web pages on the Internet) that are relevant to a limited set of query terms supplied by the user. The invention uses information contained in the document collection about the statistics of word relationships ("context") to facilitate the specification of search queries and document comparison. The method first compiles word relationships into a context database that captures the statistics of word proximity and occurrence throughout the document collection. At retrieval time, a search matrix is computed from a set of user-supplied keywords and the context database. For each document in the collection, a similar matrix is computed from the contents of the document and the context database. Document relevance is determined by comparing the search matrix with each document matrix. The disclosed system therefore retrieves documents by contextual similarity rather than word-frequency similarity, simplifying search specification while allowing greater search precision.
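The pipeline described in the abstract can be illustrated with a small Python sketch. This is not the patented method: the abstract does not say how the matrices are built or compared, so everything below is an assumption made for illustration. The sketch assumes the context database holds symmetric co-occurrence counts within a fixed word window, that each matrix has one row per input word taken from that database, and that two matrices are compared by summing their rows and taking cosine similarity. All function names (`build_context_db`, `context_matrix`, `profile`, `cosine`, `rank`) are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def tokenize(text):
    # Naive whitespace tokenizer (assumption; the abstract names none).
    return text.lower().split()


def build_context_db(documents, window=2):
    """Count how often two distinct words occur within `window` positions
    of each other, accumulated over the whole collection."""
    db = defaultdict(Counter)
    for text in documents:
        words = tokenize(text)
        for i, word in enumerate(words):
            for neighbor in words[i + 1 : i + 1 + window]:
                if neighbor != word:
                    db[word][neighbor] += 1
                    db[neighbor][word] += 1
    return db


def context_matrix(words, db):
    """Assumed matrix layout: one row per distinct input word, holding
    that word's co-occurrence counts from the context database."""
    return {w: db[w] for w in set(words) if w in db}


def profile(matrix):
    # Collapse a matrix to one vector by summing its rows (assumption).
    total = Counter()
    for row in matrix.values():
        total.update(row)
    return total


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(query, documents, window=2):
    """Return (score, document index) pairs, most similar first."""
    db = build_context_db(documents, window)
    search = profile(context_matrix(tokenize(query), db))
    scored = [
        (cosine(search, profile(context_matrix(tokenize(doc), db))), i)
        for i, doc in enumerate(documents)
    ]
    return sorted(scored, reverse=True)
```

Under these assumptions, a document can receive a nonzero score without containing the query keyword, provided its words share context with that keyword elsewhere in the collection. That is one possible reading of "contextual similarity rather than word-frequency similarity".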
Publication number: US6633868(B1) Publication date: 2003.10.14
Application number: US20000627617 Filing date: 2000.07.28
Applicant: MIN SHERMANN LOYALL Inventors: MIN SHERMANN LOYALL;TANNO CONSTANTIN LORENZO;MAINEN ZACHARY FRANK;SOFTKY WILLIAM RUSSELL
Classification: G06F17/30;(IPC1-7):G06F17/30 Main classification: G06F17/30
Agency: Agent:
Principal claim:
Address: