Title of invention Multi-stage query processing system and method for use with tokenspace repository
Abstract A multi-stage query processing system and method enables multi-stage query scoring, including “snippet” generation, through incremental document reconstruction facilitated by a multi-tiered mapping scheme. At one or more stages of a multi-stage query processing system, a set of relevancy scores is used to select a subset of documents for presentation as an ordered list to a user. The set of relevancy scores can be derived in part from one or more sets of relevancy scores determined in prior stages of the multi-stage query processing system. In some embodiments, the multi-stage query processing system is capable of executing one or more passes on a user query, and using information from each pass to expand the user query for use in a subsequent pass to improve the relevancy of documents in the ordered list.
Publication number US9146967(B2) Publication date 2015.09.29
Application number US201313851036 Application date 2013.03.26
Applicant Google Inc. Inventors Dean Jeffrey A.;Haahr Paul G.;Sercinoglu Olcan;Singhal Amitabh K.
Classification G06F17/00;G06F17/30 Main classification G06F17/00
Agency Agent
Main claim 1. A method of processing a query in a multi-stage query processing system having one or more processors and memory storing one or more programs for execution by the one or more processors to perform the method comprising: performing a first stage processing of a query, including: retrieving a first set of document identifiers from an index in response to one or more query terms; generating a first set of relevancy scores for a first set of compressed documents corresponding to at least a subset of the first set of document identifiers based on one or more of: presence of query terms, term frequency, and document popularity; and storing the first set of relevancy scores in the memory; performing a second stage processing of the query, including: generating a second set of relevancy scores for the documents in the first set of compressed documents based on one or more of: a list of token positions for one or more query terms in the query, distances between query terms in the documents, attributes of tokens in the documents, and text that appears around a query term used in a document of the first set of documents; and storing the second set of relevancy scores in the memory; reading the first and second set of relevancy scores from the memory, and generating an ordered list of documents for further processing based on the first and second set of relevancy scores; automatically generating additional query terms from the documents in the ordered list of documents; formulating a new query using the additional query terms; processing the new query to retrieve a second set of document identifiers from the index and to generate a third set of relevancy scores based at least in part on the additional query terms; and using the third set of relevancy scores to select a set of top documents for presentation to the user.
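The flow recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: all names (`Doc`, `build_index`, `first_stage`, `second_stage`, `expand_query`, `search`), the scoring formulas, the additive combination of stage scores, and the frequency-based choice of additional terms are assumptions made for illustration, and document compression and the tokenspace repository are omitted entirely.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: int
    tokens: list[str]
    popularity: float = 0.0  # hypothetical popularity signal


def build_index(docs):
    """Map each token to the set of document identifiers containing it."""
    index = {}
    for doc in docs:
        for tok in set(doc.tokens):
            index.setdefault(tok, set()).add(doc.doc_id)
    return index


def first_stage(query, index, docs_by_id):
    """Cheap score per candidate: query-term frequency plus popularity."""
    ids = set().union(*(index.get(t, set()) for t in query))
    scores = {}
    for doc_id in ids:
        doc = docs_by_id[doc_id]
        counts = Counter(doc.tokens)
        scores[doc_id] = sum(counts[t] for t in query) + doc.popularity
    return scores


def second_stage(query, doc_ids, docs_by_id):
    """Position-based score: inverse of the smallest gap between occurrences
    of two different query terms (0.0 if fewer than two terms are present)."""
    scores = {}
    for doc_id in doc_ids:
        tokens = docs_by_id[doc_id].tokens
        positions = {t: [i for i, tok in enumerate(tokens) if tok == t]
                     for t in query}
        present = [p for p in positions.values() if p]
        if len(present) < 2:
            scores[doc_id] = 0.0
            continue
        best = min(abs(a - b)
                   for i, pa in enumerate(present)
                   for pb in present[i + 1:]
                   for a in pa for b in pb)
        scores[doc_id] = 1.0 / best
    return scores


def expand_query(query, ranked_ids, docs_by_id, top_docs=2, extra_terms=1):
    """Add the most frequent non-query tokens from the top-ranked documents."""
    counts = Counter()
    for doc_id in ranked_ids[:top_docs]:
        counts.update(t for t in docs_by_id[doc_id].tokens if t not in query)
    return list(query) + [t for t, _ in counts.most_common(extra_terms)]


def search(query, docs, top_n=3):
    docs_by_id = {d.doc_id: d for d in docs}
    index = build_index(docs)
    s1 = first_stage(query, index, docs_by_id)            # first set of scores
    s2 = second_stage(query, s1.keys(), docs_by_id)       # second set of scores
    ranked = sorted(s1, key=lambda d: (-(s1[d] + s2[d]), d))
    new_query = expand_query(query, ranked, docs_by_id)   # additional terms
    s3 = first_stage(new_query, index, docs_by_id)        # third set of scores
    return new_query, sorted(s3, key=lambda d: (-s3[d], d))[:top_n]


EXAMPLE_DOCS = [
    Doc(1, "fast query engine query cache".split()),
    Doc(2, "query engine design design".split()),
    Doc(3, "garden design tips".split()),
]
```

With these toy documents, `search(["query", "engine"], EXAMPLE_DOCS)` expands the query with the term `design` and returns the documents in the order 2, 1, 3, whereas the first pass alone ranks document 1 ahead of document 2. A real system would presumably filter stop words and weight the stage scores rather than simply adding them.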
Address Mountain View CA US