Main claim |
1. A method for configuring a search engine to provide suggested search queries in response to input search queries for searching a corpus of documents, wherein each document contains a plurality of tokens, the method comprising:
generating, by a computing device, tokens from the documents in the corpus;
generating, by the computing device, for each of the tokens a plurality of residual strings, wherein each residual string for a token comprises a one-character or multi-character variation of the token;
generating, by the computing device, for each token a direct producer list, the direct producer list for a token comprising the plurality of residual strings for the token, and an associated weight for each residual string based upon the number of character variations between the token and the residual string;
forming, by the computing device, for each residual string at least one indirect producer list by propagating to the residual string the direct producer lists of the tokens from which the residual string was generated;
propagating, by the computing device, each token with a corresponding weight to other tokens having one or more common residual strings, wherein the corresponding weight is based upon the weights of residual strings associated with both the token and the other tokens;
storing, by the computing device, the tokens, the associated weights propagated to each of the tokens, and the indirect producer list for each residual string associated with each of the tokens, as a confusion set, wherein the residual strings and the tokens in the indirect producer list are the suggested search queries for the residual string. |
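
The claimed steps can be illustrated with a minimal Python sketch. It is not the patented implementation and rests on several assumptions that the claim does not fix: a "variation" is modeled as deleting characters from the token, each weight is the raw number of deleted characters (lower means closer), weights are combined by addition, and the names `build_confusion_set`, `suggest`, and `MAX_DELETIONS` are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

MAX_DELETIONS = 2  # assumption: consider variations of at most two characters


def tokenize(documents):
    """Collect the distinct lowercase alphanumeric tokens in the corpus."""
    tokens = set()
    for doc in documents:
        tokens.update(re.findall(r"[a-z0-9]+", doc.lower()))
    return tokens


def residual_strings(token, max_deletions=MAX_DELETIONS):
    """Map each residual string to the number of characters deleted (assumed variation)."""
    residuals = {}
    for d in range(1, min(max_deletions, len(token) - 1) + 1):
        for keep in combinations(range(len(token)), len(token) - d):
            residuals.setdefault("".join(token[i] for i in keep), d)
    return residuals


def build_confusion_set(documents, max_deletions=MAX_DELETIONS):
    tokens = tokenize(documents)

    # Direct producer list: token -> {residual string: weight}.
    direct = {t: residual_strings(t, max_deletions) for t in tokens}

    # Reverse index: residual string -> {producing token: weight}.
    producers = defaultdict(dict)
    for t, residuals in direct.items():
        for r, d in residuals.items():
            producers[r][t] = d

    # Indirect producer list: each residual receives its producing tokens
    # and those tokens' direct producer lists, keeping the lowest weight.
    indirect = {}
    for r, toks in producers.items():
        merged = {}
        for t, d in toks.items():
            merged[t] = min(merged.get(t, d), d)
            for s, w in direct[t].items():
                if s != r:
                    merged[s] = min(merged.get(s, d + w), d + w)
        indirect[r] = merged

    # Propagate each token to other tokens that share a residual string;
    # the weight sums both tokens' weights for the shared residual (assumed).
    token_weights = defaultdict(dict)
    for toks in producers.values():
        for t, dt in toks.items():
            for u, du in toks.items():
                if t != u:
                    w = dt + du
                    token_weights[t][u] = min(token_weights[t].get(u, w), w)

    # The stored confusion set.
    return {
        "direct": direct,
        "indirect": indirect,
        "token_weights": dict(token_weights),
    }


def suggest(confusion_set, query, limit=5):
    """Return suggested queries for a residual string, closest first."""
    candidates = confusion_set["indirect"].get(query, {})
    return sorted(candidates, key=lambda s: (candidates[s], s))[:limit]
```

For instance, with a corpus that contains the token `search`, `suggest(confusion_set, "serch")` ranks `search` first, because `serch` is one deletion away from it. Deletion-only residuals keep the index small; a fuller treatment of insertions or substitutions would need a different residual generator.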