Title of invention |
Determining word boundary likelihoods in potentially incomplete text |
Abstract |
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining word boundary likelihoods in potentially incomplete text. In one aspect, a method includes selecting query sequences from the query, each query sequence being at least a portion of a word n-gram, the word n-gram being a subsequence of up to n words selected from the second sequence of words of the query, and for each query sequence: determining one or more query sequence keys for the query sequence; determining at least one of a word boundary count and a non-word boundary count for each query sequence key, each word-boundary count and non-word boundary count being dependent on the context of the query sequence; and associating, in a data storage device, the at least one word boundary count and non-word boundary counts with each query sequence key. |
Publication number |
US8930399(B1) |
Publication date |
2015.01.06 |
Application number |
US201313739591 |
Filing date |
2013.01.11 |
Applicant |
Google Inc. |
Inventors |
Das Abhinandan S.; Fung Harry S. |
Classification |
G06F17/30 |
Main classification |
G06F17/30 |
Agency |
Fish & Richardson P.C. |
Agent |
Fish & Richardson P.C. |
Independent claim |
1. A computer implemented method, comprising:
accessing stored queries, each query being one or more characters in a first sequence constituting one or more words in a second sequence; for each query:
selecting query sequences from the query, each query sequence being at least a portion of a word n-gram, the word n-gram being a subsequence of up to n words selected from the second sequence of words of the query, and for each selected query sequence:
determining a query sequence key for the selected query sequence;
determining a word boundary likelihood that represents a likelihood that the selected query sequence terminates at a word boundary, the word boundary likelihood being based on a second likelihood that query sequences that are the same as the selected query sequence are one of an end portion of a completed query or a portion of a query sequence that includes a space character as a next character, wherein the second likelihood is based on a word boundary count for the query sequence, the word boundary count being based on a number of the queries for which the query sequence includes a space character as a next character; and
associating, in a data storage device, the word boundary likelihood with the query sequence key. |
Address |
Mountain View CA US |
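
The counting step recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions made for illustration only: the function name `word_boundary_likelihoods` and the parameter `max_n` are hypothetical, the query sequence key is taken to be the lower-cased character sequence itself, query sequences are limited to character prefixes of each word n-gram, the likelihood is computed as the plain ratio of word boundary observations to all observations, and an in-memory dictionary stands in for the data storage device.

```python
from collections import defaultdict


def word_boundary_likelihoods(queries, max_n=2):
    """Map each query sequence key to an estimated word boundary likelihood.

    Assumption: the likelihood is boundary / (boundary + non_boundary),
    where each query contributes at most once to each count for a key.
    """
    boundary = defaultdict(int)      # key -> queries where the key ends at a word boundary
    non_boundary = defaultdict(int)  # key -> queries where the key ends inside a word

    for query in queries:
        words = query.lower().split()
        observed = set()  # (key, ends_at_boundary) pairs seen in this query
        for start in range(len(words)):
            # Word n-gram of up to max_n words beginning at this word.
            ngram = " ".join(words[start:start + max_n])
            for end in range(1, len(ngram) + 1):
                sequence = ngram[:end]
                if sequence.endswith(" "):
                    continue  # simplification: skip sequences ending in a space
                # The end of the n-gram is followed by a space or the end of
                # the query; inside the n-gram, inspect the next character.
                ends_at_boundary = end == len(ngram) or ngram[end] == " "
                observed.add((sequence, ends_at_boundary))
        for key, ends_at_boundary in observed:
            if ends_at_boundary:
                boundary[key] += 1
            else:
                non_boundary[key] += 1

    # "Associating" the likelihood with the key: here, a plain dictionary.
    keys = set(boundary) | set(non_boundary)
    return {key: boundary[key] / (boundary[key] + non_boundary[key]) for key in keys}


table = word_boundary_likelihoods(["new york", "new york hotels", "newark airport"])
print(round(table["new"], 3))  # 0.667
```

In this toy log, "new" is followed by a space in two queries and by "a" (in "newark") in one, so its estimated likelihood of ending at a word boundary is 2/3, while "ne" never ends at a boundary and "york" always does.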