Title: Techniques for transliterating input text from a first character set to a second character set
Abstract: Computer-implemented techniques for transliterating input text from a first character set to a second character set are disclosed. The techniques include receiving input text and determining a set of possible transliterations of the input text based on a plurality of mapping standards, each of which defines a mapping of characters in the first character set to characters in the second character set. The techniques further include determining a set of candidate words in the target language (the language written in the second character set) based on the possible transliterations and a text corpus, and determining a likelihood score for each candidate word based on a language model of the target language and previously received words. Finally, one or more candidate words are provided based on the likelihood scores, and a user selection indicating one of the candidate words is received.
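The pipeline in the abstract can be sketched in Python as a rough illustration. Everything concrete here is an assumption, not taken from the patent: the two toy Latin-to-Cyrillic mapping standards (`standard_a`, `standard_b`), the tiny corpus, the use of `difflib` spelling similarity to stand in for "similar" candidate words, and the add-one-smoothed bigram model standing in for the target-language language model. Phonetic ("sounds similar") matching is not modeled.

```python
import difflib
from collections import Counter

# Hypothetical mapping standards (Latin input chunk -> Cyrillic letter), covering
# only the letters this example needs. The patent does not publish its standards.
MAPPING_STANDARDS = {
    "standard_a": {"a": "а", "e": "е", "i": "и", "m": "м", "o": "о", "p": "п",
                   "r": "р", "t": "т", "v": "в", "y": "й", "ya": "я"},
    "standard_b": {"a": "а", "e": "е", "i": "и", "m": "м", "o": "о", "p": "п",
                   "r": "р", "t": "т", "v": "в", "j": "й", "ja": "я"},
}
MAX_CHUNK = max(len(chunk) for m in MAPPING_STANDARDS.values() for chunk in m)


def transliterations(text, mapping):
    """Every way to split `text` into mapped chunks, converted to the target script."""
    if not text:
        return {""}
    results = set()
    for size in range(1, min(MAX_CHUNK, len(text)) + 1):
        chunk = text[:size]
        if chunk in mapping:
            for rest in transliterations(text[size:], mapping):
                results.add(mapping[chunk] + rest)
    return results


def possible_transliterations(text):
    """Union of the transliterations produced by every mapping standard."""
    found = set()
    for mapping in MAPPING_STANDARDS.values():
        found |= transliterations(text.lower(), mapping)
    return found


def candidate_words(translits, vocabulary, cutoff=0.8):
    """Corpus words that exactly match, or are spelled like, a transliteration."""
    candidates = set()
    for t in translits:
        # get_close_matches includes exact matches, whose similarity ratio is 1.0.
        candidates.update(difflib.get_close_matches(t, vocabulary, n=3, cutoff=cutoff))
    return candidates


class BigramModel:
    """Add-one-smoothed bigram model, a stand-in for the target-language model."""

    def __init__(self, tokens):
        self.vocab = sorted(set(tokens))
        self.context_counts = Counter(tokens[:-1])
        self.bigram_counts = Counter(zip(tokens, tokens[1:]))

    def likelihood(self, word, previous):
        """P(word | previous) with add-one (Laplace) smoothing."""
        return ((self.bigram_counts[(previous, word)] + 1)
                / (self.context_counts[previous] + len(self.vocab)))


def suggest(text, previous, model, k=3):
    """Top-k candidate words for `text`, ranked by language-model likelihood."""
    candidates = candidate_words(possible_transliterations(text), model.vocab)
    return sorted(candidates, key=lambda w: model.likelihood(w, previous), reverse=True)[:k]


corpus = "привет мир привет мама моя мама мир мой".split()
model = BigramModel(corpus)
print(suggest("moya", previous="мама", model=model))  # ['моя', 'мой']
```

Typing `moya` after `мама` yields `моя`, an exact transliteration under `standard_a`, ahead of `мой`, which is reached only through the spelling-similarity step from the alternative transliteration `мойа`.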
Publication number: US9613029 (B2)
Publication date: 2017.04.04
Application number: US201214381395
Filing date: 2012.02.28
Applicant: GOOGLE INC.
Inventors: Yang Fan; Buryak Kirill; Yuan Feng; Liao Baohua
Classification: G06F17/28; G06F17/21
Main classification: G06F17/28
Agency: Remarck Law Group PLC
Agent: Remarck Law Group PLC
Main claim: 1. A computer-implemented method comprising:
receiving, at a computing device having one or more processors and from a user via a keyboard, input text in a first character set via an input method editor;
determining, at the computing device, a set of possible transliterations of the input text based on a plurality of mapping standards, each possible transliteration of the set of possible transliterations corresponding to a transliteration of the input text into a second character set corresponding to a target language, each mapping standard of the plurality of mapping standards defining a mapping of each and every character in the second character set to one or more characters in the first character set, and each mapping standard having an associated transliteration probability stored for use with the input method editor, each transliteration probability being indicative of a likelihood that its corresponding mapping standard is appropriate for transliterating text from the user in the first character set to the second character set;
determining a transliteration score for each of the possible transliterations based on the transliteration probabilities, the transliteration score being indicative of a likelihood that its corresponding possible transliteration is an accurate transliteration of the input text;
determining, at the computing device, a set of candidate words in the target language based on the set of possible transliterations and a text corpus of the target language, wherein the set of candidate words includes words in the text corpus that match one of the set of possible transliterations, that are similar to one of the set of possible transliterations, and sound similar to one of the set of possible transliterations;
determining, at the computing device, a likelihood score for each one of the set of candidate words based on a language model in the target language and one or more previous words received, each likelihood score being indicative of a probability that a corresponding candidate word corresponds to the input text;
providing, from the computing device, one or more candidate words of the set of candidate words based on the likelihood scores and the transliteration scores;
receiving a user selection indicating one of the candidate words;
monitoring, at the computing device, tendencies of the user by determining a particular mapping standard of the plurality of mapping standards on which the selected candidate word was based; and
adjusting, at the computing device, the transliteration probabilities stored for use with the input method editor based on the tendencies of the user as determined from the particular mapping standard.
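The claim adds per-standard transliteration probabilities that are adjusted from the user's selections. The following is a minimal Python sketch of that feedback loop, assuming that each probability is a standard's share of smoothed selection counts and that the transliteration score and likelihood score are combined by multiplication; the claim says only that candidates are provided "based on" both scores. The class, function, and standard names and the example scores are hypothetical.

```python
from collections import Counter


class MappingPreferences:
    """Per-user probabilities that each mapping standard fits how the user types.

    Hypothetical sketch: a probability is a standard's share of smoothed selection counts.
    """

    def __init__(self, standards, prior=1.0):
        # Equal pseudo-counts give every standard the same starting probability.
        self.counts = Counter({name: prior for name in standards})

    def probability(self, standard):
        return self.counts[standard] / sum(self.counts.values())

    def transliteration_score(self, producing_standards):
        # A transliteration that several standards produce keeps the best of their probabilities.
        return max(self.probability(s) for s in producing_standards)

    def record_selection(self, standard):
        # "Monitoring tendencies": the standard behind the selected word gains weight,
        # which raises its probability and lowers the others' after renormalization.
        self.counts[standard] += 1


def rank(candidates, prefs):
    """Order (word, likelihood, producing_standards) tuples by combined score.

    Multiplying the two scores is an assumption made for this sketch.
    """
    scored = [(word, likelihood * prefs.transliteration_score(standards))
              for word, likelihood, standards in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


prefs = MappingPreferences(["standard_a", "standard_b"])
# Hypothetical candidates: (word, language-model likelihood, standards that produced it).
candidates = [("моя", 0.20, {"standard_a"}), ("мой", 0.25, {"standard_b"})]
print([w for w, _ in rank(candidates, prefs)])  # ['мой', 'моя']
prefs.record_selection("standard_a")            # the user picked "моя"
print([w for w, _ in rank(candidates, prefs)])  # ['моя', 'мой']
```

After a single selection the probability of `standard_a` rises from 1/2 to 2/3, which is enough here to reorder the suggestions; a real input method editor would likely need a more conservative update so that one stray choice does not dominate.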
Address: Mountain View, CA, US