Title of invention Adaptive generation of out-of-dictionary personalized long words
Abstract A system is provided, including a display unit, a memory unit, and a processor. The processor is configured to calculate a mutual information value between a first chunk and a second chunk, and to add a new word to a language unit when a condition involving the mutual information value is satisfied. The new word is a combination of the first chunk and the second chunk. The processor is also configured to add the new word into an n-gram store. The n-gram store includes a plurality of n-grams and associated frequency or count information. The processor is also configured to alter the frequency or count information based on the new word.
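The abstract does not say how the mutual information value is computed or what the condition is. As a minimal sketch only, the Python below assumes pointwise mutual information (PMI) estimated from raw counts, a fixed threshold as the condition, and plain concatenation as the way the two chunks are combined. The names `pointwise_mutual_information`, `maybe_add_word`, and the default `threshold` are hypothetical and do not come from the patent.

```python
import math


def pointwise_mutual_information(count_first, count_second, count_pair, total):
    """Return log2(P(first, second) / (P(first) * P(second))).

    Assumption: probabilities are estimated as count / total.
    Returns -inf when any count is missing, so the pair never qualifies.
    """
    if total <= 0 or min(count_first, count_second, count_pair) <= 0:
        return float("-inf")
    p_first = count_first / total
    p_second = count_second / total
    p_pair = count_pair / total
    return math.log2(p_pair / (p_first * p_second))


def maybe_add_word(lexicon, chunk_counts, pair_counts, first, second, total,
                   threshold=3.0):
    """Add first+second to the lexicon when PMI meets the assumed threshold.

    lexicon:      dict mapping word -> count (stands in for the n-gram store)
    chunk_counts: dict mapping chunk -> count
    pair_counts:  dict mapping (first, second) -> count of the adjacent pair
    """
    pair_count = pair_counts.get((first, second), 0)
    pmi = pointwise_mutual_information(
        chunk_counts.get(first, 0),
        chunk_counts.get(second, 0),
        pair_count,
        total,
    )
    if pmi < threshold:
        return None
    new_word = first + second  # assumption: combination means concatenation
    lexicon[new_word] = lexicon.get(new_word, 0) + pair_count
    return new_word
```

With both chunks seen 8 times, always together, out of 64 observations, the PMI is log2(8) = 3, so the pair just meets the assumed threshold and the combined word is added with a count of 8.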
Publication number US9411800(B2) Publication date 2016.08.09
Application number US200812163082 Filing date 2008.06.27
Applicant MICROSOFT TECHNOLOGY LICENSING, LLC Inventors Morin Frederic;Yu Wei;Eisenhart F. James;Zhang Qi
Classification G06F17/21;G06F17/27 Main classification G06F17/21
Agency Agent Churna Timothy;Yee Judy;Minhas Micky
Principal claim 1. A method implemented by a device that executes a word building application, the method comprising: receiving, by the device, letters to initiate segmentation into a first letter set and a second letter set, each of the first and second letter sets comprising one or more of the letters; determining, by the device, a statistical relationship between the first letter set and the second letter set; determining, by the device, whether the statistical relationship satisfies a condition; responsive to satisfying the condition, adding, by the device, a word composed of the first letter set and the second letter set into a data store associated with the device containing one or more words and one or more bigrams, each of the words having associated count information and each of the bigrams composed of a leading data word and one or more trailing data words, each bigram configured to have associated count information; identifying, by the device, a first bigram from a set of bigrams in the data store, each bigram of the set including the first letter set as a trailing data word; transforming, by the device, a leading data word of the first bigram and the word into a new bigram to add to the data store associated with the device, the new bigram composed of: the leading data word of the first bigram as a leading data word of the new bigram and the word as a trailing data word of the new bigram, computing, by the device, updated count information for the new bigram using a proportional adjustment that describes a relationship between the word following the leading data word of the first bigram relative to the first letter set following the leading data word of the first bigram; and analyzing, by the device, received user generated content using the data store associated with the device to predict text associated with the user generated content.
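The claim names a proportional adjustment for the new bigram's count but gives no formula. The sketch below is one possible reading, not the patented computation: it assumes the new bigram inherits the count of the existing bigram scaled by the fraction of occurrences of the first letter set that are followed by the second letter set. The function name `add_bigrams_for_new_word`, the dictionary layout, and the use of concatenation to form the word are all hypothetical.

```python
def add_bigrams_for_new_word(bigrams, unigrams, first, second):
    """Create (leading, first+second) bigrams from existing (leading, first) ones.

    bigrams:  dict mapping (leading_word, trailing_word) -> count
    unigrams: dict mapping word -> count

    Assumed proportional adjustment:
        count(leading, new_word) =
            count(leading, first) * count(first, second) / count(first)
    Returns the bigrams that were added; `bigrams` is updated in place.
    """
    new_word = first + second  # assumption: the word is a plain concatenation
    first_count = unigrams.get(first, 0)
    if first_count == 0:
        return {}
    # Share of occurrences of `first` that continue with `second`.
    ratio = bigrams.get((first, second), 0) / first_count

    added = {}
    for (leading, trailing), count in list(bigrams.items()):
        if trailing == first:  # bigram has the first letter set as trailing word
            added[(leading, new_word)] = count * ratio
    bigrams.update(added)
    return added


# Usage with made-up counts: "data" occurs 10 times, 5 of them before "base".
store = {("big", "data"): 4, ("the", "data"): 6, ("data", "base"): 5}
new_entries = add_bigrams_for_new_word(store, {"data": 10}, "data", "base")
```

In this sketch the existing bigrams keep their original counts; whether they should be reduced by the amount moved to the new bigram is a design choice the claim text leaves open.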
Address Redmond WA US