Principal claim |
1. A method implemented by a device that executes a word building application, the method comprising:
receiving, by the device, letters to initiate segmentation into a first letter set and a second letter set, each of the first and second letter sets comprising one or more of the letters;
determining, by the device, a statistical relationship between the first letter set and the second letter set;
determining, by the device, whether the statistical relationship satisfies a condition;
responsive to satisfying the condition, adding, by the device, a word composed of the first letter set and the second letter set into a data store associated with the device containing one or more words and one or more bigrams, each of the words having associated count information and each of the bigrams composed of a leading data word and one or more trailing data words, each bigram configured to have associated count information;
identifying, by the device, a first bigram from a set of bigrams in the data store, each bigram of the set including the first letter set as a trailing data word;
transforming, by the device, a leading data word of the first bigram and the word into a new bigram to add to the data store associated with the device, the new bigram composed of: the leading data word of the first bigram as a leading data word of the new bigram and the word as a trailing data word of the new bigram,
computing, by the device, updated count information for the new bigram using a proportional adjustment that describes a relationship between the word following the leading data word of the first bigram relative to the first letter set following the leading data word of the first bigram; and
analyzing, by the device, received user generated content using the data store associated with the device to predict text associated with the user generated content. |
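The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation, and several choices are assumptions made only for illustration: pointwise mutual information stands in for the "statistical relationship", a fixed threshold stands in for the "condition", two `Counter` objects stand in for the data store, and the proportional adjustment is taken to be the share of occurrences of the first letter set that are followed by the second letter set. The names `pmi`, `maybe_add_word`, and `predict_next` are hypothetical.

```python
import math
from collections import Counter


def pmi(first, second, unigrams, bigrams):
    """Pointwise mutual information of `second` following `first`.

    PMI is an assumed choice of statistic; the claim does not name one.
    """
    joint = bigrams[(first, second)]
    if joint == 0 or unigrams[first] == 0 or unigrams[second] == 0:
        return float("-inf")
    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())
    p_joint = joint / total_bi
    p_first = unigrams[first] / total_uni
    p_second = unigrams[second] / total_uni
    return math.log(p_joint / (p_first * p_second))


def maybe_add_word(first, second, unigrams, bigrams, threshold=1.0):
    """Add first+second as a word if the statistic meets the threshold.

    Returns the new word, or None when the condition is not satisfied.
    The threshold value is a hypothetical condition.
    """
    if pmi(first, second, unigrams, bigrams) < threshold:
        return None
    word = first + second
    joint = bigrams[(first, second)]
    unigrams[word] += joint
    # Assumed proportional adjustment: the fraction of occurrences of
    # `first` that continue into the new word.
    ratio = joint / unigrams[first]
    # For every bigram whose trailing word is `first`, create a new
    # bigram with the same leading word and the new word as trailing word.
    for (leading, trailing), count in list(bigrams.items()):
        if trailing == first:
            bigrams[(leading, word)] += count * ratio
    return word


def predict_next(previous, bigrams):
    """Return the trailing word most often seen after `previous`, or None."""
    candidates = {t: c for (l, t), c in bigrams.items() if l == previous}
    return max(candidates, key=candidates.get) if candidates else None


# Toy data store: counts are invented for the example.
unigrams = Counter({"the": 10, "home": 8, "work": 4, "a": 2})
bigrams = Counter({("home", "work"): 4, ("the", "home"): 6})

new_word = maybe_add_word("home", "work", unigrams, bigrams)
print(new_word, bigrams[("the", new_word)], predict_next("the", bigrams))
```

In this toy data, "home" appears 8 times and is followed by "work" 4 times, so the new bigram ("the", "homework") receives half of the count of ("the", "home"). The sketch leaves the count of the original bigram unchanged because the claim does not recite a decrement.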