Abstract
A method of constructing a language model for a phrase-based search in a speech recognition system, and an apparatus for constructing and/or searching through the language model. The method includes the step of separating a plurality of phrases into a plurality of words in a prefix word, body word, and suffix word structure. Each phrase has a body word and, optionally, a prefix word and a suffix word. The words are grouped into a plurality of prefix word classes, a plurality of body word classes, and a plurality of suffix word classes in accordance with a set of predetermined linguistic rules. Each prefix word class contains prefix words of the same category, each body word class contains body words of the same category, and each suffix word class contains suffix words of the same category. The prefix, body, and suffix word classes are then interconnected according to the predetermined linguistic rules. A method of organizing a phrase search based on this prefix/body/suffix language model is also described. The words in each prefix, body, and suffix class are organized into a lexical tree structure. A phrase-start lexical tree structure is then created for the words of all the prefix classes, together with those body classes containing a word that can start one of the phrases, while preserving the connections of these prefix and body classes within the language model.
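The abstract leaves the linguistic rules, the units on the lexical tree arcs, and any data format unspecified. The Python sketch below is therefore built on assumptions, not on the patented implementation. It uses a hypothetical word-to-class table in place of the linguistic rules. It uses letters in place of phones. It derives class connections from a few toy phrases. It shows three steps: splitting phrases into prefix/body/suffix words, grouping the words into slot-specific classes with links between them, and building per-class lexical trees plus a single phrase-start tree whose leaves remember their class. That stored class is what allows the search to continue into the correct successor class. All names (`build_model`, `phrase_start_tree`, `lookup`, and so on) are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Phrase = Tuple[Optional[str], str, Optional[str]]  # (prefix, body, suffix)
ClassKey = Tuple[str, str]                          # (slot, class name)


@dataclass
class TrieNode:
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    # Words ending here, each tagged with its class so the search can
    # follow that class's connections once the word is recognized.
    ends: List[Tuple[str, ClassKey]] = field(default_factory=list)


def build_lexical_tree(entries: List[Tuple[str, ClassKey]]) -> TrieNode:
    """Merge words sharing leading units into one tree (letters stand in for phones)."""
    root = TrieNode()
    for word, cls in entries:
        node = root
        for unit in word:
            node = node.children.setdefault(unit, TrieNode())
        node.ends.append((word, cls))
    return root


def lookup(root: TrieNode, word: str) -> List[Tuple[str, ClassKey]]:
    """Return the (word, class) entries that end exactly at `word` in the tree."""
    node = root
    for unit in word:
        node = node.children.get(unit)
        if node is None:
            return []
    return node.ends


def build_model(phrases: List[Phrase], word_class: Dict[str, str]):
    """Split phrases into prefix/body/suffix words, group them into classes, link classes."""
    classes: Dict[ClassKey, Set[str]] = defaultdict(set)     # class -> member words
    links: Dict[ClassKey, Set[ClassKey]] = defaultdict(set)  # class -> successor classes
    start_classes: Set[ClassKey] = set()                     # classes that can begin a phrase
    for prefix, body, suffix in phrases:
        slots = [("prefix", prefix), ("body", body), ("suffix", suffix)]
        seq = [((slot, word_class[w]), w) for slot, w in slots if w is not None]
        for cls, w in seq:
            classes[cls].add(w)
        start_classes.add(seq[0][0])  # a prefix class, or a body class when no prefix
        for (a, _), (b, _) in zip(seq, seq[1:]):
            links[a].add(b)
    return classes, links, start_classes


def phrase_start_tree(classes, start_classes) -> TrieNode:
    """One tree over every word of every class that can start a phrase."""
    entries = [(w, cls) for cls in sorted(start_classes) for w in sorted(classes[cls])]
    return build_lexical_tree(entries)


# Hypothetical toy data; the class table stands in for the linguistic rules.
phrases = [("at", "five", "pm"), (None, "six", "am"), ("at", "seven", None), ("around", "six", "pm")]
word_class = {"at": "PREP", "around": "PREP", "five": "NUM", "six": "NUM",
              "seven": "NUM", "am": "AMPM", "pm": "AMPM"}

classes, links, start_classes = build_model(phrases, word_class)
class_trees = {cls: build_lexical_tree([(w, cls) for w in sorted(ws)]) for cls, ws in classes.items()}
start_tree = phrase_start_tree(classes, start_classes)

# Recognizing "at" in the start tree yields its class; that class's links
# say the search continues in the NUM body-class tree.
word, cls = lookup(start_tree, "at")[0]
next_trees = [class_trees[c] for c in sorted(links[cls])]
```

In this toy model the phrase-start tree contains "at" and "around" from the prefix class, and "five", "seven", and "six" from the body class, because "six" begins a phrase that has no prefix. The suffix words "am" and "pm" are not in the start tree. Words such as "seven" and "six" share the branch for their common leading letter, which is the saving a lexical tree is meant to provide.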