Abstract |
<p>IMPROVEMENTS IN CONTINUOUS SPEECH RECOGNITION. An improved speech recognition method and apparatus for recognizing keywords in a continuous audio signal are disclosed. The keywords, generally either a word or a string of words, are each represented by an element template defined by a plurality of target patterns. Each target pattern is represented by a plurality of statistics describing the expected behavior of a group of spectra selected from plural short-term spectra generated by processing of the incoming audio. The incoming audio spectra are processed to enhance the separation between the spectral pattern classes during later analysis. The processed audio spectra are grouped into multi-frame spectral patterns and are compared, using likelihood statistics, with the target patterns of the element templates. Each multi-frame pattern is forced to contribute to each of a plurality of pattern scores as represented by the element templates. The method and apparatus use speaker-independent word models during the training stage to generate, automatically, improved target patterns. The apparatus and method further employ grammatical syntax during the training stage for identifying the boundaries of unknown keywords. During the recognition process, improved performance is achieved by use of alternate spellings for "silence", and memory requirements and computational load are reduced by using an augmented grammatical syntax. A concatenation technique is employed, using dynamic programming techniques, to determine the correct identity of the word string.</p> |
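The comparison of multi-frame patterns with target patterns, followed by a dynamic programming pass, can be illustrated with a minimal sketch. It is not the disclosed implementation. It assumes each target pattern is summarized by a per-dimension mean and variance (a diagonal Gaussian, which the abstract does not specify), and the names `log_likelihood` and `template_score` are hypothetical.

```python
import math


def log_likelihood(pattern, mean, var):
    """Diagonal-Gaussian log-likelihood of one multi-frame pattern.

    Assumption: the target pattern's statistics are a mean and a variance
    per dimension; the abstract only says "likelihood statistics".
    """
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(pattern, mean, var)
    )


def template_score(patterns, template):
    """Best monotonic alignment of input patterns to one element template.

    `template` is a list of (mean, var) target patterns. Every input
    pattern is assigned to exactly one target pattern. The alignment
    starts at the first target pattern, ends at the last, and at each
    step either stays on the current target pattern or advances by one.
    Returns the summed log-likelihood of the best alignment, or -inf if
    there are fewer input patterns than target patterns.
    """
    neg_inf = float("-inf")
    best = [neg_inf] * len(template)
    best[0] = log_likelihood(patterns[0], *template[0])
    for pattern in patterns[1:]:
        new = [neg_inf] * len(template)
        for j, (mean, var) in enumerate(template):
            # Dynamic programming step: stay on target j or arrive from j - 1.
            prev = best[j] if j == 0 else max(best[j], best[j - 1])
            if prev > neg_inf:
                new[j] = prev + log_likelihood(pattern, mean, var)
        best = new
    return best[-1]


# Two hypothetical one-dimensional keyword templates.
template_a = [([0.0], [1.0]), ([5.0], [1.0])]
template_b = [([5.0], [1.0]), ([0.0], [1.0])]
observed = [[0.1], [0.2], [4.9], [5.1]]

scores = {
    "a": template_score(observed, template_a),
    "b": template_score(observed, template_b),
}
best_keyword = max(scores, key=scores.get)
```

Here `best_keyword` is the template whose target patterns best explain the observed patterns in order. A full recognizer along the lines of the abstract would additionally concatenate keyword templates under a grammatical syntax, which this sketch leaves out.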