Abstract |
A speech recognition method for detecting and recognizing one or more keywords in a continuous audio signal is disclosed. Each keyword is represented by a keyword template representing a plurality of target patterns, and each target pattern comprises statistics of each of a plurality of spectra selected from plural short-term spectra generated according to a predetermined system for processing of the incoming audio. The spectra are processed to enhance the separation between the spectral pattern classes during later analysis. The processed audio spectra are grouped into multi-frame spectral patterns and are compared by means of likelihood statistics with the target patterns of the keyword templates. A concatenation technique employing a loosely set detection threshold makes it very unlikely that a correct pattern will be rejected. |
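The matching flow described in the abstract can be illustrated with a minimal sketch. It is not the disclosed implementation: the abstract says only that "likelihood statistics" and a "concatenation technique" are used, so the diagonal-Gaussian score, the gap-limited chaining rule, and every name here (`log_likelihood`, `group_frames`, `detect_keyword`, `loose_threshold`, `max_gap`) are assumptions made for illustration. Each target pattern is modelled as a (mean, variance) pair over a flattened multi-frame pattern.

```python
import math


def log_likelihood(pattern, mean, var):
    """Diagonal-Gaussian log-likelihood of a flattened multi-frame pattern.

    Assumption: the abstract does not specify the form of the likelihood statistic.
    """
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(pattern, mean, var)
    )


def group_frames(spectra, frames_per_pattern):
    """Flatten each run of consecutive spectra into one multi-frame pattern."""
    return [
        [value for frame in spectra[i:i + frames_per_pattern] for value in frame]
        for i in range(len(spectra) - frames_per_pattern + 1)
    ]


def detect_keyword(spectra, template, frames_per_pattern, loose_threshold, max_gap):
    """Return the start indices of the chained target patterns, or None.

    template is a non-empty list of (mean, var) pairs, one per target pattern.
    """
    patterns = group_frames(spectra, frames_per_pattern)

    # Loose per-pattern threshold: keep every plausible candidate so that a
    # correct pattern is unlikely to be rejected at this stage.
    candidates = [
        [t for t, p in enumerate(patterns)
         if log_likelihood(p, mean, var) >= loose_threshold]
        for mean, var in template
    ]

    # Concatenation (hypothetical rule): accept the keyword only if the target
    # patterns occur in template order, each within max_gap of the previous one.
    chains = [[t] for t in candidates[0]]
    for later in candidates[1:]:
        chains = [
            chain + [t]
            for chain in chains
            for t in later
            if 0 < t - chain[-1] <= max_gap
        ]
    return chains[0] if chains else None
```

In this sketch the loose threshold alone admits several false candidates for the first target pattern; it is the ordering and spacing requirement of the chaining step that discards them, which is one plausible reading of how a loose threshold and concatenation can work together.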