摘要 |
A method, apparatus, and system are described for a continuous speech recognition engine that includes a fine speech recognizer model, a coarse sound representation generator, and a coarse match generator. The fine speech recognizer model receives a time coded sequence of sound feature frames, applies a speech recognition process to the sound feature frames and determines at least a best guess at each recognizable word that corresponds to the sound feature frames. The coarse sound representation generator generates a coarse sound representation of the recognized word. The coarse match generator determines a likelihood of the coarse sound representation actually being the recognized word based on comparing the coarse sound representation of the recognized word to a database containing the known sound of that recognized word and assigns the likelihood as a robust confidence level parameter to that recognized word. |