Abstract
A system and method are disclosed for context-based dynamic speech recognition grammar generation, suitable for multimodal input in context-based search scenarios. A dynamic, context-based grammar is generated for a media stream during a post-processing period. A specified number of frames of the media stream is fed to an external automatic speech recognizer (ASR), which recognizes words that do not occur in a common vocabulary and that may be specific to those media frames. These frame-specific words are sent back to the post-processor, where they are passed to a dynamic grammar generator that builds a speech grammar from them in a given format. This grammar, together with other contextual information, forms a new set of context data for those media frames. The media, the grammar, and the other context data are stored in a database. The process is repeated over the entire media stream, so that a full speech recognition grammar can be constructed.
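The per-chunk loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the ASR is replaced by a hypothetical stub (`fake_asr`) that treats each "frame" as UTF-8 text, the common vocabulary is a small hypothetical word set, JSGF is assumed as the grammar format, and an in-memory SQLite table stands in for the database. The names `frame_specific_words`, `build_jsgf`, `process_stream`, and `chunk_size` (the "specified number of frames") are all hypothetical.

```python
import sqlite3
from typing import Callable, Iterable, Sequence

# Hypothetical common vocabulary: words the ASR is assumed to handle already.
COMMON_VOCABULARY = {"the", "a", "and", "is", "of", "to", "in", "on", "now", "here"}


def frame_specific_words(transcript: Iterable[str], vocabulary: set) -> list:
    """Return recognized words absent from the common vocabulary, deduplicated in first-seen order."""
    return list(dict.fromkeys(w.lower() for w in transcript if w.lower() not in vocabulary))


def build_jsgf(words: Sequence[str], name: str) -> str:
    """Emit a one-rule JSGF grammar (JSGF is an assumed choice of grammar format)."""
    body = " | ".join(words) if words else "<VOID>"  # <VOID> matches nothing
    return f"#JSGF V1.0;\ngrammar {name};\npublic <terms> = {body};\n"


def process_stream(frames: Sequence[bytes], chunk_size: int,
                   asr: Callable[[Sequence[bytes]], list],
                   db: sqlite3.Connection) -> str:
    """Post-process the stream chunk by chunk and return a grammar for the whole stream."""
    db.execute("CREATE TABLE IF NOT EXISTS context ("
               "start_frame INTEGER, end_frame INTEGER, media BLOB, words TEXT, grammar TEXT)")
    stream_words = {}  # dict used as an insertion-ordered set
    for start in range(0, len(frames), chunk_size):
        chunk = frames[start:start + chunk_size]
        end = start + len(chunk) - 1
        # 1. Send the chunk to the (external) ASR and keep only frame-specific words.
        words = frame_specific_words(asr(chunk), COMMON_VOCABULARY)
        # 2. Generate a grammar for this chunk from those words.
        grammar = build_jsgf(words, f"frames_{start}_{end}")
        # 3. Store the media, grammar and context data together.
        db.execute("INSERT INTO context VALUES (?, ?, ?, ?, ?)",
                   (start, end, b"".join(chunk), " ".join(words), grammar))
        stream_words.update(dict.fromkeys(words))
    db.commit()
    # 4. Accumulate every chunk's words into a full-stream grammar.
    return build_jsgf(list(stream_words), "full_stream")


def fake_asr(chunk: Sequence[bytes]) -> list:
    """Hypothetical ASR stub: each 'frame' is just UTF-8 text to be split into words."""
    return " ".join(frame.decode("utf-8") for frame in chunk).split()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    demo_frames = [b"the quarterback", b"throws to", b"Brady now", b"touchdown here"]
    print(process_stream(demo_frames, chunk_size=2, asr=fake_asr, db=conn))
```

With `chunk_size=2`, the demo stores one row per two-frame chunk, each with its own small grammar, and returns a full-stream grammar whose single rule lists `quarterback | throws | brady | touchdown`. Keeping a per-chunk grammar alongside the stream-wide one mirrors the idea that each group of frames gets its own context data, while the stream as a whole still yields a complete grammar.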