Title of invention |
Multiple recognizer speech recognition |
Abstract |
The subject matter of this specification can be embodied in, among other things, a method that includes receiving audio data that corresponds to an utterance and obtaining a first transcription of the utterance that was generated using a limited speech recognizer. The limited speech recognizer includes a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar. A second transcription of the utterance is obtained that was generated using an expanded speech recognizer. The expanded speech recognizer includes a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar. The utterance is classified based at least on a portion of the first transcription or the second transcription. |
Publication number |
US9058805(B2) |
Publication date |
2015.06.16 |
Application number |
US201313892590 |
Filing date |
2013.05.13 |
Applicant |
Google Inc. |
Inventors |
Aleksic Petar;Mengibar Pedro J.;Biadsy Fadi |
Classification codes |
G10L15/26;G06F17/27;G10L15/18;G06K9/62;G10L15/01;G10L15/32;G10L15/30;G10L15/197 |
Main classification code |
G10L15/26 |
Agency |
Fish & Richardson P.C. |
Agent |
Fish & Richardson P.C. |
Principal claim |
1. A computer-implemented method performed by a data processing apparatus, the method comprising:
receiving audio data that corresponds to an utterance;
obtaining a first transcription of the utterance that was generated using a limited speech recognizer, wherein the limited speech recognizer comprises a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar;
obtaining a second transcription of the utterance that was generated using an expanded speech recognizer, wherein the expanded speech recognizer comprises a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar;
aligning the first and second transcriptions of the utterance to generate an aligned transcription; and
classifying the utterance, based at least on a portion of the aligned transcription, as a voice command or a voice query. |
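The align-then-classify steps of the claim can be illustrated with a minimal sketch. It is not the patented implementation: the two recognizers are not modeled (their transcriptions are passed in as plain strings), the word-level alignment uses Python's standard `difflib.SequenceMatcher` as a stand-in technique, and the vocabulary `VOICE_COMMAND_TERMS`, the helper names `align` and `classify`, and the "first word is a command term" rule are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical voice command terms; the record does not list any.
VOICE_COMMAND_TERMS = {"call", "text", "open", "play"}


def align(first, second):
    """Align two transcriptions word by word.

    Returns (limited_word, expanded_word) pairs; a word with no
    counterpart in the other transcription is paired with None.
    """
    a, b = first.split(), second.split()
    pairs = []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for _tag, i1, i2, j1, j2 in matcher.get_opcodes():
        left, right = a[i1:i2], b[j1:j2]
        # Pad the shorter side so each block yields equal-length columns.
        width = max(len(left), len(right))
        left += [None] * (width - len(left))
        right += [None] * (width - len(right))
        pairs.extend(zip(left, right))
    return pairs


def classify(aligned):
    """Assumed rule: a command if the limited recognizer's first
    aligned word is a known command term, otherwise a query."""
    if aligned and aligned[0][0] in VOICE_COMMAND_TERMS:
        return "voice command"
    return "voice query"


# Assumed example outputs of the limited and expanded recognizers.
command_alignment = align("call bob", "call bob smith")
query_alignment = align("", "weather in paris")
print(classify(command_alignment))
print(classify(query_alignment))
```

In this sketch the classifier inspects only a portion of the aligned transcription (its first slot), which mirrors the claim's "based at least on a portion" wording without asserting how the actual system decides.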
Address |
Mountain View CA US |