Title: Enhanced endpoint detection for speech recognition
Abstract: Determining the end of an utterance for purposes of automatic speech recognition (ASR) may be improved with a system that provides early results and/or incorporates semantic tagging. Early ASR results of an incoming utterance may be prepared based at least in part on an estimated endpoint and processed by a natural language understanding (NLU) process while final results, based at least in part on a final endpoint, are determined. If the early results match the final results, the early NLU results are already prepared for early execution. The endpoint may also be determined based at least in part on the content of the utterance, as represented by semantic tagging output from ASR processing. If the tagging indicates completion of a logical statement, an endpoint may be declared, or a threshold for silent frames prior to declaring an endpoint may be adjusted.
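The tag-dependent silence threshold described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the per-frame input format, the names `EndpointConfig`, `silence_threshold`, and `find_endpoint`, and the frame counts 50 and 20 are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class EndpointConfig:
    # Both values are hypothetical; the source gives no concrete thresholds.
    default_silence_frames: int = 50  # used while the statement looks incomplete
    reduced_silence_frames: int = 20  # used once tags indicate a complete statement


def silence_threshold(tags_complete: bool, cfg: EndpointConfig) -> int:
    """Return how many consecutive silent frames are required to declare an endpoint."""
    return cfg.reduced_silence_frames if tags_complete else cfg.default_silence_frames


def find_endpoint(
    frames: Iterable[Tuple[bool, bool]],
    cfg: EndpointConfig = EndpointConfig(),
) -> Optional[int]:
    """Scan (is_speech, tags_complete) pairs and return the index of the frame
    at which an endpoint is declared, or None if no endpoint is reached."""
    silent_run = 0
    for index, (is_speech, tags_complete) in enumerate(frames):
        if is_speech:
            silent_run = 0  # any speech frame resets the silence counter
            continue
        silent_run += 1
        if silent_run >= silence_threshold(tags_complete, cfg):
            return index
    return None
```

With these assumed values, five speech frames followed by silence yield an endpoint after 20 silent frames when the tags report a complete statement, and only after 50 silent frames otherwise.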
Publication number: US9437186(B1) Publication date: 2016.09.06
Application number: US201313921671 Filing date: 2013.06.19
Applicant: AMAZON TECHNOLOGIES, INC. Inventors: Liu Baiyang; Secker-Walker Hugh Evan; Rosen Alexander David
Classification: G10L15/00; G10L15/05; G10L15/22; G10L15/19 Main classification: G10L15/00
Agency: Seyfarth Shaw LLP Agents: Seyfarth Shaw LLP; Barzilay Ilan N.; Miller Cyrus A.
Principal claim: 1. A method for reducing latency in speech recognition, the method comprising: receiving audio input data representing an utterance; performing automatic speech recognition (ASR) processing on the audio input data to generate ASR output; determining a first ending to the utterance in the audio input data at a first time corresponding to non-speech detected in the audio input data; determining a first portion of the ASR output, the first portion corresponding to the audio input data up to the first ending; providing the first portion of the ASR output to a natural language understanding (NLU) module to obtain a first NLU result; storing the first NLU result; determining a second ending to the user's speech in the audio input data at a second time after the first time; determining a second portion of the ASR output, the second portion corresponding to the audio input data up to the second ending; comparing the first portion to the second portion; and: (1) if the first portion is the same as the second portion, initiating a first action to be executed on a first device, the first action based on the first NLU result, and (2) if the first portion is not the same as the second portion: discarding the first NLU result, providing the second ASR output to the NLU module to obtain a second NLU result, and initiating a second action to be executed on the first device, the second action based on the second NLU result.
Address: Reno, NV, US
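The compare-and-reuse flow recited in the principal claim can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: ASR output is assumed to be a list of (word, end time) pairs, the NLU module is assumed to be any callable that maps text to a result, and the names `portion_up_to` and `select_nlu_result` are hypothetical. The sketch runs sequentially for clarity; the source describes the early NLU processing as occurring while the final endpoint is still being determined.

```python
from typing import Callable, Dict, List, Tuple

AsrOutput = List[Tuple[str, float]]  # assumed format: (word, end time in seconds)


def portion_up_to(asr_output: AsrOutput, ending: float) -> str:
    """Return the ASR text for all words that finish at or before `ending`."""
    return " ".join(word for word, end_time in asr_output if end_time <= ending)


def select_nlu_result(
    asr_output: AsrOutput,
    first_ending: float,
    second_ending: float,
    nlu: Callable[[str], Dict],
) -> Tuple[Dict, bool]:
    """Return (NLU result to act on, whether the early result was reused)."""
    # Early pass: interpret the text up to the first (estimated) ending and store it.
    first_portion = portion_up_to(asr_output, first_ending)
    stored_first_result = nlu(first_portion)

    # Final pass: take the text up to the second (later) ending and compare.
    second_portion = portion_up_to(asr_output, second_ending)
    if first_portion == second_portion:
        # Portions match, so the stored early result can drive the action.
        return stored_first_result, True

    # Portions differ: discard the early result and interpret the longer text.
    return nlu(second_portion), False
```

For example, with a toy NLU callable such as `lambda text: {"intent": text.upper()}`, an utterance whose words all end before the first ending reuses the early result, while an utterance with an extra word between the two endings triggers a second NLU pass.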