Title of invention Detection of end of utterance in speech recognition system
Abstract The present invention relates to speech recognition systems, and in particular to arranging end-of-utterance detection in such systems. A speech recognizer of the system is configured to determine whether a recognition result determined from received speech data is stabilized. The speech recognizer is configured to process values of best state scores and best token scores associated with frames of received speech data for end-of-utterance detection purposes. Further, if the recognition result is stabilized, the speech recognizer is configured to determine, based on the processing, whether end of utterance is detected or not.
Publication number US9117460(B2) Publication date 2015.08.25
Application number US200410844211 Filing date 2004.05.12
Applicant Core Wireless Licensing S.A.R.L. Inventor Lahti Tommi
Classification G10L15/00;G10L15/04;G10L25/87 Main classification G10L15/00
Agency Borden Ladner Gervais LLP Agent Borden Ladner Gervais LLP
Main claim 1. A system comprising a speech recognizer with end of utterance detection, wherein: the speech recognizer is configured to calculate values of state scores and token scores associated with frames of received speech data; the speech recognizer is configured to determine best state scores and best token scores, a best state score being a score of a state having the best probability amongst a number of states in a state model for speech recognition purposes, and a best token score being the best probability of a token amongst a number of tokens used for speech recognition purposes; the speech recognizer is configured to, at each received frame of received speech data, determine whether a recognition result determined from received speech data is stabilized; if the recognition result determined from received speech data is not stabilized at a current frame, the speech recognizer is configured to continue speech processing for a next received speech frame and to calculate values of state scores and token scores and to determine the best state score and best token score for the next received speech frame; if the recognition result determined from speech data is stabilized at the current frame, the speech recognizer is configured to, in place of continuing speech processing for the next received frame, process values of the determined best state scores and best token scores associated with frames of received speech data for end of utterance detection purposes, and to determine, on the basis of the processed values of the best state scores and best token scores, whether end of utterance is detected or not; if the end of utterance is not detected on the basis of the processed values of the best state scores and best token scores, the speech recognizer is configured to continue speech processing for a next received speech frame and to calculate values of state scores and token scores and to determine the best state score and best token score for the next received speech frame; and if the end of utterance is detected on the basis of the processed values of the best state scores and best token scores, the speech recognizer is configured to end the speech processing.
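The per-frame control flow recited in the claim can be sketched roughly as follows. This is only an illustration under stated assumptions, not the claimed implementation: the claim does not say how stabilization is decided or how the best scores are processed, so the sketch assumes that scores are log-probabilities (higher is better), that a result counts as stabilized once it has been unchanged for a few consecutive frames, and that end of utterance is declared when both the sum of recent best state scores and the slope of recent best token scores fall to or below thresholds. The function name, parameters, and threshold values are all hypothetical.

```python
from collections import deque
from typing import Iterable, Optional, Sequence, Tuple

# One frame of recognizer output (hypothetical layout):
# (current recognition result, state scores, token scores).
Frame = Tuple[str, Sequence[float], Sequence[float]]


def detect_end_of_utterance(
    frames: Iterable[Frame],
    stable_frames: int = 3,               # assumed stabilization criterion
    window: int = 3,                      # assumed number of frames inspected
    state_sum_threshold: float = -9.0,    # assumed threshold on summed best state scores
    token_slope_threshold: float = -2.0,  # assumed threshold on best token score slope
) -> Optional[int]:
    """Return the index of the frame at which processing ends, or None."""
    if window < 2:
        raise ValueError("window must be at least 2 to compute a slope")

    best_state = deque(maxlen=window)
    best_token = deque(maxlen=window)
    last_result = None
    unchanged = 0

    for index, (result, state_scores, token_scores) in enumerate(frames):
        # Best score = highest log-probability among the states / tokens.
        best_state.append(max(state_scores))
        best_token.append(max(token_scores))

        # Count how many consecutive frames produced the same result.
        unchanged = unchanged + 1 if result == last_result else 1
        last_result = result

        # Not stabilized (or not enough history yet): go on to the next frame.
        if unchanged < stable_frames or len(best_state) < window:
            continue

        # Stabilized: process the best scores for end of utterance detection.
        state_sum = sum(best_state)
        token_slope = (best_token[-1] - best_token[0]) / (window - 1)
        if state_sum <= state_sum_threshold and token_slope <= token_slope_threshold:
            return index  # end of utterance detected: stop processing

        # Otherwise fall through and continue with the next frame.
    return None
```

The score checks run only after the stabilization test passes, which mirrors the ordering in the claim; a real recognizer would obtain the scores from its decoder rather than from a precomputed list of frames.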
Address Luxembourg LU