Title: Methods, apparatus and computer programs for automatic speech recognition
Abstract: An automatic speech recognition (ASR) system includes a speech-responsive application and a recognition engine. The ASR system generates user prompts to elicit certain spoken inputs, and the speech-responsive application performs operations when the spoken inputs are recognized. The recognition engine compares sounds within an input audio signal with phones within an acoustic model to identify candidate matching phones. A recognition confidence score is calculated for each candidate matching phone, and the confidence scores help identify one or more likely sequences of matching phones that appear to match a word within the grammar of the speech-responsive application. The per-phone confidence scores are evaluated against predefined confidence score criteria (for example, identifying scores below a ‘low confidence’ threshold), and the results of the evaluation are used to influence subsequent selection of user prompts. One such system uses confidence scores to select prompts for targeted recognition training, encouraging input of sounds identified as having low confidence scores. Another system selects prompts to discourage input of sounds that were not easily recognized.
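The prompt-selection logic described in the abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: per-phone confidence scores are assumed to arrive as a dict mapping a phone label to a score in [0, 1], each candidate prompt is assumed to carry the set of phones the user is expected to say in reply, and the names `Prompt`, `low_confidence_phones`, `select_prompt` and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical value; the abstract only refers to a 'low confidence' threshold.
LOW_CONFIDENCE_THRESHOLD = 0.5


@dataclass
class Prompt:
    text: str
    expected_phones: set[str]  # assumed: phones the user is expected to say in reply


def low_confidence_phones(phone_scores: dict[str, float],
                          threshold: float = LOW_CONFIDENCE_THRESHOLD) -> set[str]:
    """Return the phones whose recognition confidence fell below the threshold."""
    return {phone for phone, score in phone_scores.items() if score < threshold}


def select_prompt(prompts: list[Prompt], phone_scores: dict[str, float],
                  mode: str = "train") -> Prompt:
    """Pick a prompt by how many low-confidence phones it is expected to elicit.

    mode="train": prefer the prompt eliciting the most low-confidence phones
    (targeted recognition training).
    mode="avoid": prefer the prompt eliciting the fewest low-confidence phones
    (discouraging sounds that were not easily recognized).
    """
    weak = low_confidence_phones(phone_scores)

    def overlap(prompt: Prompt) -> int:
        # Number of expected phones that are currently low confidence.
        return len(prompt.expected_phones & weak)

    if mode == "train":
        return max(prompts, key=overlap)
    if mode == "avoid":
        return min(prompts, key=overlap)
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, if "th" and "f" score below the threshold, `mode="train"` favors a prompt eliciting "fifth" (two weak phones), while `mode="avoid"` favors one eliciting "seven" (none). Counting overlapping weak phones is only one plausible policy; a system could instead weight each phone by how far its score falls below the threshold.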
Publication number: US9502024(B2)
Publication date: 2016.11.22
Application number: US201414191176
Filing date: 2014.02.26
Applicant: Nuance Communications, Inc.
Inventors: Pickering John Brian; Poultney Timothy David; Staniford Benjamin Terrick; Whitbourne Matthew
Classification: G10L15/00; G10L15/08
Main classification: G10L15/00
Agency: Wolf, Greenfield & Sacks, P.C.
Agent: Wolf, Greenfield & Sacks, P.C.
Main claim: 1. A method for use with an automatic speech recognition (ASR) system, the ASR system comprising at least one model having a plurality of representations of phones, the method comprising acts of: receiving an audio signal comprising a first user input; analyzing the audio signal to identify at least one first phone as having a selected recognition performance characteristic, wherein the act of analyzing comprises comparing one or more sounds in the audio signal with a representation of the at least one first phone in the at least one model; subsequent to receiving and analyzing the audio signal comprising the first user input, selecting a user prompt to be presented to a user of a speech-responsive application to elicit a second user input, wherein the speech-responsive application is programmed to perform at least one action based on the second user input, and wherein the user prompt is selected based on a determination of whether the user is expected to speak, in response to the user prompt, the at least one first phone which is identified as having the selected recognition performance characteristic; causing the user prompt to be presented to the user of the speech-responsive application, wherein: the at least one first phone is associated with a first confidence score, the first confidence score being lower than a selected confidence threshold, and the user prompt is selected to invite the user to speak an input phrase that combines a first word with one or more second words, wherein the first word comprises the at least one first phone associated with the first confidence score that is lower than the selected confidence threshold, and the one or more second words comprise at least one second phone that is associated with a second confidence score, the second confidence score being higher than the selected confidence threshold; and performing, by the speech-responsive application, the at least one action based on the second user input.
Address: Burlington MA US
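Claim 1 narrows prompt selection to a specific construction: the prompt invites a phrase that pairs a word containing a low-confidence phone with one or more words containing phones that score above the threshold. A minimal Python sketch of that construction follows. It assumes a hypothetical pronunciation lexicon mapping each word to its phone list and per-phone scores keyed by phone label; `build_combined_prompt`, the lexicon, and the prompt wording are illustrative assumptions, not taken from the patent.

```python
from typing import Optional


def _has_low_phone(phones: list[str], scores: dict[str, float], threshold: float) -> bool:
    # Unscored phones default to the threshold itself, so they count as neither low nor high.
    return any(scores.get(p, threshold) < threshold for p in phones)


def _all_high_phones(phones: list[str], scores: dict[str, float], threshold: float) -> bool:
    # Stricter than the claim's "at least one second phone": every phone must score high.
    return all(scores.get(p, threshold) > threshold for p in phones)


def build_combined_prompt(lexicon: dict[str, list[str]],
                          phone_scores: dict[str, float],
                          threshold: float,
                          n_second_words: int = 2) -> Optional[str]:
    """Pair one low-confidence word with high-confidence words in a single prompt.

    Returns None if the lexicon lacks either kind of word.
    """
    first_candidates = [w for w, ph in lexicon.items()
                        if _has_low_phone(ph, phone_scores, threshold)]
    second_candidates = [w for w, ph in lexicon.items()
                         if _all_high_phones(ph, phone_scores, threshold)]
    if not first_candidates or not second_candidates:
        return None
    # Target the word containing the least-confident phone (ties keep lexicon order).
    first = min(first_candidates,
                key=lambda w: min(phone_scores.get(p, threshold) for p in lexicon[w]))
    # Take high-confidence words in lexicon order as the surrounding context.
    seconds = second_candidates[:n_second_words]
    phrase = " ".join(seconds + [first])
    return f'Please say: "{phrase}"'
```

With a lexicon containing "dial", "number" and "three", where only the "th" phone of "three" falls below a threshold of 0.5, the sketch returns `Please say: "dial number three"`. Embedding the weak word among well-recognized ones is one way to read the claim's intent: the surrounding words give the recognizer reliable context while the user still produces the phone that needs more evidence.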