Title of invention: Speech recognition with combined grammar and statistical language models
Abstract: Features are disclosed for performing speech recognition on utterances using a grammar and a statistical language model, such as an n-gram model. States of the grammar may correspond to states of the statistical language model. Speech recognition may be initiated using the grammar. At a given state of the grammar, speech recognition may continue at a corresponding state of the statistical language model. Speech recognition may continue using the grammar in parallel with the statistical language model, or it may continue using the statistical language model exclusively. Scores associated with the correspondences between states (e.g., backoff arcs) may be determined heuristically or based on test data.
Publication number: US9449598(B1) Publication date: 2016.09.20
Application number: US201314037975 Filing date: 2013.09.26
Applicant: Amazon Technologies, Inc. Inventors: Rastrow Ariya;Hoffmeister Bjorn;Garimella Sri Venkata Surya Siva Rama Krishna;Prasad Rohit Krishna
Classification: G10L15/00;G10L15/18;G10L15/02;G06F17/20;G06F17/27;G10L15/193;G10L15/197 Main classification: G10L15/00
Agency: Knobbe, Martens, Olson & Bear, LLP Agent: Knobbe, Martens, Olson & Bear, LLP
Principal claim: 1. A system comprising: a computer-readable memory storing executable instructions; and one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the executable instructions to at least: obtain audio data regarding an utterance of a user; initiate speech recognition on the audio data using a grammar of a composite language model, the composite language model comprising the grammar and an n-gram model, wherein the composite language model comprises scores that bias speech recognition performed using the composite language model to use the grammar over the n-gram model, and wherein a first state of the grammar links to a first state of the n-gram model and to a second state of the grammar; generate at least a first portion of automatic speech recognition results using a portion of the grammar up to at least the first state of the grammar; determine a first score using (1) acoustic information derived from the audio data and (2) a first weight associated with a link from the first state of the grammar to the second state of the grammar; determine a second score using (1) acoustic information derived from the audio data and (2) a second weight associated with a link from the first state of the grammar to the first state of the n-gram model; if the first score is greater than the second score, continue speech recognition on the audio data using the grammar by generating a second portion of automatic speech recognition results using the second state of the grammar, wherein the second portion of automatic speech recognition results is based at least in part on the first score; if the second score is greater than the first score, continue speech recognition on the audio data using n-gram model by generating the second portion of automatic speech recognition results using the first state of the n-gram model, wherein the second portion of automatic speech recognition results is based at least in part on the second score; and generate automatic speech recognition results based at least on the first portion of automatic speech recognition results and the second portion of automatic speech recognition results.
Address: Seattle WA US
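
The score comparison recited in the principal claim can be illustrated with a short sketch. It is not the patented implementation: the `Arc` type, the state labels, the function names, and the choice to add log-domain acoustic scores to link weights are all assumptions made for illustration. The claim only requires that each score use acoustic information and the weight of the corresponding link. The claim also does not say what happens on a tie, so the sketch stays in the grammar in that case, in line with the stated bias toward the grammar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    """A weighted link leaving the first grammar state (hypothetical type)."""
    target: str    # label of the destination state, e.g. "grammar:2"
    weight: float  # assumed to be a log-domain weight; higher is better


def combined_score(acoustic_logprob: float, arc: Arc) -> float:
    # Assumption: acoustic evidence and the link weight combine additively
    # in the log domain. The claim does not fix the combination rule.
    return acoustic_logprob + arc.weight


def choose_next_state(grammar_arc: Arc, backoff_arc: Arc,
                      acoustic_grammar: float, acoustic_ngram: float):
    """Return (next_state, score) for continuing recognition."""
    first = combined_score(acoustic_grammar, grammar_arc)   # grammar -> grammar
    second = combined_score(acoustic_ngram, backoff_arc)    # grammar -> n-gram
    if second > first:
        return backoff_arc.target, second
    # first > second, or a tie (tie handling is an assumption of this sketch)
    return grammar_arc.target, first


# Hypothetical weights: the backoff link is penalized so that recognition
# prefers the grammar unless the acoustic evidence clearly favors the n-gram.
grammar_arc = Arc(target="grammar:2", weight=-0.5)
backoff_arc = Arc(target="ngram:1", weight=-3.0)

in_grammar = choose_next_state(grammar_arc, backoff_arc, -4.0, -4.5)
out_of_grammar = choose_next_state(grammar_arc, backoff_arc, -9.0, -2.0)
```

With these made-up numbers, the first call remains in the grammar and the second follows the backoff link to the n-gram state, because the acoustic evidence there outweighs the penalty on that link.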