Title of invention: Utterance selection for automated speech recognizer training
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a set of training utterances. The methods, systems, and apparatus include actions of obtaining a target multi-dimensional distribution of characteristics in an initial set of candidate utterances and selecting a subset of the initial set of candidate utterances based on speech recognition confidence scores associated with the candidate utterances. Additional actions include selecting a particular candidate utterance from the subset of the initial set of utterances and determining that adding the particular candidate utterance to a set of training utterances reduces a divergence of a multi-dimensional distribution of the characteristics in the set of training utterances from the target multi-dimensional distribution. Further actions include adding the particular candidate utterance to the set of training utterances.
Publication number: US9263033(B2)
Publication date: 2016.02.16
Application number: US201414314295
Filing date: 2014.06.25
Applicant: Google Inc.
Inventors: Siohan Olivier; Mengibar Pedro J.
Classification: G10L15/00; G10L15/06
Main classification: G10L15/00
Agency: Fish & Richardson P.C.
Agent: Fish & Richardson P.C.
Principal claim: 1. A computer-implemented method comprising: obtaining a target multi-dimensional distribution of characteristics in an initial set of candidate utterances; selecting a subset of the initial set of candidate utterances based on speech recognition confidence scores associated with the candidate utterances; selecting a particular candidate utterance from the subset of the initial set of utterances; determining that adding the particular candidate utterance to a set of training utterances reduces a divergence of a multi-dimensional distribution of the characteristics in the set of training utterances from the target multi-dimensional distribution; and adding the particular candidate utterance to the set of training utterances.
Address: Mountain View, CA, US
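
Illustrative sketch (not part of the patent record): the steps recited in claim 1 can be read as a greedy selection loop. The Python below is a minimal sketch under several assumptions that the record does not state. The divergence is taken to be a smoothed Kullback-Leibler divergence, the confidence-based subset is taken to be a simple threshold (min_conf), candidates are visited in list order, and each utterance's characteristics are collapsed into one tuple ("traits"). All function names, field names, and parameters are hypothetical.

```python
import math
from collections import Counter


def kl_divergence(counts, total, target, eps=1e-9):
    """Smoothed KL(target || empirical) over the cells of the target distribution.

    Assumption: the record only says "divergence"; KL is one possible choice.
    """
    if total == 0:
        return math.inf  # an empty training set is treated as maximally divergent
    div = 0.0
    for cell, p in target.items():
        if p <= 0:
            continue
        # Additive smoothing keeps the logarithm finite for unseen cells.
        q = (counts.get(cell, 0) + eps) / (total + eps * len(target))
        div += p * math.log(p / q)
    return div


def select_training_utterances(candidates, target, min_conf=0.8):
    """Greedy reading of claim 1; names and threshold are hypothetical.

    candidates: list of dicts with keys "id", "confidence", "traits".
    target: dict mapping a traits tuple to its target probability.
    """
    # Subset selection based on recognition confidence (assumed: a threshold).
    subset = [c for c in candidates if c["confidence"] >= min_conf]

    counts = Counter()
    selected = []
    current = kl_divergence(counts, 0, target)

    for cand in subset:
        cell = cand["traits"]
        counts[cell] += 1  # tentatively add the candidate
        trial = kl_divergence(counts, len(selected) + 1, target)
        if trial < current:
            # Adding the candidate reduces the divergence: keep it.
            selected.append(cand)
            current = trial
        else:
            counts[cell] -= 1  # undo the tentative addition
    return selected


if __name__ == "__main__":
    target = {("short", "male"): 0.5, ("long", "female"): 0.5}
    candidates = [
        {"id": "a", "confidence": 0.90, "traits": ("short", "male")},
        {"id": "b", "confidence": 0.95, "traits": ("short", "male")},
        {"id": "c", "confidence": 0.90, "traits": ("long", "female")},
        {"id": "d", "confidence": 0.30, "traits": ("long", "female")},
        {"id": "e", "confidence": 0.90, "traits": ("long", "female")},
    ]
    print([c["id"] for c in select_training_utterances(candidates, target)])
```

In this toy run the low-confidence candidate is filtered out before the loop, and a candidate is kept only when it moves the empirical distribution of traits closer to the target. A production system would likely choose the threshold, visiting order, and divergence measure differently.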