Title Discriminative language model training using a confusion matrix
Abstract Features are disclosed for discriminative training of speech recognition language models. A confusion matrix can be generated from acoustic model training data for use in discriminative training. The confusion matrix can include probabilities for the substitution, insertion, and/or deletion of some or all subword units of a language. Probabilities can be calculated based on the presence or absence of subword units in a processed acoustic model training data audio recording when compared to a correct transcription of the recording. The probabilities can be used to generate erroneous transcriptions in language model training corpora, and the language model can be trained to distinguish the erroneous transcriptions from the correct transcriptions.
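As a rough illustration of the abstract, the sketch below estimates substitution, insertion, and deletion probabilities by aligning a recognized subword sequence against its correct transcription. The function names (`align`, `estimate_confusions`), the use of a Levenshtein alignment, and the normalization (deletions and substitutions divided by the count of the reference unit, insertions divided by the total number of reference units) are assumptions made for illustration; the record itself only says the probabilities are based on the observed errors.

```python
from collections import Counter


def align(ref, hyp):
    """Levenshtein-align two sequences.

    Returns a list of (ref_unit, hyp_unit) pairs; None on the hyp side marks
    a deletion, None on the ref side marks an insertion.
    """
    n, m = len(ref), len(hyp)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))  # unit missing from the hypothesis
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))  # extra unit in the hypothesis
            j -= 1
    pairs.reverse()
    return pairs


def estimate_confusions(data):
    """data: iterable of (transcribed_units, recognized_units) pairs.

    Returns three dicts: substitution[(ref, hyp)], insertion[unit], deletion[unit].
    The normalization used here is an assumption, not taken from the patent.
    """
    ref_counts, del_counts = Counter(), Counter()
    ins_counts, sub_counts = Counter(), Counter()
    for ref, hyp in data:
        for r, h in align(ref, hyp):
            if r is None:
                ins_counts[h] += 1
                continue
            ref_counts[r] += 1
            if h is None:
                del_counts[r] += 1
            elif h != r:
                sub_counts[(r, h)] += 1
    total = sum(ref_counts.values())
    substitution = {k: c / ref_counts[k[0]] for k, c in sub_counts.items()}
    insertion = {u: c / total for u, c in ins_counts.items()} if total else {}
    deletion = {u: del_counts[u] / ref_counts[u] for u in ref_counts}
    return substitution, insertion, deletion
```

Calling `estimate_confusions` on a list of (transcription, hypothesis) pairs returns three dictionaries that together play the role of the confusion matrix; a production system would more likely work from time-aligned recognizer output than from a plain edit-distance alignment.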
Publication number US9224386(B1) Publication date 2015.12.29
Application number US201213531376 Filing date 2012.06.22
Applicant Amazon Technologies, Inc. Inventor Weber Frederick V.
Classification G10L15/06;G10L15/08;G10L15/14 Main classification G10L15/06
Agency Knobbe, Martens, Olson & Bear, LLP Agent Knobbe, Martens, Olson & Bear, LLP
Principal claim 1. A computer-implemented method comprising: under control of one or more computing devices configured with specific computer-executable instructions, generating a recognition hypothesis for acoustic model training data, wherein the acoustic model training data is associated with a transcription, wherein the recognition hypothesis comprises a sequence of one or more recognized subword units, wherein the transcription comprises a sequence of one or more transcribed subword units, wherein each recognized subword unit of the sequence of recognized subword units is one of a plurality of language subword units, and wherein each transcribed subword unit of the sequence of transcribed subword units is one of the plurality of language subword units; comparing the sequence of one or more recognized subword units to the sequence of one or more transcribed subword units; determining that the recognition hypothesis comprises one or more deletion errors for a first language subword unit; determining that the recognition hypothesis comprises one or more insertion errors for the first language subword unit; calculating an insertion probability for the first language subword unit and a deletion probability for the first language subword unit, wherein the insertion probability is based at least in part on the one or more insertion errors, and wherein the deletion probability is based at least in part on the one or more deletion errors; updating a confusion matrix using the insertion probability and the deletion probability, wherein the confusion matrix comprises a second insertion probability for a second language subword unit of the plurality of language subword units and a second deletion probability for the second language subword unit; updating language model training data to generate updated language model training data comprising a lattice, wherein the lattice comprises a path corresponding to the transcription, and wherein updating the language model training data comprises adding an alternate path to the lattice based at least partly on the confusion matrix; and discriminatively training a language model using the updated language model training data, wherein the language model is configured to generate a first score for the path corresponding to the transcription and a second score for the alternate path, and wherein discriminatively training the language model comprises updating the language model to generate, for the path corresponding to the transcription, a score higher than the first score and to generate, for the alternate path, a score lower than the second score.
Address Seattle WA US
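The last two steps of claim 1 (adding an alternate path derived from the confusion matrix, then updating the language model so the transcription path scores higher and the alternate path scores lower) can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the "lattice" is reduced to a pair of paths, the language model is a table of bigram feature weights, and the update is a perceptron-style step. None of these choices, nor the names `alternate_path`, `bigrams`, `score`, and `perceptron_update`, come from the record.

```python
from collections import Counter


def alternate_path(path, substitution):
    """Apply the single most probable substitution from the confusion matrix.

    substitution maps (correct_unit, confused_unit) -> probability.
    Returns a new path, or None if no unit in the path has a known confusion.
    """
    best = None  # (probability, position, replacement unit)
    for i, unit in enumerate(path):
        for (src, dst), p in substitution.items():
            if src == unit and (best is None or p > best[0]):
                best = (p, i, dst)
    if best is None:
        return None
    _, i, dst = best
    return path[:i] + [dst] + path[i + 1:]


def bigrams(path):
    """Count bigram features over the path, padded with boundary symbols."""
    padded = ["<s>"] + list(path) + ["</s>"]
    return Counter(zip(padded, padded[1:]))


def score(weights, path):
    """Linear score of a path under the bigram weight table (assumed model form)."""
    return sum(weights.get(f, 0.0) * c for f, c in bigrams(path).items())


def perceptron_update(weights, ref_path, alt_path, lr=0.1):
    """Move weights toward the reference path's features and away from the alternate's.

    A perceptron-style step chosen for illustration only.
    """
    ref_f, alt_f = bigrams(ref_path), bigrams(alt_path)
    for f in set(ref_f) | set(alt_f):
        weights[f] = weights.get(f, 0.0) + lr * (ref_f[f] - alt_f[f])


# Usage: one training step on a toy example.
substitution = {("ae", "eh"): 0.5}       # hypothetical confusion-matrix entry
reference = ["k", "ae", "t"]             # path corresponding to the transcription
alternate = alternate_path(reference, substitution)
weights = {}
first_score = score(weights, reference)
second_score = score(weights, alternate)
perceptron_update(weights, reference, alternate)
```

In this toy case the update raises the reference path's score above `first_score` and lowers the alternate path's score below `second_score`, which is the behavior the claim describes. A single perceptron step does not guarantee that outcome for arbitrary feature overlaps, so a real trainer would iterate over many lattices and typically use a margin-based or probabilistic objective.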