Title of invention |
Acoustic model training |
Abstract |
Features are disclosed for generating acoustic models from an existing corpus of data. Methods for generating the acoustic models can include receiving at least one characteristic of a desired acoustic model, selecting training utterances corresponding to the characteristic from a corpus comprising audio data and corresponding transcription data, and generating an acoustic model based on the selected training utterances. |
Publication number |
US9495955(B1) |
Publication date |
2016.11.15 |
Application number |
US201313733084 |
Filing date |
2013.01.02 |
Applicant |
Amazon Technologies, Inc. |
Inventors |
Weber Frederick Victor;Adams Jeffrey Penrod |
Classification codes |
G10L15/00;G10L15/06 |
Main classification code |
G10L15/00 |
Patent agency |
Knobbe, Martens, Olson & Bear, LLP |
Patent agent |
Knobbe, Martens, Olson & Bear, LLP |
Principal claim |
1. An acoustic modeling system, comprising:
under control of one or more computing devices configured with specific computer-executable instructions,
receiving a plurality of characteristics of utterances to be used to create an acoustic model;
for each characteristic in the plurality of characteristics:
identifying an utterance within a corpus of utterances having the characteristic; and
associating at least a portion of the utterance with a tag indicative of the characteristic;
receiving an identification of a desired training utterance, wherein the desired training utterance comprises a first portion associated with a first desired characteristic and a second portion associated with a second desired characteristic, and wherein the desired training utterance is not included in the corpus;
selecting, from the corpus, a first utterance,
wherein a portion of the first utterance comprises at least the first portion of the desired training utterance, and
wherein the portion of the first utterance is associated with a tag corresponding to the first desired characteristic;
extracting the portion of the first utterance from the first utterance;
selecting, from the corpus, a second utterance,
wherein a portion of the second utterance comprises at least the second portion of the desired training utterance, and
wherein the portion of the second utterance is associated with a tag corresponding to the second desired characteristic;
extracting the portion of the second utterance from the second utterance;
concatenating the portion of the first utterance with the portion of the second utterance to generate the desired training utterance; and
training an acoustic model, wherein:
the acoustic model comprises statistical representations of possible sounds of subword units; and
the statistical representations are generated based on a comparison between audio data associated with the desired training utterance that is generated and a textual transcription of the desired training utterance that is generated. |
Address |
Seattle WA US |
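
The tagging, selection, extraction, and concatenation steps recited in the principal claim can be illustrated with a minimal Python sketch. This is not the patented implementation. The `Segment` and `Utterance` layouts, the function names `tag_corpus`, `extract_portion`, and `synthesize_utterance`, the word-level granularity, and the sample values are all assumptions made for illustration. The final step, training the acoustic model, is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple


@dataclass
class Segment:
    """One word of an utterance with its audio samples and tags (hypothetical layout)."""
    word: str
    samples: List[float]
    tags: Set[str] = field(default_factory=set)


@dataclass
class Utterance:
    """A corpus utterance: word-aligned segments plus free-form metadata (assumed)."""
    segments: List[Segment]
    metadata: Dict[str, str] = field(default_factory=dict)


def tag_corpus(corpus: Sequence[Utterance],
               characteristics: Dict[str, Callable[[Utterance], bool]]) -> None:
    """For each characteristic, tag the segments of every utterance that has it."""
    for name, has_characteristic in characteristics.items():
        for utt in corpus:
            if has_characteristic(utt):
                for seg in utt.segments:
                    seg.tags.add(name)


def extract_portion(corpus: Sequence[Utterance], words: Sequence[str],
                    tag: str) -> Optional[List[Segment]]:
    """Return the first run of segments that spells `words` and carries `tag` throughout."""
    n = len(words)
    for utt in corpus:
        for start in range(len(utt.segments) - n + 1):
            window = utt.segments[start:start + n]
            if all(s.word == w and tag in s.tags for s, w in zip(window, words)):
                return window
    return None


def synthesize_utterance(corpus: Sequence[Utterance],
                         parts: Sequence[Tuple[Sequence[str], str]]
                         ) -> Tuple[List[float], str]:
    """Concatenate extracted portions into (audio samples, transcript)."""
    audio: List[float] = []
    transcript: List[str] = []
    for words, tag in parts:
        portion = extract_portion(corpus, words, tag)
        if portion is None:
            raise LookupError(f"no portion {list(words)!r} tagged {tag!r} in corpus")
        for seg in portion:
            audio.extend(seg.samples)
            transcript.append(seg.word)
    return audio, " ".join(transcript)


# Toy corpus: neither utterance is "play the timer", so it has to be assembled.
corpus = [
    Utterance([Segment("play", [0.1, 0.2]), Segment("music", [0.3])],
              {"gender": "female", "environment": "quiet"}),
    Utterance([Segment("stop", [0.4]), Segment("the", [0.5]), Segment("timer", [0.6, 0.7])],
              {"gender": "female", "environment": "car"}),
]
tag_corpus(corpus, {
    "female": lambda u: u.metadata.get("gender") == "female",
    "car_noise": lambda u: u.metadata.get("environment") == "car",
})
audio, text = synthesize_utterance(
    corpus, [(["play"], "female"), (["the", "timer"], "car_noise")]
)
print(text, audio)
```

In this sketch the generated audio and its transcript are returned together, which mirrors the claim's requirement that training compare the audio of the generated utterance against its textual transcription. A real system would work on time-aligned audio rather than per-word sample lists.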