Title of invention: Acoustic model training
Abstract: Features are disclosed for generating acoustic models from an existing corpus of data. Methods for generating the acoustic models can include receiving at least one characteristic of a desired acoustic model, selecting training utterances corresponding to the characteristic from a corpus comprising audio data and corresponding transcription data, and generating an acoustic model based on the selected training utterances.
Publication number: US9495955(B1)  Publication date: 2016.11.15
Application number: US201313733084  Filing date: 2013.01.02
Applicant: Amazon Technologies, Inc.  Inventors: Weber Frederick Victor; Adams Jeffrey Penrod
Classification: G10L15/00; G10L15/06  Main classification: G10L15/00
Agency: Knobbe, Martens, Olson & Bear, LLP  Agent: Knobbe, Martens, Olson & Bear, LLP
Main claim: 1. An acoustic modeling system, comprising: under control of one or more computing devices configured with specific computer-executable instructions, receiving a plurality of characteristics of utterances to be used to create an acoustic model; for each characteristic in the plurality of characteristics: identifying an utterance within a corpus of utterances having the characteristic; and associating at least a portion of the utterance with a tag indicative of the characteristic; receiving an identification of a desired training utterance, wherein the desired training utterance comprises a first portion associated with a first desired characteristic and a second portion associated with a second desired characteristic, and wherein the desired training utterance is not included in the corpus; selecting, from the corpus, a first utterance, wherein a portion of the first utterance comprises at least the first portion of the desired training utterance, and wherein the portion of the first utterance is associated with a tag corresponding to the first desired characteristic; extracting the portion of the first utterance from the first utterance; selecting, from the corpus, a second utterance, wherein a portion of the second utterance comprises at least the second portion of the desired training utterance, and wherein the portion of the second utterance is associated with a tag corresponding to the second desired characteristic; extracting the portion of the second utterance from the second utterance; concatenating the portion of the first utterance with the portion of the second utterance to generate the desired training utterance; and training an acoustic model, wherein: the acoustic model comprises statistical representations of possible sounds of subword units; and the statistical representations are generated based on a comparison between audio data associated with the desired training utterance that is generated and a textual transcription of the desired training utterance that is generated.
Address: Seattle, WA, US
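
The select, extract, and concatenate steps recited in the main claim can be illustrated with a minimal sketch. It is not the patented implementation: the `Segment` structure, the function names, and the toy corpus are all hypothetical, audio is represented as a plain tuple of samples, and a corpus portion is matched by exact transcription equality, which is narrower than the claim's "comprises at least" wording. The final training step is omitted; the returned audio and transcript pair is what would be passed to an acoustic model trainer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A tagged portion of a corpus utterance (hypothetical structure)."""
    text: str         # transcription of this portion
    audio: tuple      # audio samples of this portion
    tags: frozenset   # characteristics associated with this portion


def find_segment(corpus, text, characteristic):
    """Return the first portion whose transcription equals `text`
    and whose tags include `characteristic`, or None if absent."""
    for utterance in corpus:
        for segment in utterance:
            if segment.text == text and characteristic in segment.tags:
                return segment
    return None


def build_training_utterance(corpus, desired):
    """Assemble a training utterance that is not itself in the corpus.

    `desired` is a list of (text, characteristic) pairs, one per portion.
    Returns (audio, transcript) for the concatenated result.
    """
    parts = []
    for text, characteristic in desired:
        segment = find_segment(corpus, text, characteristic)
        if segment is None:
            raise LookupError(f"no portion {text!r} tagged {characteristic!r}")
        parts.append(segment)  # the extracted portion
    audio = tuple(sample for part in parts for sample in part.audio)
    transcript = " ".join(part.text for part in parts)
    return audio, transcript


# Toy corpus (assumed data): two utterances, each split into tagged portions.
corpus = [
    [Segment("play", (1, 2), frozenset({"male"})),
     Segment("music", (3,), frozenset({"male"}))],
    [Segment("stop", (4,), frozenset({"female"})),
     Segment("the music", (5, 6), frozenset({"female"}))],
]

audio, transcript = build_training_utterance(
    corpus, [("play", "male"), ("the music", "female")]
)
```

Here `transcript` is "play the music" and `audio` joins the samples of the two selected portions in order, so the generated utterance combines a portion tagged "male" with a portion tagged "female" even though no single corpus utterance contains both.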