Title: Computer implemented system and method and computer program product for evaluating pronunciation of phonemes in a language
Abstract: A computer implemented method, system and computer program product for evaluating pronunciation. Known phonemes are stored in a computer memory. A spoken utterance corresponding to a target utterance, comprised of a sequence of target phonemes, is received and stored in a computer memory. The spoken utterance is segmented into a sequence of spoken phonemes, each corresponding to a target phoneme. For each spoken phoneme, a relative posterior probability is calculated that the spoken phoneme is the corresponding target phoneme. If the calculated probability is greater than a first threshold, an indication that the target phoneme was pronounced correctly is output; if it is less than the first threshold, an indication that the target phoneme was pronounced incorrectly is output. If the probability is less than the first threshold and greater than a second threshold, an indication that pronunciation of the target phoneme was acceptable is output.
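The two-threshold grading described in the abstract can be sketched as follows. This is an illustrative reading only, not the patented implementation: the function name `grade_phoneme`, the label strings, and the example threshold values are hypothetical, and the sketch assumes that a probability at or below the second threshold is the case reported as incorrect. Treating a probability exactly equal to the first threshold as correct follows the wording of claim 1 rather than the abstract.

```python
def grade_phoneme(relative_posterior, first_threshold, second_threshold):
    """Map a relative posterior probability to a three-level grade.

    Assumption: second_threshold < first_threshold, and values at or
    below second_threshold are the ones reported as incorrect.
    """
    if not second_threshold < first_threshold:
        raise ValueError("second_threshold must be below first_threshold")
    if relative_posterior >= first_threshold:
        return "correct"
    if relative_posterior > second_threshold:
        return "acceptable"
    return "incorrect"


# Hypothetical thresholds chosen only for illustration.
for p in (0.9, 0.5, 0.1):
    print(p, grade_phoneme(p, first_threshold=0.7, second_threshold=0.3))
```

In practice the thresholds would be set per target phoneme, as claim 1 refers to a predetermined threshold "for the target phoneme".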
Publication number: US8744856(B1) Publication date: 2014.06.03
Application number: US201213401483 Filing date: 2012.02.21
Applicant: Carnegie Speech Company Inventor: Ravishankar Mosur K.
Classification: G10L21/00;G10L15/00;G10L15/04;G10L21/06;G09B19/04;G10L25/60 Main classification: G10L21/00
Agency: Agent:
Independent claim: 1. A computer implemented method for evaluating the pronunciation of an utterance, comprising:
a. storing in a computer memory a plurality of known phonemes of a language;
b. receiving, via a computer processor, an utterance spoken by a user, the utterance spoken by the user corresponding to a target utterance, the target utterance being comprised of a sequence of one or more target phonemes to be spoken by the user, each of the target phonemes being one of the plurality of known phonemes of the language;
c. storing, in a computer memory, the utterance spoken by the user;
d. segmenting, via a computer processor, the stored utterance spoken by the user into a sequence of one or more spoken phonemes, each of the spoken phonemes corresponding to one of the target phonemes;
e. for each of the one or more spoken phonemes,
i. determining, via a computer processor, for each of the plurality of known phonemes in the language, a value that represents the likelihood that the spoken phoneme is one of the plurality of known phonemes in the language;
ii. responsive to each of the determined values for the spoken phoneme for each of the plurality of known phonemes in the language, computing, via a computer processor, a posterior probability that the spoken phoneme is one of the plurality of known phonemes;
iii. responsive to the computed posterior probabilities for the spoken phoneme for each of the plurality of known phonemes in the language, computing, via a computer processor, a relative posterior probability that the spoken phoneme is the corresponding target phoneme;
iv. comparing, via a computer processor, the computed relative posterior probability to a first predetermined threshold for the target phoneme;
v. when the computed relative posterior probability is greater than or equal to the first predetermined threshold, outputting, via an output device, an indication that the pronunciation of the target phoneme was correct; and
vi. when the computed relative posterior probability is less than the first predetermined threshold, outputting, via an output device, an indication that the pronunciation of the target phoneme was incorrect.
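Steps e.i through e.vi of the claim can be illustrated with a minimal sketch. Several details here are assumptions, not statements from the record: the claim does not define how the "relative posterior probability" is derived, so the sketch takes it to be the target phoneme's posterior divided by the largest posterior among all known phonemes; the per-phoneme values are assumed to be log-likelihoods; priors default to uniform; and all function names, phoneme labels, and numbers are hypothetical.

```python
import math


def posteriors(log_likelihoods, priors=None):
    """Step e.ii: convert per-phoneme log-likelihoods to posteriors (Bayes' rule)."""
    phonemes = list(log_likelihoods)
    if priors is None:
        # Assumption: uniform prior over the known phonemes.
        priors = {p: 1.0 / len(phonemes) for p in phonemes}
    scores = {p: log_likelihoods[p] + math.log(priors[p]) for p in phonemes}
    # Subtract the maximum score before exponentiating to avoid underflow.
    top = max(scores.values())
    unnormalized = {p: math.exp(s - top) for p, s in scores.items()}
    total = sum(unnormalized.values())
    return {p: v / total for p, v in unnormalized.items()}


def relative_posterior(post, target):
    """Step e.iii (assumed definition): target posterior over the best posterior."""
    return post[target] / max(post.values())


def evaluate_phoneme(log_likelihoods, target, threshold):
    """Steps e.iv to e.vi: compare against the target phoneme's threshold."""
    rel = relative_posterior(posteriors(log_likelihoods), target)
    return "correct" if rel >= threshold else "incorrect"


# Hypothetical log-likelihoods (step e.i) for one spoken segment.
segment_scores = {"AE": -10.0, "EH": -12.0, "IH": -15.0}
print(evaluate_phoneme(segment_scores, target="AE", threshold=0.5))
print(evaluate_phoneme(segment_scores, target="EH", threshold=0.5))
```

Under the assumed definition, the relative posterior is 1.0 whenever the target phoneme is the best-scoring candidate, which keeps the threshold comparable across segments with different numbers of competing phonemes.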
Address: Pittsburgh PA US