Title of invention: MINIMUM ERROR RATE TRAINING OF COMBINED STRING MODELS
Abstract: A method of making a speech recognition model database is disclosed. The database is formed based on a training string utterance signal and a plurality of sets of current speech recognition models. The sets of current speech recognition models may include acoustic models, language models, and other knowledge sources. In accordance with an illustrative embodiment of the invention, a set of confusable string models is generated, each confusable string model comprising speech recognition models from two or more sets of speech recognition models (such as acoustic and language models). A first scoring signal is generated based on the training string utterance signal and a string model for that utterance, wherein the string model for the utterance comprises speech recognition models from two or more sets of speech recognition models. One or more second scoring signals are also generated, wherein a second scoring signal is based on the training string utterance signal and a confusable string model. A misrecognition signal is generated based on the first scoring signal and the one or more second scoring signals. Current speech recognition models are modified, based on the misrecognition signal, to increase the probability that a correct string model will have a rank order higher than other confusable string models. The confusable string models comprise N-best word string models. The first recognizer scoring signal reflects a measure of similarity between the training string utterance signal and the string model for that utterance. The second recognizer scoring signal reflects a measure of similarity between the training string utterance signal and a confusable string model. The misrecognition signal reflects a difference of the first scoring signal and a combination of one or more second scoring signals. The modification of current speech recognition models is accomplished by generating a recognition model modification signal according to a gradient of a function, which function reflects a recognizer score of a training string utterance based on a string model for that utterance and one or more recognizer scores of the training string utterance based on one or more confusable string models.
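The scoring, misrecognition, and gradient steps described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the abstract gives no formulas, so the log-linear combination of acoustic and language scores, the smoothed log-mean-exp combination of the confusable scores, the sigmoid loss, and every name and parameter below (`string_score`, `misrecognition`, `lm_weight`, `eta`, `gamma`) are assumptions.

```python
import math

def string_score(acoustic_logp, language_logp, lm_weight=1.0):
    # Assumed combined string-model score: acoustic log score plus a
    # weighted language-model log score (hypothetical weighting).
    return acoustic_logp + lm_weight * language_logp

def misrecognition(correct_score, confusable_scores, eta=2.0):
    # Assumed misrecognition measure: a smoothed combination of the
    # confusable (N-best) scores minus the score of the correct string.
    # d > 0 suggests a recognition error, d < 0 a correct decision.
    m = max(confusable_scores)  # subtract the max for numerical stability
    mean_exp = sum(math.exp(eta * (s - m)) for s in confusable_scores)
    mean_exp /= len(confusable_scores)
    combined = m + math.log(mean_exp) / eta
    return combined - correct_score

def smoothed_loss(d, gamma=1.0):
    # Sigmoid of the misrecognition measure: a differentiable stand-in
    # for the 0/1 string error count (assumed form).
    return 1.0 / (1.0 + math.exp(-gamma * d))

def loss_gradients(correct_score, confusable_scores, eta=2.0, gamma=1.0):
    # Gradient of the smoothed loss with respect to each string score.
    # A descent step raises the correct score and lowers the confusable
    # scores, with the strongest competitors weighted most heavily.
    d = misrecognition(correct_score, confusable_scores, eta)
    loss = smoothed_loss(d, gamma)
    dloss_dd = gamma * loss * (1.0 - loss)
    m = max(confusable_scores)
    weights = [math.exp(eta * (s - m)) for s in confusable_scores]
    total = sum(weights)
    grad_correct = -dloss_dd
    grad_confusable = [dloss_dd * w / total for w in weights]
    return grad_correct, grad_confusable

# Hypothetical log scores for one training utterance.
correct = string_score(-8.0, -2.0)
competitors = [string_score(-9.0, -3.0), string_score(-11.0, -4.0)]
d = misrecognition(correct, competitors)
g_correct, g_competitors = loss_gradients(correct, competitors)
```

In a full system these score gradients would be propagated by the chain rule to the acoustic and language model parameters, and each parameter would move a small step against the gradient. That step is omitted here because the abstract does not specify the parameterization.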
Publication number: CA2126380(C) Publication date: 1998.07.07
Application number: CA19942126380 Filing date: 1994.06.21
Applicant: AMERICAN TELEPHONE AND TELEGRAPH COMPANY Inventors: CHOU, WU; JUANG, BIING-HWANG; LEE, CHIN-HUI
Classification: G10L15/06; G10L15/14; G10L15/28; (IPC1-7): G10L9/00 Main classification: G10L15/06
Agency: Agent:
Principal claim:
Address: