Title: Unsupervised training method, training apparatus, and training program for an N-gram language model based upon recognition reliability
Abstract: A computer-based, unsupervised training method for an N-gram language model includes reading, by a computer, recognition results obtained as a result of speech recognition of speech data; acquiring, by the computer, a reliability for each of the read recognition results; referring, by the computer, to the recognition result and the acquired reliability to select an N-gram entry; and training, by the computer, the N-gram language model for the selected one or more N-gram entries using all recognition results.
Publication No.: US9536518(B2) Publication date: 2017.01.03
Application No.: US201514643316 Filing date: 2015.03.10
Applicant: International Business Machines Corporation Inventors: Itoh Nobuyasu; Kurata Gakuto; Nishimura Masafumi
Classification: G10L15/06; G10L15/183; G10L15/197; G10L15/18 Main classification: G10L15/06
Agency: Agent: Dobson Scott S.
Main claim: 1. An unsupervised training system for an N-gram language model, comprising: a processor configured to: read recognition results obtained as a result of speech recognition of speech data; acquire a reliability for each of the read recognition results; refer to each recognition result's acquired reliability to select a subset of one or more N-gram entries based upon their respective reliabilities; and train an N-gram language model for one or more entries of the subset of N-gram entries using all recognition results, wherein the processing device is further configured to select from a first corpus, a second corpus, and a third corpus, each of the N-gram entries, whose sum of a first number of appearances in the first corpus as a set of all the recognition results, a second number of appearances in a second corpus as a subset of the recognition results with the reliability higher than or equal to a predetermined threshold value, and a third number of appearances in the third corpus as a baseline of the N-gram language model exceeds a predetermined number of times, where each of the first number of appearances, the second number of appearances, and the third number of appearances is given a different weight, respectively.
Address: Armonk NY US
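
The entry-selection step recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`count_ngrams`, `select_entries`), the weight values, both threshold values, and the representation of a recognition result as a `(words, reliability)` pair are all assumptions made for illustration. The sketch counts each N-gram in three corpora (all recognition results, the high-reliability subset, and a baseline corpus), forms a weighted sum of the three counts, and keeps entries whose sum exceeds a threshold. For each kept entry it returns the count taken from all recognition results, reflecting the claim's statement that training uses all recognition results.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

NGram = Tuple[str, ...]


def count_ngrams(sentences: Iterable[Sequence[str]], n: int) -> Counter:
    """Count every contiguous n-word sequence in the given sentences."""
    counts: Counter = Counter()
    for words in sentences:
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


def select_entries(
    results: List[Tuple[Sequence[str], float]],
    baseline: List[Sequence[str]],
    n: int = 2,
    reliability_threshold: float = 0.8,               # assumed value
    weights: Tuple[float, float, float] = (1.0, 2.0, 0.5),  # assumed values
    min_weighted_count: float = 3.0,                  # assumed value
) -> Dict[NGram, int]:
    """Select N-gram entries by a weighted sum of counts over three corpora.

    Returns a mapping from each selected entry to its count in ALL
    recognition results (not only the reliable subset).
    """
    # First corpus: every recognition result.
    all_counts = count_ngrams((words for words, _ in results), n)
    # Second corpus: results whose reliability meets the threshold.
    reliable_counts = count_ngrams(
        (words for words, rel in results if rel >= reliability_threshold), n
    )
    # Third corpus: baseline data for the language model.
    baseline_counts = count_ngrams(baseline, n)

    w_all, w_reliable, w_baseline = weights
    candidates = set(all_counts) | set(reliable_counts) | set(baseline_counts)

    selected: Dict[NGram, int] = {}
    for gram in candidates:
        # Counter returns 0 for entries it has not seen.
        score = (
            w_all * all_counts[gram]
            + w_reliable * reliable_counts[gram]
            + w_baseline * baseline_counts[gram]
        )
        if score > min_weighted_count:
            selected[gram] = all_counts[gram]
    return selected
```

In this sketch the reliability score only decides which entries are admitted; once an entry is admitted, its count comes from the full set of recognition results. Turning those counts into smoothed N-gram probabilities is outside the scope of the example.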