Title of invention |
Unsupervised training method, training apparatus, and training program for an N-gram language model based upon recognition reliability |
Abstract |
A computer-based, unsupervised training method for an N-gram language model includes reading, by a computer, recognition results obtained as a result of speech recognition of speech data; acquiring, by the computer, a reliability for each of the read recognition results; referring, by the computer, to the recognition result and the acquired reliability to select an N-gram entry; and training, by the computer, the N-gram language model for the selected one or more N-gram entries using all recognition results. |
Publication number |
US9536518(B2) |
Publication date |
2017.01.03 |
Application number |
US201514643316 |
Filing date |
2015.03.10 |
Applicant |
International Business Machines Corporation |
Inventors |
Itoh Nobuyasu;Kurata Gakuto;Nishimura Masafumi |
Classification codes |
G10L15/06;G10L15/183;G10L15/197;G10L15/18 |
Main classification code |
G10L15/06 |
Patent agency |
|
Agent |
Dobson Scott S. |
Principal claim |
1. An unsupervised training system for an N-gram language model, comprising:
a processor configured to: read recognition results obtained as a result of speech recognition of speech data; acquire a reliability for each of the read recognition results; refer to each recognition result's acquired reliability to select a subset of one or more N-gram entries based upon their respective reliabilities; and train an N-gram language model for one or more entries of the subset of N-gram entries using all recognition results, wherein the processing device is further configured to select from a first corpus, a second corpus, and a third corpus, each of the N-gram entries, whose sum of a first number of appearances in the first corpus as a set of all the recognition results, a second number of appearances in a second corpus as a subset of the recognition results with the reliability higher than or equal to a predetermined threshold value, and a third number of appearances in the third corpus as a baseline of the N-gram language model exceeds a predetermined number of times, where each of the first number of appearances, the second number of appearances, and the third number of appearances is given a different weight, respectively. |
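The selection rule in the claim above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`select_entries`, `train_counts`), the whitespace-token input format, the bigram default, and the specific weights and thresholds are all assumptions chosen for illustration. "Training" is reduced here to collecting raw counts for the selected entries from all recognition results; probability estimation and smoothing are left out.

```python
from collections import Counter


def count_ngrams(sentences, n):
    """Count every N-gram (as a tuple of tokens) across tokenized sentences."""
    counts = Counter()
    for tokens in sentences:
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts


def select_entries(results, baseline, n=2, reliability_threshold=0.8,
                   weights=(1.0, 2.0, 0.5), min_weighted_count=3.0):
    """Select N-gram entries whose weighted count sum exceeds a threshold.

    results:  list of (tokens, reliability) pairs (hypothetical input format).
    baseline: list of token lists standing in for the baseline corpus.
    weights:  illustrative, mutually different weights for the three counts.
    """
    # First corpus: all recognition results.
    first = count_ngrams((toks for toks, _ in results), n)
    # Second corpus: only results whose reliability is >= the threshold.
    second = count_ngrams(
        (toks for toks, rel in results if rel >= reliability_threshold), n)
    # Third corpus: the baseline of the language model.
    third = count_ngrams(baseline, n)

    w1, w2, w3 = weights
    selected = set()
    for entry in set(first) | set(second) | set(third):
        # Counter returns 0 for entries missing from a corpus.
        score = w1 * first[entry] + w2 * second[entry] + w3 * third[entry]
        if score > min_weighted_count:
            selected.add(entry)
    return selected


def train_counts(results, selected, n=2):
    """Collect counts for the selected entries from ALL recognition results."""
    all_counts = count_ngrams((toks for toks, _ in results), n)
    return {entry: c for entry, c in all_counts.items() if entry in selected}
```

In this sketch, reliability affects only which entries are kept. Once an entry is selected, its count is taken from every recognition result, including low-reliability ones, which mirrors the claim's "using all recognition results" wording.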
Address |
Armonk NY US |