Title of invention: Training speech recognition by matching audio segment frequency of occurrence with frequency of words and letter combinations in a corpus
Abstract: A method is provided which trains acoustic models in an automatic speech recognizer ("ASR") without explicitly matching decoded scripts with correct scripts from which acoustic training data is generated. In the method, audio data is input and segmented to produce audio segments. The audio segments are clustered into groups of clustered audio segments such that the clustered audio segments in each of the groups have similar characteristics. Also, the groups respectively form audio similarity classes. Then, audio segment probability distributions for the clustered audio segments in the audio similarity classes are calculated, and audio segment frequencies for the clustered audio segments are determined based on the audio segment probability distributions. The audio segment frequencies are matched to known audio segment frequencies for at least one of letters, combinations of letters, and words to determine frequency matches, and a textual corpus of words is formed based on the frequency matches. Then, acoustic models of the automatic speech recognizer are trained based on the textual corpus. In addition, the method may receive and cluster video or biometric data, and match such data to the audio data to more accurately cluster the audio segments into the groups of audio segments. Also, an apparatus for performing the method is provided.
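The frequency-matching step described in the abstract can be illustrated with a minimal Python sketch. This is not the patented method: it assumes the audio segments have already been clustered into similarity classes (represented here by integer labels), and it substitutes a simple rank-order pairing for whatever matching procedure the patent actually specifies. The function names, the toy label sequence, and the reference frequencies are all hypothetical.

```python
from collections import Counter


def class_frequencies(labels):
    """Relative frequency of each audio similarity class in a label sequence."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def match_by_frequency(labels, known_freqs):
    """Pair classes with text units by frequency rank (an illustrative assumption).

    labels      -- one cluster label per audio segment (hypothetical input)
    known_freqs -- reference frequencies of letters, letter combinations,
                   or words, e.g. estimated from a text corpus
    """
    observed = class_frequencies(labels)
    # Sort both sides from most to least frequent; ties break deterministically.
    ranked_classes = sorted(observed, key=lambda c: (-observed[c], str(c)))
    ranked_units = sorted(known_freqs, key=lambda u: (-known_freqs[u], u))
    return dict(zip(ranked_classes, ranked_units))


def decode(labels, mapping):
    """Turn the label sequence into a tentative transcript using the matches."""
    return [mapping.get(c, "?") for c in labels]


# Toy data: six segments clustered into three classes.
labels = [0, 1, 0, 2, 0, 1]
known = {"e": 0.5, "t": 0.3, "a": 0.2}

mapping = match_by_frequency(labels, known)
transcript = decode(labels, mapping)
```

In this toy run the most frequent class is paired with the most frequent unit, and so on down the ranking. The resulting tentative transcript plays the role of the textual corpus on which, per the abstract, the acoustic models would then be trained.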
Publication number: US6009392(A) Publication date: 1999.12.28
Application number: US19980007478 Filing date: 1998.01.15
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors: KANEVSKY, DIMITRI; ZADROZNY, WLODEK WLODZIMIERZ
Classification: G01L9/00; (IPC1-7): G01L9/00 Main classification: G01L9/00
Agency: Agent:
Main claim:
Address: