Title of invention: Document transcription system training
Abstract: A system is provided for training an acoustic model for use in speech recognition. In particular, such a system may be used to perform training based on a spoken audio stream and a non-literal transcript of the spoken audio stream. Such a system may identify text in the non-literal transcript which represents concepts having multiple spoken forms. The system may attempt to identify the actual spoken form in the audio stream which produced the corresponding text in the non-literal transcript, and thereby produce a revised transcript which more accurately represents the spoken audio stream. The revised, and more accurate, transcript may be used to train the acoustic model, thereby producing a better acoustic model than that which would be produced using conventional techniques, which perform training based directly on the original non-literal transcript.
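The revision step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the audio stream is replaced by a hypothetical list of already recognized words (heard), and the function names, the example date, and the candidate spoken forms are all assumptions made for illustration. The sketch picks the candidate spoken form that actually occurs in the recognized words and substitutes it for the written text, giving a revised transcript closer to what was said.

```python
def find_spoken_form(candidates, heard_tokens):
    """Return the longest candidate whose words occur contiguously in heard_tokens, else None."""
    # Try longer forms first so a short form does not shadow a longer one that contains it.
    for form in sorted(candidates, key=len, reverse=True):
        words = form.split()
        n = len(words)
        for i in range(len(heard_tokens) - n + 1):
            if heard_tokens[i:i + n] == words:
                return form
    return None


def revise_transcript(transcript, concept_text, candidates, heard_tokens):
    """Replace the written concept text with the spoken form that was actually heard."""
    form = find_spoken_form(candidates, heard_tokens)
    if form is None:
        return transcript  # no evidence for any spoken form: leave the text unchanged
    return transcript.replace(concept_text, form, 1)


# Hypothetical example data (not taken from the patent).
transcript = "Patient seen on 10/1/1993 for follow-up."
candidates = [
    "october first nineteen ninety three",
    "ten one ninety three",
    "the first of october ninety three",
]
heard = "patient seen on ten one ninety three for follow up".split()

revised = revise_transcript(transcript, "10/1/1993", candidates, heard)
print(revised)
```

In a real system the evidence would come from aligning the candidates against the audio itself rather than from exact word matching, so this only shows the substitution logic.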
Publication number: US9286896(B2) | Publication date: 2016.03.15
Application number: US201414280041 | Filing date: 2014.05.16
Applicant: MModal IP LLC | Inventors: Yegnanarayanan Girija; Finke Michael; Fritsch Juergen; Koll Detlef; Woszczyna Monika
Classification: G10L15/06; G10L15/26; G10L15/193 | Main classification: G10L15/06
Agency: Robert Plotkin, P.C. | Agent: Robert Plotkin, P.C.; Plotkin Robert
Main claim: 1. A method for use with a system including a first document containing at least some information in common with a spoken audio stream, the method performed by at least one computer processor executing computer program instructions to perform steps of: (A) identifying text in the first document, wherein the text represents a concept; (B) identifying, based on the identified text and a repository of finite state grammars, a plurality of spoken forms of the concept, including at least one spoken form not contained in the first document, wherein all of the plurality of spoken forms have the same content as each other; (C) replacing the identified text with a finite state grammar specifying the plurality of spoken forms of the concept to produce a second document, wherein the finite state grammar includes the identified text and text other than the identified text; (D) generating a document-specific language model based on the second document, comprising generating at least some of the document-specific language model based on the finite state grammar; and (E) using the document-specific language model in a speech recognition process to recognize the spoken audio stream and thereby to produce a third document.
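Steps (A) through (D) of the claim can be sketched roughly as follows. This is an illustrative reading under stated assumptions, not the claimed implementation: GRAMMAR_REPOSITORY, its single dosage pattern, and both function names are hypothetical; each finite state grammar is reduced to a flat set of parallel word sequences; and the "language model" is only a table of bigram counts from which probabilities could later be estimated. Step (E) needs a speech recognizer and is not shown.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical repository: pattern for a concept's written form -> its spoken forms.
# Each grammar includes the identified text itself plus other text, as in step (C).
GRAMMAR_REPOSITORY = {
    re.compile(r"(\d+) mg\b"): lambda m: (
        f"{m.group(1)} mg",
        f"{m.group(1)} milligrams",
        f"{m.group(1)} milligram",
    ),
}


def build_second_document(first_document):
    """Steps (A)-(C): replace each concept's text with a grammar of its spoken forms."""
    matches = sorted(
        ((m, expand)
         for pattern, expand in GRAMMAR_REPOSITORY.items()
         for m in pattern.finditer(first_document)),
        key=lambda pair: pair[0].start(),
    )
    items, pos = [], 0
    for m, expand in matches:
        if m.start() < pos:
            continue  # skip a match that overlaps one already replaced
        items.extend(first_document[pos:m.start()].split())  # plain words
        # A grammar is a tuple of alternative word sequences.
        items.append(tuple(tuple(form.split()) for form in expand(m)))
        pos = m.end()
    items.extend(first_document[pos:].split())
    return items


def bigram_counts(second_document):
    """Step (D), simplified: count bigrams over every path through the grammars."""
    alternatives = [
        item if isinstance(item, tuple) else ((item,),)
        for item in second_document
    ]
    counts = Counter()
    # Enumerating all paths grows exponentially with the number of grammars;
    # acceptable for a sketch, not for long documents.
    for path in product(*alternatives):
        words = ["<s>"] + [w for segment in path for w in segment] + ["</s>"]
        counts.update(zip(words, words[1:]))
    return counts


second = build_second_document("take 20 mg daily")
counts = bigram_counts(second)
print(second)
print(counts[("20", "milligrams")], counts[("take", "20")])
```

Because the counts cover every spoken form in the grammar, a model estimated from them would allow the recognizer to hear "20 milligrams" even though the first document only contains "20 mg".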
Address: Franklin, TN, US