Title: Method and apparatus to generate a speech recognition library
Abstract: Methods and apparatus to generate a speech recognition library for use by a speech recognition system are disclosed. An example method comprises identifying a plurality of video segments having closed caption data corresponding to a phrase, the plurality of video segments associated with respective ones of a plurality of audio data segments, computing a plurality of difference metrics between a baseline audio data segment associated with the phrase and respective ones of the plurality of audio data segments, selecting a set of the plurality of audio data segments based on the plurality of difference metrics, identifying a first one of the audio data segments in the set as a representative audio data segment, determining a first phonetic transcription of the representative audio data segment, and adding the first phonetic transcription to a speech recognition library when the first phonetic transcription differs from a second phonetic transcription associated with the phrase in the speech recognition library.
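The steps named in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, not something the record specifies: audio segments are represented as fixed-length feature vectors, the difference metric is Euclidean distance, the selected set is the segments whose distance from the baseline exceeds a threshold, the representative is the medoid of that set, and `transcribe` is a hypothetical caller-supplied function that returns a phonetic transcription string.

```python
from math import dist  # Euclidean distance, Python 3.8+


def update_library(library, phrase, baseline, segments, transcribe, threshold):
    """Add a new phonetic transcription for `phrase` if one is found.

    library    -- dict mapping phrase -> list of transcriptions (assumed layout)
    baseline   -- feature vector of the baseline audio segment (assumed form)
    segments   -- feature vectors of the caption-matched audio segments
    transcribe -- hypothetical callable: feature vector -> transcription string
    threshold  -- assumed selection cutoff on the difference metric
    """
    # 1. One difference metric per candidate segment, relative to the baseline.
    metrics = [dist(baseline, seg) for seg in segments]

    # 2. Select a set based on the metrics (assumption: keep the distant ones).
    selected = [seg for seg, m in zip(segments, metrics) if m > threshold]
    if not selected:
        return None

    # 3. Representative segment (assumption: the medoid, i.e. the member
    #    with the smallest total distance to the other selected members).
    representative = min(
        selected, key=lambda s: sum(dist(s, other) for other in selected)
    )

    # 4. Phonetic transcription of the representative segment.
    transcription = transcribe(representative)

    # 5. Add it only when it differs from what the library already holds.
    known = library.setdefault(phrase, [])
    if transcription not in known:
        known.append(transcription)
        return transcription
    return None
```

A real system would replace the feature vectors and the distance function with whatever acoustic comparison it uses; the sketch only shows the order of the steps and the "add only when different" rule.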
Publication number: US9536519(B2) Publication date: 2017.01.03
Application number: US201514926544 Filing date: 2015.10.29
Applicant: AT&T INTELLECTUAL PROPERTY I, L.P. Inventor: Chang Hisao
Classification: G10L15/26; G10L15/06; G06F17/27; G10L13/04; G10L13/06; G10L15/187; G10L25/57; G10L13/02; G06F17/21; G10L13/08 Main classification: G10L15/26
Agency: Guntin & Gust, PLC Agent: Guntin & Gust, PLC; Das Atanu
Main claim: 1. A device, comprising: a processing system including a processor; and a memory that stores executable instructions that, when executed by the processing system, facilitate performance of operations, comprising: obtaining video media content, wherein the video media content comprises images, audio content, and closed captioning of text from the audio content; detecting an occurrence of a textual phrase in the closed captioning data of the video media content as a detected occurrence; obtaining an audio segment from the audio content corresponding to the textual phrase as a selected audio segment; computing a phonetic transcription for the selected audio segment as a computed transcription; selecting, from a speech recognition library, a plurality of identified phonetic transcriptions associated with the textual phrase, wherein the speech recognition library comprises audio pronunciation data for the textual phrase and identified phonetic transcriptions of the textual phrase; comparing the computed transcription with the plurality of identified phonetic transcriptions from the speech recognition library; determining if the computed transcription differs from the plurality of identified phonetic transcriptions from the speech recognition library; and responsive to determining that the computed transcription differs from the plurality of identified phonetic transcriptions, adding the computed transcription and the textual phrase to the audio pronunciation data in a group of the plurality of identified phonetic transcriptions in the speech recognition library.
Address: Atlanta GA US
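The operations recited in the main claim (detect the phrase in the closed captions, take the matching audio, transcribe it, compare against the library, add when different) can be sketched as follows. This is an illustrative outline under stated assumptions, not the claimed implementation: the `Cue` type, the timed-caption layout, the raw sample list, the plain-dict library, and the `transcribe` callable are all hypothetical stand-ins, and the library is simplified to hold only transcriptions rather than audio pronunciation data.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One closed-caption entry (hypothetical layout)."""
    start: float  # seconds from the start of the media
    end: float    # seconds from the start of the media
    text: str


def find_occurrences(cues, phrase):
    """Return the caption cues whose text contains the phrase (case-insensitive)."""
    needle = phrase.lower()
    return [cue for cue in cues if needle in cue.text.lower()]


def extract_audio(samples, sample_rate, cue):
    """Slice the audio samples covered by a cue's time span."""
    return samples[int(cue.start * sample_rate):int(cue.end * sample_rate)]


def process_media(cues, samples, sample_rate, phrase, library, transcribe):
    """Add newly observed transcriptions of `phrase` to `library`.

    library    -- dict mapping phrase -> list of known transcriptions (assumed)
    transcribe -- hypothetical callable: audio samples -> transcription string
    Returns the transcriptions that were added.
    """
    added = []
    for cue in find_occurrences(cues, phrase):
        segment = extract_audio(samples, sample_rate, cue)
        computed = transcribe(segment)
        known = library.setdefault(phrase, [])
        # Add only when the computed transcription differs from every known one.
        if computed not in known:
            known.append(computed)
            added.append(computed)
    return added
```

Keeping `transcribe` as a parameter leaves the choice of phonetic recognizer open, which matches the claim's silence on how the transcription is computed.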