Title: Content-based audio playback emphasis
Abstract: Techniques are disclosed for facilitating the process of proofreading draft transcripts of spoken audio streams. In general, proofreading of a draft transcript is facilitated by playing back the corresponding spoken audio stream with an emphasis on those regions in the audio stream that are highly relevant or likely to have been transcribed incorrectly. Regions may be emphasized by, for example, playing them back more slowly than regions that are of low relevance and likely to have been transcribed correctly. Emphasizing those regions of the audio stream that are most important to transcribe correctly and those regions that are most likely to have been transcribed incorrectly increases the likelihood that the proofreader will accurately correct any errors in those regions, thereby improving the overall accuracy of the transcript.
Publication number: US8768706(B2); Publication date: 2014.07.01
Application number: US201012859883; Filing date: 2010.08.20
Applicant: Multimodal Technologies, LLC; Inventors: Schubert Kjell; Fritsch Juergen; Finke Michael; Koll Detlef
Classification: G10L21/00; Main classification: G10L21/00
Agency: Robert Plotkin, P.C.; Agent: Robert Plotkin, P.C.
Main claim: 1. A method performed by a computer processor executing computer program instructions tangibly stored on a first computer-readable medium to perform a method comprising: (A) deriving, from a region of a document and a corresponding region of a spoken audio stream, a likelihood score representing a likelihood that the region of the document correctly represents content in the corresponding region of the spoken audio stream, and tangibly storing a representation of the likelihood score in a second computer-readable medium; (B) selecting a relevance score representing a measure of relevance of the region of the spoken audio stream, the measure of relevance representing a measure of importance that the region of the spoken audio stream be brought to the attention of a human proofreader, and tangibly storing a representation of the relevance score in a third computer-readable medium; and (C) deriving, by dividing the relevance score by the likelihood score, an emphasis factor for modifying emphasis placed on the region of the spoken audio stream when played back, and storing a representation of the emphasis factor in a fourth computer-readable medium.
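Step (C) of the claim derives the emphasis factor by dividing the relevance score by the likelihood score. The following is a minimal sketch of that arithmetic, not the patented implementation. The function names, the floor on the likelihood score (added only to avoid division by zero), and the mapping from emphasis factor to a clamped playback rate are all illustrative assumptions; the record states only that emphasized regions may be played back more slowly.

```python
def emphasis_factor(relevance: float, likelihood: float,
                    min_likelihood: float = 1e-3) -> float:
    """Emphasis factor = relevance score / likelihood score (claim step C).

    min_likelihood is an assumed floor that guards against division by
    zero; it is not part of the claim.
    """
    if relevance < 0 or likelihood < 0:
        raise ValueError("scores must be non-negative")
    return relevance / max(likelihood, min_likelihood)


def playback_rate(emphasis: float, normal_rate: float = 1.0,
                  min_rate: float = 0.5, max_rate: float = 2.0) -> float:
    """Hypothetical mapping: higher emphasis gives slower playback.

    The inverse relationship and the clamping bounds are assumptions
    chosen for illustration.
    """
    rate = normal_rate / emphasis if emphasis > 0 else max_rate
    return min(max_rate, max(min_rate, rate))


if __name__ == "__main__":
    # Hypothetical regions: (label, relevance score, likelihood score).
    regions = [
        ("medication dosage, low recognizer confidence", 1.0, 0.5),
        ("greeting, high recognizer confidence", 0.5, 1.0),
    ]
    for label, relevance, likelihood in regions:
        factor = emphasis_factor(relevance, likelihood)
        print(f"{label}: emphasis={factor:.2f}, rate={playback_rate(factor):.2f}x")
```

Under these assumptions, a region that is highly relevant and unlikely to be correct receives a large emphasis factor and is played back slowly, while a low-relevance, high-likelihood region is played back faster.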
Address: Pittsburgh PA US