Title of invention: Smoothening the information density of spoken words in an audio signal
Abstract: A portion of an audio signal is identified corresponding to a spoken word and its phonemes. A set of alternate spoken words satisfying phonetic similarity criteria to the spoken word is generated. A subset of the set of alternate spoken words is also identified; each member of the subset shares the same phoneme in a similar temporal position as the spoken word. A significance factor is then calculated for the phoneme based on the number of alternates in the subset and on the total number of alternates. The calculated significance factor may then be used to lengthen or shorten the temporal duration of the phoneme in the audio signal according to its significance in the spoken word.
Publication number: US9293150(B2)  Publication date: 2016.03.22
Application number: US201314025323  Filing date: 2013.09.12
Applicant: International Business Machines Corporation  Inventors: Boegelund Flemming;Varshney Lav R.
Classification: G10L21/057;G10L25/60;G10L15/02  Main classification: G10L21/057
Agency:  Agents: Lowry Penny L.;Ray Jeanine
Main claim: 1. A method for modifying an audio signal, the method comprising: receiving an audio signal, the received audio signal having an original temporal duration; identifying a word portion of the audio signal, the word portion corresponding to a spoken word; identifying a plurality of phonemes in the word portion, a first phoneme of the plurality of phonemes occupying a temporal position in the word portion, the first phoneme having a first temporal duration in the audio signal; generating a set of alternates, each alternate in the set corresponding to an alternate spoken word satisfying phonetic similarity criteria when compared to the spoken word, the set containing a total number of alternates; identifying a subset of alternates from the set of alternates, the first phoneme occupying the temporal position in each alternate in the subset, the subset containing a subset number of alternates; calculating a first significance factor for the first phoneme, the first significance factor based on a proportion of the subset number of alternates to the total number of alternates; modifying the first temporal duration of the first phoneme based on the first significance factor; and outputting the audio signal, the output audio signal including the word portion, the word portion including the first phoneme with the modified first temporal duration, the output audio signal having a modified temporal duration different from the original temporal duration.
Address: Armonk NY US
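
Illustrative sketch (not part of the patent record): the proportion step in the main claim, the subset number of alternates divided by the total number of alternates, can be sketched in Python as follows. Everything beyond that ratio is an assumption made for illustration: words are represented as lists of phoneme symbols, "temporal position" is simplified to a list index, the alternates are supplied by hand rather than generated from phonetic similarity criteria, and the helper names `shared_proportion` and `modified_duration` are hypothetical. In particular, the record does not state how the significance factor maps to a new duration, so the linear mapping and its `min_scale`/`max_scale` bounds below are one possible choice, not the claimed method.

```python
from fractions import Fraction


def shared_proportion(word, alternates, position):
    """Return (subset number) / (total number) of alternates.

    The subset holds the alternates whose phoneme at `position` equals the
    phoneme at that position in `word`. Treating "temporal position" as a
    list index is a simplifying assumption.
    """
    if not alternates:
        raise ValueError("at least one alternate is required")
    target = word[position]
    subset = [
        alt for alt in alternates
        if position < len(alt) and alt[position] == target
    ]
    return Fraction(len(subset), len(alternates))


def modified_duration(duration_ms, proportion, min_scale=0.8, max_scale=1.2):
    """Scale a phoneme duration from its shared proportion.

    ASSUMPTION: a phoneme shared by few alternates distinguishes the word
    more strongly and is lengthened; one shared by many is shortened. The
    linear interpolation and the scale bounds are hypothetical.
    """
    p = float(proportion)
    scale = min_scale * p + max_scale * (1.0 - p)
    return duration_ms * scale


# Hand-written example data (hypothetical): "cat" and four similar words.
word = ["k", "ae", "t"]
alternates = [
    ["b", "ae", "t"],  # bat
    ["k", "ae", "p"],  # cap
    ["k", "ah", "t"],  # cut
    ["m", "ae", "t"],  # mat
]

for index, phoneme in enumerate(word):
    p = shared_proportion(word, alternates, index)
    print(phoneme, p, modified_duration(100.0, p))
```

With this data, the initial phoneme is shared by two of the four alternates while the other two phonemes are each shared by three of four, so under the assumed mapping the initial phoneme receives the longest duration of the three.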