Title of invention: Automatic conversion of speech into song, rap or other audible expression having target meter or rhythm
Abstract: Captured vocals may be automatically transformed using advanced digital signal processing techniques that enable captivating applications, and even purpose-built devices, in which even novice user-musicians can generate, audibly render and share musical performances. In some cases, the automated transformations allow spoken vocals to be segmented, arranged, temporally aligned with a target rhythm, meter or accompanying backing tracks, and pitch-corrected in accord with a score or note sequence. Speech-to-song music applications are one such example. In other cases, spoken vocals may be transformed in accord with musical genres such as rap using automated segmentation and temporal alignment techniques, often without pitch correction. Such applications, although they may employ different signal processing and different automated transformations, can nonetheless be understood as speech-to-rap variations on the same theme.
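The abstract mentions pitch-correcting segmented speech in accord with a score or note sequence. As a rough illustration only, the sketch below estimates a segment's pitch with naive autocorrelation and computes the frequency ratio a pitch shifter would need to land on each target note. The function names, the 60-500 Hz search range and the equal-temperament MIDI mapping (A4 = 440 Hz = note 69) are assumptions, not the patented method, and the shifter that would apply the ratios is omitted.

```python
import math
from typing import List, Sequence


def estimate_f0(samples: Sequence[float], rate: int,
                fmin: float = 60.0, fmax: float = 500.0) -> float:
    """Naive autocorrelation pitch estimate (assumed, not from the patent):
    the lag within the plausible voice range that maximizes
    self-similarity is taken as the period."""
    min_lag = max(1, int(rate / fmax))
    max_lag = min(int(rate / fmin), len(samples) - 1)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return rate / best_lag


def hz_to_midi(freq: float) -> float:
    """Equal-tempered MIDI note number (A4 = 440 Hz = note 69)."""
    return 69.0 + 12.0 * math.log2(freq / 440.0)


def correction_ratios(f0s: Sequence[float],
                      notes: Sequence[int]) -> List[float]:
    """Frequency ratio that would move each segment's detected pitch onto
    the corresponding note of a hypothetical target note sequence."""
    return [2.0 ** ((note - hz_to_midi(f0)) / 12.0)
            for f0, note in zip(f0s, notes)]


# Toy "segment": a 200 Hz sine sampled at 8 kHz (period of 40 samples).
rate = 8000
tone = [math.sin(2 * math.pi * 200.0 * n / rate) for n in range(800)]
f0 = estimate_f0(tone, rate)
print(f0, correction_ratios([f0, 220.0], [57, 69]))
```

A speech-to-rap variant, per the abstract, would often skip this step and rely on segmentation and temporal alignment alone.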
Publication number: US9324330 (B2); Publication date: 2016.04.26
Application number: US201313853759; Filing date: 2013.03.29
Applicant: Smule, Inc.; Inventors: Chordia Parag; Godfrey Mark; Rae Alexander; Gupta Prerna; Cook Perry R.
Classification codes: G10L21/055; G10L19/02; G10L19/00; G10H1/36; Main classification: G10L21/055
Agency: Haynes and Boone, LLP; Agent: Haynes and Boone, LLP
Principal claim: 1. A computational method for transforming an input audio encoding of speech into an output that is rhythmically consistent with a target song, the method comprising: segmenting the input audio encoding of the speech into plural segments, the segments corresponding to successive sequences of samples of the audio encoding and delimited by onsets identified therein; mapping individual ones of the plural segments to respective sub-phrase portions of a phrase template for the target song, the mapping establishing one or more phrase candidates; temporally aligning at least one of the phrase candidates with a rhythmic skeleton for the target song; and preparing a resultant audio encoding of the speech in correspondence with the temporally aligned phrase candidate-mapped from onset-delimited segments of the input audio encoding.
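The four steps of claim 1 (onset-delimited segmentation, mapping segments to a phrase template, temporal alignment with a rhythmic skeleton, and rendering the result) can be sketched as a toy in pure Python. Everything concrete here is an assumption: the energy-jump onset detector, the letter-string phrase template such as "ABA", the skeleton expressed as slot lengths in samples, and all function names are hypothetical. Linear-interpolation resampling also stands in for the pitch-preserving time stretch a real system would need.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Segment:
    start: int  # first sample index (inclusive)
    end: int    # one past the last sample index (exclusive)


def detect_onsets(samples: Sequence[float], frame: int = 4,
                  ratio: float = 4.0) -> List[int]:
    """Hypothetical energy-jump onset detector: flags the start of any frame
    whose energy exceeds `ratio` times the previous frame's energy."""
    onsets: List[int] = []
    prev = 1e-12
    for i in range(0, len(samples), frame):
        energy = sum(x * x for x in samples[i:i + frame])
        if energy > 1e-6 and energy > ratio * prev:
            onsets.append(i)
        prev = max(energy, 1e-12)
    return onsets


def segment_at_onsets(n_samples: int, onsets: Sequence[int]) -> List[Segment]:
    """Split samples [0, n_samples) into successive onset-delimited segments."""
    bounds = sorted({0, n_samples, *onsets})
    return [Segment(a, b) for a, b in zip(bounds, bounds[1:])]


def phrase_candidates(segments: List[Segment],
                      template: str) -> List[List[Segment]]:
    """Bind each distinct sub-phrase label of the template to one segment;
    each starting offset into the segment list yields one candidate."""
    labels = list(dict.fromkeys(template))  # distinct labels, first-seen order
    candidates = []
    for offset in range(len(segments) - len(labels) + 1):
        binding = {lab: segments[offset + j] for j, lab in enumerate(labels)}
        candidates.append([binding[lab] for lab in template])
    return candidates


def stretch_factors(candidate: List[Segment],
                    skeleton: Sequence[int]) -> List[float]:
    """Duration ratio that makes each mapped segment fill its slot of the
    rhythmic skeleton (slot lengths in samples)."""
    return [slot / (seg.end - seg.start) for seg, slot in zip(candidate, skeleton)]


def render(samples: Sequence[float], candidate: List[Segment],
           skeleton: Sequence[int]) -> List[float]:
    """Concatenate the mapped segments, each resampled to its slot length.
    Linear interpolation shifts pitch along with duration."""
    out: List[float] = []
    for seg, slot in zip(candidate, skeleton):
        src = samples[seg.start:seg.end]
        n = len(src)
        for k in range(slot):
            pos = k * (n - 1) / max(slot - 1, 1)
            lo = int(pos)
            hi = min(lo + 1, n - 1)
            frac = pos - lo
            out.append(src[lo] * (1 - frac) + src[hi] * frac)
    return out


# Toy input: three short bursts, each followed by silence (8 samples apiece).
burst = [0.5, -0.5, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0]
speech = burst * 3
onsets = detect_onsets(speech)                    # [0, 8, 16]
segments = segment_at_onsets(len(speech), onsets)
candidate = phrase_candidates(segments, "ABA")[0]
skeleton = [12, 4, 8]                             # hypothetical slot lengths
print(stretch_factors(candidate, skeleton))       # [1.5, 0.5, 1.0]
print(len(render(speech, candidate, skeleton)))   # 24
```

Enumerating one candidate per starting offset is only one way to obtain "one or more phrase candidates"; a fuller system would also need a criterion for choosing among them, which this sketch leaves out.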
Address: San Francisco, CA, US