Title of invention HIGH END SPEECH SYNTHESIS
Abstract A guide-track-based speech synthesis system and method that uses an imitator voice, and parameters extracted from that voice, to add performance idiosyncrasies, emotions, and characteristics to speech synthesized by a conventional approach from a library built from an original voice. The imitator reads an input script aloud in substantially the same way as the original voice would, and the recorded speech is stored in a guide track. Prior audio recordings of the original voice are used to build a voice library. Context features and prosodic features are extracted from the guide track and corrected. Spectral features that align with the context features and prosodic features of the guide track are generated from the voice library. The aligned acoustic features are then converted to a speech waveform of an enhanced synthetic voice.
Publication number US2016365087(A1) Publication date 2016.12.15
Application number US201514738556 Filing date 2015.06.12
Applicant GEULAH HOLDINGS LLC Inventor FREUD STEVEN DAVID
Classification G10L13/10 Main classification G10L13/10
Agency Agent
Principal claim 1. A guide track based speech synthesis method for enhancing expressiveness of the speech synthesized from texts with context features and acoustic features extracted from an imitator voice, the method comprising: creating a voice library from an original voice; recording an imitator voice according to input script to form a guide track; extracting at least one context feature from the input script and the guide track; extracting acoustic features, including prosodic features and spectral features from the guide track; aligning the acoustic features towards the at least one context feature; predicting spectral features from the voice library using the at least one context feature and the alignment results; and generating a speech waveform using the spectral features predicted from the voice library and the prosodic features extracted from the guide track.
Address Los Angeles CA US
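
The last two steps of claim 1 (predicting spectral features from the voice library, then pairing them with prosody taken from the guide track) can be illustrated with a toy Python sketch. Everything here is an assumption made for illustration and is not taken from the patent: the names `Segment`, `Frame`, `predict_spectra`, and `combine` are hypothetical, the voice library is reduced to one spectral template per phoneme label, the only prosodic feature is a per-frame pitch value, and the alignment is supplied by hand. A real system would use a statistical or neural model for prediction and a vocoder to turn the resulting frames into a waveform.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    """One context unit (here a phoneme label) aligned to guide-track frames."""
    label: str
    start: int  # first frame index, inclusive
    end: int    # last frame index, exclusive


@dataclass
class Frame:
    """Acoustic parameters for one synthesis frame."""
    spectrum: List[float]  # taken from the original-voice library
    f0: float              # pitch taken from the imitator's guide track


def predict_spectra(library: Dict[str, List[float]],
                    segments: List[Segment]) -> List[List[float]]:
    """Look up a spectral template for every frame of every aligned segment.

    The segment durations come from the guide track, so the original voice's
    spectra are stretched to the imitator's timing.
    """
    spectra: List[List[float]] = []
    for seg in segments:
        template = library[seg.label]
        spectra.extend(list(template) for _ in range(seg.end - seg.start))
    return spectra


def combine(spectra: List[List[float]], guide_f0: List[float]) -> List[Frame]:
    """Pair library spectra with guide-track pitch, frame by frame."""
    if len(spectra) != len(guide_f0):
        raise ValueError("spectra and guide-track pitch must have equal length")
    return [Frame(s, f) for s, f in zip(spectra, guide_f0)]


# Hypothetical data: a two-phoneme library and a five-frame guide track.
library = {"h": [0.1, 0.2], "i": [0.7, 0.3]}
segments = [Segment("h", 0, 2), Segment("i", 2, 5)]
guide_f0 = [0.0, 0.0, 180.0, 190.0, 185.0]  # 0.0 marks unvoiced frames

frames = combine(predict_spectra(library, segments), guide_f0)
```

Each resulting frame carries spectral content from the original voice and timing and pitch from the imitator, which is the division of roles the claim describes; the waveform generation step itself is left out of the sketch.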