Method and system for non-parametric voice conversion, Application No. US201314069510

Title	Method and system for non-parametric voice conversion
Abstract	A method and system are disclosed for non-parametric speech conversion. A text-to-speech (TTS) synthesis system may include hidden Markov model (HMM) based speech modeling for synthesizing output speech. A converted HMM may be initially set to a source HMM trained with the voice of a source speaker. A parametric representation of speech may be extracted from speech of a target speaker to generate a set of target-speaker vectors. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each HMM state of the source HMM to a target-speaker vector. The HMM states of the converted HMM may be replaced with the matched target-speaker vectors. Transforms may be applied to further adapt the converted HMM to the voice of the target speaker. The converted HMM may be used to synthesize speech with voice characteristics of the target speaker.
Publication number	US9183830(B2)	Publication date	2015.11.10
Application number	US201314069510	Filing date	2013.11.01
Applicant	Google Inc.	Inventor	Agiomyrgiannakis Ioannis
Classification	G10L15/00;G10L15/04;G10L15/14;G10L13/02;G10L21/003;G10L15/07;G10L13/033;G10L15/26	Main classification	G10L15/00
Agency	McDonnell Boehnen Hulbert & Berghoff LLP	Agent	McDonnell Boehnen Hulbert & Berghoff LLP
Independent claim	1. A method comprising: training a source hidden Markov model (HMM) based speech features generator implemented by one or more processors of a system using speech signals of a source speaker, wherein the source HMM based speech features generator comprises a configuration of source HMM state models, each of the source HMM state models having a set of generator-model functions; extracting speech features from speech signals of a target speaker to generate a target set of target-speaker vectors; for each given source HMM state model of the configuration, determining a particular target-speaker vector from among the target set that most closely matches parameters of the set of generator-model functions of the given source HMM; determining a fundamental frequency (F0) transform that speech-adapts F0 statistics of the source HMM based speech features generator to match F0 statistics of the speech of the target speaker; constructing a converted HMM based speech features generator implemented by one or more processors of the system to be the same as the source HMM based speech features generator, but wherein the parameters of the set of generator-model functions of each source HMM state model of the converted HMM based speech features generator are replaced with the determined particular most closely matching target-speaker vector from among the target set; and speech-adapting F0 statistics of the converted HMM based speech features generator using the F0 transform to thereby produce a speech-adapted converted HMM based speech features generator.
Address	Mountain View CA US
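
The matching and F0-adaptation steps recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that each source HMM state is summarized by a single mean vector, that "most closely matches" means smallest squared Euclidean distance, and that the F0 transform is a linear map in the log-F0 domain aligning mean and standard deviation. The record does not specify any of these choices, and the speaker-compensating transform mentioned in the abstract is omitted. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def closest_target_vector(state_mean, target_vectors):
    """Return the index of the target-speaker vector nearest to one source state mean.

    Assumption: squared Euclidean distance stands in for the claim's
    unspecified "most closely matches" criterion.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(target_vectors)),
               key=lambda i: sq_dist(state_mean, target_vectors[i]))


def convert_states(source_state_means, target_vectors):
    """Build converted state parameters: each source mean is replaced by its match."""
    return [list(target_vectors[closest_target_vector(m, target_vectors)])
            for m in source_state_means]


def fit_log_f0_transform(source_f0_hz, target_f0_hz):
    """Fit (scale, offset) so that scale * log(f0) + offset aligns the source
    log-F0 mean and standard deviation with the target's.

    Assumption: a linear log-domain map; the claim only requires that F0
    statistics be adapted to match those of the target speaker.
    """
    src = [math.log(f) for f in source_f0_hz]
    tgt = [math.log(f) for f in target_f0_hz]
    src_sd = pstdev(src)
    scale = pstdev(tgt) / src_sd if src_sd > 0 else 1.0
    offset = mean(tgt) - scale * mean(src)
    return scale, offset


def apply_log_f0_transform(f0_hz, scale, offset):
    """Map one source F0 value (Hz) into the target speaker's F0 range."""
    return math.exp(scale * math.log(f0_hz) + offset)


# Toy data, invented purely for illustration.
source_state_means = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
target_vectors = [[0.1, -0.1], [3.8, 4.1], [1.2, 0.9]]
converted = convert_states(source_state_means, target_vectors)

scale, offset = fit_log_f0_transform([100.0, 110.0, 120.0], [200.0, 220.0, 240.0])
adapted_f0 = apply_log_f0_transform(110.0, scale, offset)
```

In this toy example the converted model keeps the source model's structure (three states in the same order) and changes only the per-state parameters, mirroring the claim's requirement that the converted generator be "the same as" the source generator apart from the replaced parameters.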