Title of invention Parametric speech synthesis method and system
Abstract The present invention provides a parametric speech synthesis method and a parametric speech synthesis system. The method comprises sequentially processing each frame of speech of each phone in a phone sequence of an input text as follows: for a current phone, extracting a corresponding statistic model from a statistic model library and using the model parameters of the statistic model that correspond to the current frame of the current phone as rough values of the currently predicted speech parameters; according to the rough values and information about a predetermined number of speech frames occurring before the current time point, obtaining smoothed values of the currently predicted speech parameters; according to global mean values and global standard deviation ratios of the speech parameters obtained through statistics, performing global optimization on the smoothed values of the speech parameters to generate the required speech parameters; and synthesizing the generated speech parameters to obtain a frame of speech synthesized for the current frame of the current phone. With this solution, the RAM capacity needed for speech synthesis does not increase with the length of the synthesized speech, and the duration of the synthesized speech is no longer limited by the RAM.
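The memory claim in the abstract follows from the smoothing step looking only at a fixed number of preceding frames. A minimal Python sketch of that idea is given here. The abstract does not specify the filter, so the weighted average, the function name `smooth_stream`, and the parameters `history` and `alpha` are all assumptions made for illustration.

```python
from collections import deque


def smooth_stream(rough_values, history=2, alpha=0.5):
    """Yield one smoothed value per frame, in order.

    Hypothetical filter (not taken from the patent): each output blends the
    current rough value with the mean of the last `history` smoothed frames.
    """
    # Bounded buffer: it never holds more than `history` frames, so memory
    # use does not depend on how many frames are synthesized.
    past = deque(maxlen=history)
    for rough in rough_values:
        if past:
            context = sum(past) / len(past)
            smoothed = alpha * rough + (1 - alpha) * context
        else:
            # First frame: no preceding frames to smooth against.
            smoothed = rough
        past.append(smoothed)
        yield smoothed
```

Because the function is a generator, each frame can be passed on to the next stage as soon as it is produced, rather than after the whole utterance has been processed.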
Publication number US8977551(B2) Publication date 2015.03.10
Application number US201113640562 Filing date 2011.10.27
Applicant Goertek Inc. Inventors Wu Fengliang;Wu Zhenhua
Classification G10L13/00;G10L13/08;G10L21/00;G10L15/00;H04M1/64;G10L15/22 Main classification G10L13/00
Agency Troutman Sanders LLP Agent Troutman Sanders LLP
Principal claim 1. A parametric speech synthesis method, comprising: analyzing an input text; acquiring a phone sequence based on analysis of the input text, the phone sequence including a plurality of speech frames; synthesizing the phone sequence by synthesizing the plurality of speech frames in a sequential manner, each speech frame being synthesized by performing the following iteration; extracting a corresponding statistic model from a statistic model library and using model parameters of the statistic model that correspond to the speech frame as rough values for predicting speech parameters of the speech frame; according to the rough values and information about a predetermined number of preceding speech frames, filtering the rough values to obtain smoothed values for predicting speech parameters of the speech frame; according to global mean values and global standard deviation ratios of speech parameters obtained through statistics, performing global optimization on the smoothed values to generate speech parameters of the speech frame, wherein the global optimization comprises the global mean values and global standard deviation ratios being fixed values using the same values for adjustment in each speech synthesis process without the need of recalculating the global mean and the standard deviation ratios in each speech synthesis process; and synthesizing the optimized speech parameters to obtain a frame of speech waveform.
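The global optimization step in the claim relies on statistics that are fixed ahead of time rather than recomputed for each utterance. The claim text does not state the adjustment formula. The Python sketch below assumes one plausible reading, in which the deviation of a smoothed value from the global mean is rescaled by the global standard deviation ratio. The constants and the function name `globally_optimize` are hypothetical.

```python
# Hypothetical fixed statistics, assumed to be computed once offline and
# reused unchanged in every synthesis run.
GLOBAL_MEAN = 5.0
GLOBAL_STD_RATIO = 1.5


def globally_optimize(smoothed, mean=GLOBAL_MEAN, ratio=GLOBAL_STD_RATIO):
    """Rescale a smoothed parameter value around the global mean.

    Assumed formula (not quoted from the claim):
        optimized = mean + ratio * (smoothed - mean)
    """
    return mean + ratio * (smoothed - mean)
```

With these assumed constants, `globally_optimize(7.0)` gives `8.0`: a value 2.0 above the mean is pushed to 3.0 above it. Since the adjustment depends only on the current frame and two constants, it needs no buffering of earlier or later frames.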
Address Weifang CN