Title: SPEECH PROCESSING SYSTEM
Abstract: A method of deriving speech synthesis parameters from an audio signal, the method comprising: receiving an input speech signal; estimating the position of glottal closure incidents from said audio signal; deriving a pulsed excitation signal from the position of the glottal closure incidents; segmenting said audio signal on the basis of said glottal closure incidents, to obtain segments of said audio signal; processing the segments of the audio signal to obtain the complex cepstrum and deriving a synthesis filter from said complex cepstrum; reconstructing said speech audio signal to produce a reconstructed speech signal using an excitation model where the pulsed excitation signal is passed through said synthesis filter; comparing said reconstructed speech signal with said input speech signal; and calculating the difference between the reconstructed speech signal and the input speech signal and modifying either the pulsed excitation signal or the complex cepstrum to reduce the difference between the reconstructed speech signal and the input speech.
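The first analysis steps in the abstract (a pulsed excitation placed at the glottal closure incidents (GCIs), GCI-based segmentation, and a complex cepstrum per segment) can be sketched in Python with NumPy. This is a minimal sketch under assumptions the record does not state: the GCI positions are taken as already estimated (sample indices), each segment spans two pitch periods from the previous GCI to the next one with a Hann window, and the complex cepstrum is computed with a zero-padded FFT of hypothetical length `nfft`, using phase unwrapping and linear-phase removal in the style of MATLAB's `cceps`. The sign flip that keeps the DC bin positive is an added convenience. All function names are hypothetical.

```python
import numpy as np


def pulse_excitation(gcis, n_samples):
    """Unit-amplitude pulse train with one pulse at each GCI sample index."""
    e = np.zeros(n_samples)
    e[np.asarray(gcis, dtype=int)] = 1.0
    return e


def gci_segments(x, gcis):
    """Hann-windowed two-period segments (previous GCI to next GCI), so each
    segment is centred on one GCI. This segmentation is an assumption."""
    return [x[prev:nxt] * np.hanning(nxt - prev)
            for prev, nxt in zip(gcis[:-2], gcis[2:])]


def complex_cepstrum(seg, nfft=1024):
    """Complex cepstrum from the log magnitude and unwrapped phase spectrum.

    Returns the cepstrum plus the removed linear-phase term `nd` and the sign
    flip, both needed for inversion. nfft must be even and >= len(seg).
    """
    sign = 1.0 if np.sum(seg) >= 0 else -1.0   # keeps the DC bin positive
    X = np.fft.fft(sign * seg, nfft)
    half = nfft // 2
    phase = np.unwrap(np.angle(X))
    nd = int(np.round(phase[half] / np.pi))    # integer linear-phase (delay) term
    phase -= np.pi * nd * np.arange(nfft) / half
    c = np.real(np.fft.ifft(np.log(np.abs(X)) + 1j * phase))
    return c, nd, sign


def cepstrum_to_impulse(c, nd, sign):
    """Synthesis-filter impulse response obtained by inverting complex_cepstrum."""
    nfft = len(c)
    half = nfft // 2
    spec = np.exp(np.fft.fft(c)) * np.exp(1j * np.pi * nd * np.arange(nfft) / half)
    return sign * np.real(np.fft.ifft(spec))
```

Zero-padding each segment to `nfft` samples gives a denser frequency grid, which makes phase unwrapping more reliable. Because `cepstrum_to_impulse` restores the removed linear phase and sign, applying it to the full cepstrum returns the (zero-padded) windowed segment, which can serve as that segment's mixed-phase filter response. Keeping only low quefrencies of `c` before inversion would give a smoother filter; that liftering step is not shown.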
Application publication number: US2014156280(A1); Publication date: 2014.06.05
Application number: US201314090379; Filing date: 2013.11.26
Applicant: Kabushiki Kaisha Toshiba; Inventor: Ranniery Maia
Classification: G10L13/047; G10L13/08; Main classification: G10L13/047
Agency: (none listed); Agent: (none listed)
Main claim: 1. A method of deriving speech synthesis parameters from an audio signal, the method comprising: receiving an input speech signal; estimating the position of glottal closure incidents from said audio signal; deriving a pulsed excitation signal from the position of the glottal closure incidents; segmenting said audio signal on the basis of said glottal closure incidents, to obtain segments of said audio signal; processing the segments of the audio signal to obtain the complex cepstrum and deriving a synthesis filter from said complex cepstrum; reconstructing said speech audio signal to produce a reconstructed speech signal using an excitation model where the pulsed excitation signal is passed through said synthesis filter; comparing said reconstructed speech signal with said input speech signal; and calculating the difference between the reconstructed speech signal and the input speech signal and modifying either the pulsed excitation signal or the complex cepstrum to reduce the difference between the reconstructed speech signal and the input speech.
Address: Minato-ku, JP
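The last steps of claim 1 (reconstructing speech by passing the pulsed excitation through the synthesis filters, comparing it with the input, and modifying the excitation to reduce the difference) can be sketched as a small analysis-by-synthesis update in Python with NumPy. This is a minimal sketch, not the claimed implementation. It assumes each pulse has its own filter impulse response (for example, one derived from the complex cepstrum of the segment around that GCI), and a hypothetical per-response `leads` value that says how many samples of the response precede the pulse, since a mixed-phase response is generally non-causal. The difference is measured as a sum of squared errors. The modification shown re-estimates only the pulse amplitudes by linear least squares, which is a swapped-in technique. The claim alternatively allows modifying the complex cepstrum, which is not shown. All names are hypothetical.

```python
import numpy as np


def place(h, start, n):
    """Length-n signal holding h from sample `start` onward (clipped to range)."""
    y = np.zeros(n)
    lo, hi = max(start, 0), min(start + len(h), n)
    if lo < hi:
        y[lo:hi] = h[lo - start:hi - start]
    return y


def reconstruct(gcis, amps, responses, leads, n):
    """Pulsed excitation through time-varying synthesis filters: pulse k
    (position gcis[k], amplitude amps[k]) excites responses[k], whose sample
    leads[k] is aligned with the pulse; contributions are overlap-added."""
    y = np.zeros(n)
    for t, a, h, lead in zip(gcis, amps, responses, leads):
        y += a * place(h, t - lead, n)
    return y


def squared_error(x, y):
    """Difference between input and reconstructed speech (sum of squares)."""
    return float(np.sum((x - y) ** 2))


def refine_amplitudes(x, gcis, responses, leads):
    """Modify the excitation: least-squares pulse amplitudes that minimise
    squared_error with pulse positions and filters held fixed."""
    basis = np.column_stack([place(h, t - lead, len(x))
                             for t, h, lead in zip(gcis, responses, leads)])
    amps, *_ = np.linalg.lstsq(basis, x, rcond=None)
    return amps
```

Because the reconstruction is linear in the pulse amplitudes, the least-squares solution can never give a larger error than the initial unit-amplitude pulse train. In an iterative scheme, this update could alternate with a filter (cepstrum) update until the error stops decreasing.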