Abstract |
PROBLEM TO BE SOLVED: To provide high-quality synthesized speech that is independent of the structure of the input text or of the voice database. SOLUTION: The input text is analyzed to obtain a prosody parameter 13 and a phoneme context 14. A speech phoneme candidate search part 5 retrieves speech phoneme candidates 15 corresponding to the phoneme context 14, and a deformation prosodeme candidate selection part 6 selects, from among the speech phoneme candidates 15, deformation prosodeme candidates 16 that are prosodically superior. A sub-cost 17 is determined for each of the speech phoneme candidates 15, and a prosodic deformation sub-cost is additionally determined for each of the deformation prosodeme candidates 16. The speech phoneme candidate and the deformation prosodeme candidate that minimize the weighted sum of the sub-costs are selected as a selected speech phoneme 19 and a prosodeme 20 to be deformed, respectively. A prosodic deformation part 10 prosodically deforms the speech waveform data corresponding to the prosodeme 20 to be deformed and concatenates the result (prosodically deformed waveform data 21) with the speech waveform data corresponding to the selected speech phoneme 19, thereby obtaining synthesized speech. COPYRIGHT: (C)2009,JPO&INPIT
|
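The selection step described in the abstract, choosing the candidate with the minimum weighted sum of sub-costs, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the `Candidate` class, the sub-cost names (`prosody`, `join`, `deformation`), the weight values, and the reading that, for one target position, plain candidates and deformation candidates compete in a single pool.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Candidate:
    unit_id: str                 # hypothetical identifier of a unit in the voice database
    sub_costs: Dict[str, float]  # sub-cost name -> value (names are illustrative)


def weighted_sum(sub_costs: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the sub-costs of one candidate."""
    return sum(weights[name] * value for name, value in sub_costs.items())


def select_unit(
    plain: List[Candidate],
    deformable: List[Candidate],
    weights: Dict[str, float],
) -> Tuple[bool, Candidate, float]:
    """Return (use_deformation, candidate, cost) for the lowest weighted sum.

    Plain candidates are used as recorded; deformable candidates carry an
    extra 'deformation' sub-cost that penalizes prosodic modification.
    """
    scored = [(weighted_sum(c.sub_costs, weights), False, c) for c in plain]
    scored += [(weighted_sum(c.sub_costs, weights), True, c) for c in deformable]
    cost, use_deformation, best = min(scored, key=lambda item: item[0])
    return use_deformation, best, cost


# Illustrative weights and candidates (all values are made up).
weights = {"prosody": 1.0, "join": 0.5, "deformation": 2.0}
plain = [
    Candidate("a1", {"prosody": 4.0, "join": 1.0}),
    Candidate("a2", {"prosody": 3.0, "join": 4.0}),
]
deformable = [
    # Same unit as "a2", but scored as if its prosody were modified first.
    Candidate("a2", {"prosody": 0.5, "join": 4.0, "deformation": 0.5}),
]

use_deformation, best, cost = select_unit(plain, deformable, weights)
print(use_deformation, best.unit_id, cost)
```

In this sketch the deformed variant of `a2` wins because its lower prosody sub-cost outweighs the deformation penalty. A full system would more likely run such a selection over whole candidate sequences (for example with dynamic programming), which the abstract does not specify.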