An arrangement is provided for text-to-speech processing based on linguistic prosodic models. The linguistic prosodic models are established to characterize different linguistic prosodic characteristics. When an input text is received, a target unit sequence is generated together with a linguistic target that annotates each target unit in the sequence with a plurality of linguistic prosodic characteristics, so that speech synthesized in accordance with the target unit sequence and the linguistic target has certain desired prosodic properties. A unit sequence is then selected in accordance with the target unit sequence and the linguistic target, based on joint cost information evaluated using the established linguistic prosodic models. The selected unit sequence is used to produce synthesized speech corresponding to the input text.
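The selection step described in the abstract can be illustrated with a minimal sketch. It is not the method claimed in the application: it assumes a conventional dynamic-programming (Viterbi-style) search in which the joint cost is the sum of a target cost (mismatch between a candidate unit and the prosodic annotations of the linguistic target) and a join cost (pitch discontinuity between adjacent units). The names `Target`, `Unit`, `target_cost`, `join_cost`, and `select_units`, the two annotations `stressed` and `phrase_final`, and the cost formulas are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """One target unit plus its (hypothetical) prosodic annotations."""
    phone: str
    stressed: bool
    phrase_final: bool


@dataclass(frozen=True)
class Unit:
    """One recorded unit from a (hypothetical) speech inventory."""
    phone: str
    stressed: bool
    phrase_final: bool
    start_f0: float  # pitch at the start of the unit, in Hz
    end_f0: float    # pitch at the end of the unit, in Hz


def target_cost(t: Target, u: Unit) -> float:
    # Assumed cost: number of prosodic annotations the unit fails to match.
    return float(t.stressed != u.stressed) + float(t.phrase_final != u.phrase_final)


def join_cost(prev: Unit, cur: Unit) -> float:
    # Assumed cost: pitch jump at the concatenation point, scaled by 100 Hz.
    return abs(prev.end_f0 - cur.start_f0) / 100.0


def select_units(targets, inventory):
    """Return (unit sequence, total cost) minimizing target plus join costs."""
    if not targets:
        return [], 0.0
    # Candidates for each target are the inventory units with the same phone.
    candidates = [[u for u in inventory if u.phone == t.phone] for t in targets]
    if any(not c for c in candidates):
        raise ValueError("no candidate unit for at least one target")

    # best[i][j] = (cheapest cost of a path ending in candidates[i][j], backpointer)
    best = [[(target_cost(targets[0], u), -1) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            tc = target_cost(targets[i], u)
            row.append(min(
                (best[i - 1][k][0] + join_cost(p, u) + tc, k)
                for k, p in enumerate(candidates[i - 1])
            ))
        best.append(row)

    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total
```

In this sketch, the trained linguistic prosodic models of the abstract are replaced by two hand-written cost functions; a real system would presumably derive those costs from the established models rather than from fixed rules.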
Application publication number
WO2004070701(A2)
Application publication date
2004.08.19
Application number
WO2004US02503
Filing date
2004.01.29
Applicants
SCANSOFT, INC.;PHILLIPS, MICHAEL, STUART;FAULKNER, DANIEL, STUART;PRZEZDZIECKI, MAREK, ANDRZEJ
Inventors
PHILLIPS, MICHAEL, STUART;FAULKNER, DANIEL, STUART;PRZEZDZIECKI, MAREK, ANDRZEJ