Abstract |
The present invention provides a method and apparatus for training a prosody statistic model, a method and apparatus for prosody parsing, and a method and system for text-to-speech synthesis. The method for training a prosody statistic model with a raw corpus that includes a plurality of sentences with punctuation comprises: transforming said plurality of sentences in said raw corpus into a plurality of token sequences, respectively; counting a frequency of each adjacent token pair occurring in said plurality of token sequences and frequencies of punctuation representing a pause occurring at associated positions of each said token pair; calculating pause probabilities at said associated positions of each said token pair; and constructing said prosody statistic model based on said token pairs and said pause probabilities at the associated positions thereof. With the present invention, a prosody statistic model can be trained from a raw corpus without manually labeled prosody parsing tags, and the resulting model can be used in prosody parsing and subsequently in voice synthesis.
|
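The training steps recited in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the tokenizer, the set of pause punctuation in `PAUSE_PUNCT`, and the function names `tokenize` and `train_pause_model` are all assumptions made for illustration, and the only "associated position" modeled here is the gap between the two tokens of a pair.

```python
import re
from collections import Counter

# Assumption: these marks are treated as pause-indicating punctuation.
PAUSE_PUNCT = {",", ".", ";", ":", "?", "!"}


def tokenize(sentence):
    """Hypothetical tokenizer: word tokens plus pause punctuation marks.

    Any other punctuation (quotes, brackets, ...) is simply dropped.
    """
    return re.findall(r"\w+|[,.;:?!]", sentence)


def train_pause_model(sentences):
    """Estimate P(pause between a and b) for each adjacent token pair (a, b)."""
    pair_counts = Counter()   # occurrences of each adjacent word pair
    pause_counts = Counter()  # occurrences with pause punctuation in between

    for sentence in sentences:
        prev = None            # previous word token, if any
        pause_pending = False  # pause punctuation seen since the previous word
        for tok in tokenize(sentence):
            if tok in PAUSE_PUNCT:
                # Only punctuation that follows a word can sit between a pair.
                pause_pending = prev is not None
                continue
            if prev is not None:
                pair_counts[(prev, tok)] += 1
                if pause_pending:
                    pause_counts[(prev, tok)] += 1
            prev = tok
            pause_pending = False

    # Pause probability = pause frequency / pair frequency.
    return {pair: pause_counts[pair] / n for pair, n in pair_counts.items()}


if __name__ == "__main__":
    corpus = ["Hello, world.", "Hello world again."]
    for pair, prob in train_pause_model(corpus).items():
        print(pair, prob)
```

Because the pause labels come from punctuation already present in the raw text, no manually tagged corpus is needed; a real system would likely also track pauses before and after each pair and smooth the estimates for rare pairs.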