摘要 |
A method of training a TTS or other system to assign intonational features, such as intonational phrase boundaries, is described. The method of training involves taking a set of predetermined text (not speech or a signal representative of speech) and having a human annotate it with intonational feature annotations. This results in annotated text. Next, the structure of the set of predetermined text is analyzed to generate information. This information is used, along with the intonational feature annotations, to generate a statistical representation. The statistical representation may then be stored and repeatedly used to generate synthesized speech from new sets of input text without training the TTS system further. |