Title of invention |
Training and applying prosody models |
Abstract |
Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles. |
Publication number |
US8856008(B2) |
Publication date |
2014.10.07 |
Application number |
US201314030248 |
Filing date |
2013.09.18 |
Applicant |
Morphism LLC |
Inventor |
Stephens, Jr. James H. |
Classification codes |
G10L13/08;G10L13/00;G10L15/04;G10L13/06;G10L15/26;G10L17/00;G10L21/06;G10L13/10;G10L15/18 |
Main classification code |
G10L13/08 |
Agency |
Terrile, Cannatti, Chambers & Holland, LLP |
Agent |
Terrile, Cannatti, Chambers & Holland, LLP; Holland Robert W. |
Principal claim |
1. A computer-implementable method for synthesizing audible speech, with varying prosody, from textual content, the method comprising:
generating texts annotated with prosody information generated from audio using a speech recognition engine that performs the annotation during its operation;
training prosody models with lexicons based on first segments of the texts with the prosody information;
maintaining an inventory of the prosody models with lexicons, selecting a subset of multiple prosody models from the inventory of prosody models;
associating prosody models in the subset of multiple prosody models with second segments of a text based on phrases in the text statistically associated with the lexicons of the prosody models;
applying the associated prosody models to one of the second segments of the text to produce prosody annotations for the text;
updating the associated prosody models' lexicons based on the phrases in the second segments of text;
analyzing annotations of the prosody annotations to reconcile conflicting prosody annotations previously produced by multiple prosody models associated with the second segments of text;
and synthesizing audible speech from the second segments of text and the reconciled prosody annotations. |
Address |
Austin TX US |
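
The associate, apply, update, and reconcile steps of the principal claim can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not taken from the patent: the `ProsodyModel` class, the single `pitch_shift` parameter, word overlap as the "statistical association" with a lexicon, the 0.2 threshold, and weighted averaging as the reconciliation rule. Speech recognition, model training, and audio synthesis are omitted.

```python
from dataclasses import dataclass


@dataclass
class ProsodyModel:
    """Hypothetical stand-in for a trained prosody model with a lexicon."""
    name: str
    lexicon: set
    pitch_shift: float  # illustrative single prosody parameter (assumption)

    def score(self, segment: str) -> float:
        # Fraction of the segment's words that appear in this model's lexicon.
        words = segment.lower().split()
        if not words:
            return 0.0
        return sum(w in self.lexicon for w in words) / len(words)

    def annotate(self, segment: str) -> dict:
        # Produce one prosody annotation, weighted by lexicon overlap.
        return {"model": self.name,
                "pitch_shift": self.pitch_shift,
                "weight": self.score(segment)}

    def update_lexicon(self, segment: str) -> None:
        # Add the segment's words to the lexicon.
        self.lexicon.update(segment.lower().split())


def associate(models, segment, threshold=0.2):
    """Select the models whose lexicon overlaps the segment enough."""
    return [m for m in models if m.score(segment) >= threshold]


def reconcile(annotations):
    """Merge conflicting annotations into one weighted-average pitch shift."""
    total = sum(a["weight"] for a in annotations)
    if total == 0:
        return {"pitch_shift": 0.0}
    merged = sum(a["pitch_shift"] * a["weight"] for a in annotations) / total
    return {"pitch_shift": merged}


def annotate_text(inventory, segments, threshold=0.2):
    """Return (segment, reconciled annotation) pairs ready for a synthesizer."""
    result = []
    for segment in segments:
        chosen = associate(inventory, segment, threshold)
        # Annotate first, then update, so the update does not inflate weights.
        annotations = [m.annotate(segment) for m in chosen]
        for m in chosen:
            m.update_lexicon(segment)
        result.append((segment, reconcile(annotations)))
    return result


excited = ProsodyModel("excited", {"wow", "amazing", "great"}, 2.0)
calm = ProsodyModel("calm", {"quiet", "gentle", "great"}, -1.0)
annotated = annotate_text([excited, calm],
                          ["wow that is great", "a quiet gentle evening"])
for segment, annotation in annotated:
    print(segment, annotation)
```

In this toy run both models match the first segment, so their annotations conflict and are averaged by weight; only the "calm" model matches the second. A real system would replace the word-overlap score and the averaging rule with whatever statistical association and reconciliation the implementation actually uses.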