Title of invention: Training and applying prosody models
Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
Publication number: US8856008(B2)  Publication date: 2014.10.07
Application number: US201314030248  Filing date: 2013.09.18
Applicant: Morphism LLC  Inventor: Stephens, Jr. James H.
Classification: G10L13/08; G10L13/00; G10L15/04; G10L13/06; G10L15/26; G10L17/00; G10L21/06; G10L13/10; G10L15/18  Main classification: G10L13/08
Agency: Terrile, Cannatti, Chambers & Holland, LLP  Agent: Terrile, Cannatti, Chambers & Holland, LLP; Holland Robert W.
Principal claim: 1. A computer-implementable method for synthesizing audible speech, with varying prosody, from textual content, the method comprising: generating texts annotated with prosody information generated from audio using a speech recognition engine that performs the annotation during its operation; training prosody models with lexicons based on first segments of the texts with the prosody information; maintaining an inventory of the prosody models with lexicons, selecting a subset of multiple prosody models from the inventory of prosody models; associating prosody models in the subset of multiple prosody models with second segments of a text based on phrases in the text statistically associated with the lexicons of the prosody models; applying the associated prosody models to one of the second segments of the text to produce prosody annotations for the text; updating the associated prosody models' lexicons based on the phrases in the second segments of text; analyzing annotations of the prosody annotations to reconcile conflicting prosody annotations previously produced by multiple prosody models associated with the second segments of text; and synthesizing audible speech from the second segments of text and the reconciled prosody annotations.
Address: Austin TX US
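
Illustrative sketch (not part of the patent record): the steps recited in the principal claim can be pictured with a toy example. Everything below is an assumption made for illustration: the `ProsodyModel` class, the `pitch` and `rate` multipliers, the word-overlap score standing in for "statistically associated", and the weighted average used to reconcile conflicts are hypothetical and are not taken from the patent. Recognizer output is replaced by hand-written (text, pitch, rate) tuples, and the final synthesis step is omitted.

```python
from collections import Counter
from dataclasses import dataclass, field


def tokenize(text):
    # Naive whitespace tokenizer (assumption); real phrase detection would be richer.
    return [w.strip(".,!?;:").lower() for w in text.split() if w.strip(".,!?;:")]


@dataclass
class ProsodyModel:
    name: str
    lexicon: Counter = field(default_factory=Counter)
    pitch: float = 1.0  # hypothetical relative pitch multiplier
    rate: float = 1.0   # hypothetical relative speaking-rate multiplier

    def train(self, annotated_segments):
        # annotated_segments: (text, pitch, rate) tuples standing in for
        # prosody-annotated text produced by a speech recognition engine.
        pitches, rates = [], []
        for text, pitch, rate in annotated_segments:
            self.lexicon.update(tokenize(text))
            pitches.append(pitch)
            rates.append(rate)
        if pitches:
            self.pitch = sum(pitches) / len(pitches)
            self.rate = sum(rates) / len(rates)

    def association(self, segment):
        # Fraction of the segment's words found in this model's lexicon.
        words = tokenize(segment)
        if not words:
            return 0.0
        return sum(1 for w in words if w in self.lexicon) / len(words)

    def annotate(self, segment):
        return {"model": self.name, "pitch": self.pitch, "rate": self.rate}


def reconcile(annotations, weights):
    # Resolve conflicting annotations with an association-weighted average.
    total = sum(weights)
    return {
        key: sum(a[key] * w for a, w in zip(annotations, weights)) / total
        for key in ("pitch", "rate")
    }


def annotate_text(segments, inventory):
    results = []
    for segment in segments:
        # Select the subset of models associated with this segment.
        selected = [(m, m.association(segment)) for m in inventory]
        selected = [(m, s) for m, s in selected if s > 0]
        if not selected:
            results.append((segment, {"pitch": 1.0, "rate": 1.0}))
            continue
        annotations = [m.annotate(segment) for m, _ in selected]
        results.append((segment, reconcile(annotations, [s for _, s in selected])))
        # Update the lexicons of the models that were applied.
        for m, _ in selected:
            m.lexicon.update(tokenize(segment))
    return results


excited = ProsodyModel("excited")
excited.train([("wow amazing great", 1.4, 1.2)])
calm = ProsodyModel("calm")
calm.train([("quiet gentle slow", 0.8, 0.8)])

for segment, prosody in annotate_text(
    ["Wow, that is amazing!", "A quiet, amazing day."], [excited, calm]
):
    print(segment, prosody)
```

In this toy run the first segment matches only the "excited" model, while the second matches both models equally, so its annotation is the midpoint of the two styles. A real system would hand the reconciled annotations to a speech synthesizer rather than print them.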