Title of invention: Method and apparatus for generating synthetic speech with contrastive stress
Abstract: Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.
Publication number: US8914291(B2) Publication date: 2014.12.16
Application number: US201314035550 Filing date: 2013.09.24
Applicant: Nuance Communications, Inc. Inventors: Meyer Darren C.; Springer Stephen R.
Classification: G10L13/08;G10L13/02;G10L13/10 Main classification: G10L13/08
Agency: Wolf, Greenfield & Sacks, P.C. Agent: Wolf, Greenfield & Sacks, P.C.
Principal claim: 1. A method for providing speech output for a speech-enabled application, the method comprising: receiving from the speech-enabled application a text input comprising a text transcription of a desired speech output; identifying at least one first portion of at least one token of the text input that differs from at least one corresponding first portion of at least one other token of the text input, and at least one second portion of the at least one token that does not differ from at least one corresponding second portion of the at least one other token; assigning contrastive stress to the identified at least one first portion of the at least one token, but not to the identified at least one second portion of the at least one token; generating, using at least one computer system, an audio speech output corresponding to at least a portion of the text input, the audio speech output comprising at least one portion carrying contrastive stress corresponding to the at least one first portion of the at least one token of the text input, to contrast with at least one other portion of the audio speech output corresponding to the at least one corresponding first portion of the at least one other token of the text input; and providing the audio speech output for the speech-enabled application.
Address: Burlington, MA, US
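
Illustrative note on the principal claim: the identifying and assigning steps (find the portion of one token that differs from another token, and stress only that portion) can be sketched as below. This is a minimal sketch under stated assumptions, not the patented implementation: comparing tokens character by character with Python's `difflib`, the function names `contrast_segments` and `mark_stress`, and the bracket notation for stressed portions are all hypothetical choices made for illustration. The claim itself does not specify the comparison unit or any markup, and audio generation is out of scope here.

```python
from difflib import SequenceMatcher


def contrast_segments(token, other):
    """Split `token` into (text, stressed) segments relative to `other`.

    Assumption: portions are compared character by character. Characters of
    `token` with no match in `other` are marked stressed (the "first portion"
    in the claim); matching characters are left unstressed (the "second
    portion").
    """
    matcher = SequenceMatcher(None, token, other, autojunk=False)
    segments = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if i1 == i2:
            # Material present only in `other`: nothing in `token` to mark.
            continue
        segments.append((token[i1:i2], tag != "equal"))
    return segments


def mark_stress(token, other, open_mark="[", close_mark="]"):
    """Render `token` with its differing portions wrapped in hypothetical markers."""
    return "".join(
        f"{open_mark}{text}{close_mark}" if stressed else text
        for text, stressed in contrast_segments(token, other)
    )


if __name__ == "__main__":
    # Two tokens from one prompt, e.g. a confirmation of similar numbers.
    print(mark_stress("1235", "1234"))  # → 123[5]
```

A downstream synthesizer could then map the marked portions to whatever prosodic emphasis mechanism it supports; that mapping is deliberately left out of this sketch.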