Title: Method and apparatus for providing speech output for speech-enabled applications
Abstract: Techniques for providing speech output for speech-enabled applications. A synthesis system receives from a speech-enabled application a text input including a text transcription of a desired speech output. The synthesis system selects one or more audio recordings corresponding to one or more portions of the text input. In one aspect, the synthesis system selects from audio recordings provided by a developer of the speech-enabled application. In another aspect, the synthesis system selects an audio recording of a speaker speaking a plurality of words. The synthesis system forms a speech output including the one or more selected audio recordings and provides the speech output for the speech-enabled application.
Publication number: US9424833 (B2)    Publication date: 2016.08.23
Application number: US201414572451    Filing date: 2014.12.16
Applicant: Nuance Communications, Inc.    Inventors: Meyer, Darren C.; Bos-Plachez, Corinne; Staessen, Martine Marguerite
Classification: G10L13/02; G10L13/04; G10L13/08    Main classification: G10L13/02
Agency: Wolf, Greenfield & Sacks, P.C.    Agent: Wolf, Greenfield & Sacks, P.C.
Main claim: 1. A method for providing a speech output for a speech-enabled application, the method comprising: receiving from the speech-enabled application a text input comprising a text transcription of a desired speech output; selecting, using at least one computer system, a sequence of audio recordings for concatenation to produce the desired speech output, the selected sequence of audio recordings comprising a first audio recording for concatenation with one or more other audio recordings in the selected sequence of audio recordings, the first audio recording selected for being of a speaker speaking a plurality of words in the text transcription, wherein selecting the sequence of audio recordings comprises applying one or more selection criteria that favor the selected sequence of audio recordings for being a smaller number of audio recordings than other candidate sequences of audio recordings for producing the desired speech output; generating a speech output by concatenating the selected sequence of audio recordings; and providing the generated speech output for the speech-enabled application.
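Claim 1 selects a sequence of audio recordings that covers the text transcription and favors sequences made of fewer recordings, so that one recording of a speaker saying several words can stand in for several shorter ones. As a rough illustration only, the Python sketch below treats "fewest recordings" as the sole selection criterion (the claim allows one or more criteria) and finds such a sequence with a dynamic program over word positions, then concatenates the chosen recordings. The inventory format (a dict from word tuples to lists of integer samples) and the names `select_recordings` and `synthesize` are hypothetical; the patent text here does not specify any data structure or algorithm.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical inventory: each key is a word sequence that a speaker recorded;
# each value is a placeholder for that recording's audio samples.
Inventory = Dict[Tuple[str, ...], List[int]]


def select_recordings(words: Sequence[str],
                      inventory: Inventory) -> Optional[List[Tuple[str, ...]]]:
    """Return the fewest inventory entries whose word sequences, in order,
    exactly cover `words`, or None if no such covering exists.
    Ties between equally short sequences are broken arbitrarily."""
    n = len(words)
    INF = float("inf")
    best = [INF] * (n + 1)  # best[i]: fewest recordings covering words[:i]
    back = [-1] * (n + 1)   # back[i]: start index of the last recording used
    best[0] = 0
    for i in range(1, n + 1):
        for j in range(i):
            # Try ending a recording at word i that starts at word j.
            if best[j] + 1 < best[i] and tuple(words[j:i]) in inventory:
                best[i] = best[j] + 1
                back[i] = j
    if best[n] == INF:
        return None  # some portion of the text has no matching recording
    # Follow the back-pointers to recover the chosen sequence.
    keys: List[Tuple[str, ...]] = []
    i = n
    while i > 0:
        j = back[i]
        keys.append(tuple(words[j:i]))
        i = j
    keys.reverse()
    return keys


def synthesize(text: str, inventory: Inventory) -> Optional[List[int]]:
    """Select recordings for `text` and concatenate their samples in order."""
    keys = select_recordings(text.lower().split(), inventory)
    if keys is None:
        return None
    output: List[int] = []
    for key in keys:
        output.extend(inventory[key])
    return output


if __name__ == "__main__":
    inv: Inventory = {
        ("your",): [1], ("balance",): [2], ("your", "balance", "is"): [3, 4],
        ("is",): [5], ("ten", "dollars"): [6], ("ten",): [7], ("dollars",): [8],
    }
    print(select_recordings("your balance is ten dollars".split(), inv))
    print(synthesize("Your balance is ten dollars", inv))
```

With this toy inventory, "your balance is ten dollars" resolves to two multi-word recordings instead of five single-word ones, which is the preference the claim describes. A production system would likely combine the count criterion with others, which this sketch does not attempt.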
Address: Burlington, MA, US