Title of invention VOICE GENERATION WITH PREDETERMINED EMOTION TYPE
Abstract Techniques for generating voice with predetermined emotion type. In an aspect, semantic content and emotion type are separately specified for a speech segment to be generated. A candidate generation module generates a plurality of emotionally diverse candidate speech segments, wherein each candidate has the specified semantic content. A candidate selection module identifies an optimal candidate from amongst the plurality of candidate speech segments, wherein the optimal candidate most closely corresponds to the predetermined emotion type. In further aspects, crowd-sourcing techniques may be applied to generate the plurality of speech output candidates associated with a given semantic content, and machine-learning techniques may be applied to derive parameters for a real-time algorithm for the candidate selection module.
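The two-stage flow described in the abstract (generate candidates sharing one semantic content, then select the one closest to the requested emotion type) can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the names `Candidate`, `generate_candidates`, and `select_candidate`, the in-memory `library` lookup, and the per-emotion score dictionary are all assumptions introduced here. In particular, a simple highest-score rule stands in for the machine-learned selection algorithm the abstract mentions.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """One candidate speech segment (hypothetical structure)."""
    audio_id: str
    # Assumed representation: emotion type -> score in [0, 1].
    emotion_scores: dict = field(default_factory=dict)


def generate_candidates(semantic_content, library):
    """Return every stored candidate for the given semantic content.

    `library` is a hypothetical mapping from semantic content to a list
    of emotionally diverse candidates (for example, crowd-sourced ones).
    """
    return list(library.get(semantic_content, []))


def select_candidate(candidates, emotion_type):
    """Pick the candidate scoring highest for the requested emotion type.

    A plain argmax is an assumption made for illustration; emotions a
    candidate has no score for are treated as 0.0.
    """
    if not candidates:
        raise ValueError("no candidates available for this semantic content")
    return max(candidates, key=lambda c: c.emotion_scores.get(emotion_type, 0.0))


# Example data: two renderings of the same message with different emotion.
library = {
    "team_won": [
        Candidate("clip_neutral", {"happy": 0.2, "sad": 0.1}),
        Candidate("clip_excited", {"happy": 0.9, "sad": 0.0}),
    ]
}

candidates = generate_candidates("team_won", library)
best = select_candidate(candidates, "happy")
print(best.audio_id)
```

Keeping semantic content and emotion type as separate inputs mirrors the abstract: the first function depends only on the message, and the second only on the requested emotion.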
Publication number US2016071510(A1) Publication date 2016.03.10
Application number US201414480611 Application date 2014.09.08
Applicant Microsoft Corporation Inventors Li Chi-Ho;Wang Baoxun;Leung Max
Classification G10L13/08;G10L25/63 Main classification G10L13/08
Agency Agent
Principal claim 1. An apparatus for text-to-speech synthesis comprising: a candidate generation block configured to retrieve a plurality of speech candidates each having semantic content associated with a message; a candidate selection block configured to select one of the plurality of speech candidates corresponding to a specified emotion type; and a speaker for generating an audio output corresponding to the selected one of the plurality of speech candidates.
Address Redmond WA US