摘要 |
Techniques for generating voice with predetermined emotion type. In an aspect, semantic content and emotion type are separately specified for a speech segment to be generated. A candidate generation module generates a plurality of emotionally diverse candidate speech segments, wherein each candidate has the specified semantic content. A candidate selection module identifies an optimal candidate from amongst the plurality of candidate speech segments, wherein the optimal candidate most closely corresponds to the predetermined emotion type. In further aspects, crowd-sourcing techniques may be applied to generate the plurality of speech output candidates associated with a given semantic content, and machine-learning techniques may be applied to derive parameters for a real-time algorithm for the candidate selection module. |