Abstract |
<p>A text-to-speech method for simulating a plurality of different voice characteristics,
said method comprising:
inputting text;
dividing said inputted text into a sequence of acoustic units;
selecting voice characteristics for the inputted text;
converting said sequence of acoustic units to a sequence of speech vectors using an acoustic model, wherein said model has a plurality of model parameters describing probability distributions which relate an acoustic unit to a speech vector; and
outputting said sequence of speech vectors as audio with said selected voice characteristics,
wherein a parameter of a predetermined type of each probability distribution in said selected voice characteristics is expressed as a weighted sum of parameters of the same type,
and wherein the weighting used is voice characteristic dependent, such that converting said sequence of acoustic units to a sequence of speech vectors comprises retrieving the voice characteristic dependent weights for said selected voice characteristics,
wherein the parameters are provided in clusters, and each cluster comprises at least one sub-cluster, wherein said voice characteristic dependent weights are retrieved for each cluster such that there is one weight per sub-cluster.</p> |
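
The weighted-sum step recited above can be illustrated with a minimal sketch. It assumes that the parameter of the predetermined type is the mean vector of a distribution, which the abstract does not specify. The names `combine_means` and `VOICE_WEIGHTS`, the voice labels, and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

Vector = List[float]


def combine_means(cluster_means: List[List[Vector]],
                  weights: List[List[float]]) -> Vector:
    """Return the weighted sum of sub-cluster mean vectors.

    cluster_means[c][s] is the mean vector of sub-cluster s in cluster c.
    weights[c][s] is the voice-dependent weight for that same sub-cluster,
    so there is exactly one weight per sub-cluster.
    """
    if len(cluster_means) != len(weights):
        raise ValueError("one weight list is required per cluster")
    dim = len(cluster_means[0][0])
    result = [0.0] * dim
    for means_in_cluster, weights_in_cluster in zip(cluster_means, weights):
        if len(means_in_cluster) != len(weights_in_cluster):
            raise ValueError("one weight is required per sub-cluster")
        for mean, weight in zip(means_in_cluster, weights_in_cluster):
            for d in range(dim):
                result[d] += weight * mean[d]
    return result


# Hypothetical model: cluster 0 has one sub-cluster, cluster 1 has two.
cluster_means: List[List[Vector]] = [
    [[1.0, 2.0]],
    [[0.0, 2.0], [2.0, 0.0]],
]

# Hypothetical lookup table: weights retrieved per selected voice characteristic.
VOICE_WEIGHTS: Dict[str, List[List[float]]] = {
    "neutral": [[1.0], [0.5, 0.5]],
    "bright": [[1.0], [0.0, 1.0]],
}

neutral_mean = combine_means(cluster_means, VOICE_WEIGHTS["neutral"])
bright_mean = combine_means(cluster_means, VOICE_WEIGHTS["bright"])
```

In this sketch, selecting a different voice characteristic changes only the retrieved weights; the cluster parameters themselves stay shared across voices.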