Abstract |
A text-to-speech method configured to output speech having a selected speaker voice and a selected speaker attribute,
said method comprising:
inputting text;
dividing said inputted text into a sequence of acoustic units;
selecting a speaker voice for the inputted text;
selecting a speaker attribute for the inputted text;
converting said sequence of acoustic units to a sequence of speech vectors using an acoustic model; and
outputting said sequence of speech vectors as audio with said selected speaker voice and said selected speaker attribute,
wherein said acoustic model comprises a first set of parameters relating to speaker voice and a second set of parameters relating to speaker attributes, wherein the first and second sets of parameters do not overlap, and wherein selecting the speaker voice comprises selecting the parameters from the first set which give the selected speaker voice, and selecting the speaker attribute comprises selecting the parameters from the second set which give the selected speaker attribute. |
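The steps recited above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the claimed acoustic model: the names `SPEAKER_PARAMS`, `ATTRIBUTE_PARAMS`, `text_to_units`, `unit_base_vector` and `synthesize` are hypothetical, each letter stands in for an acoustic unit, speech vectors are two-dimensional, and the two parameter sets are combined by simple addition (the text does not specify how they are combined). The only point shown is that the speaker voice and the speaker attribute are selected from two separate, non-overlapping parameter tables.

```python
# Toy illustration only: all names and numeric values below are hypothetical.

# First parameter set: relates only to speaker voice.
SPEAKER_PARAMS = {
    "speaker_a": [0.0, 1.0],
    "speaker_b": [0.5, -0.5],
}

# Second parameter set: relates only to speaker attribute (e.g. emotion).
# It is stored separately, so the two sets do not overlap.
ATTRIBUTE_PARAMS = {
    "neutral": [0.0, 0.0],
    "happy": [0.25, 0.5],
}


def text_to_units(text):
    """Divide input text into a sequence of stand-in acoustic units (letters a-z)."""
    return [ch for ch in text.lower() if "a" <= ch <= "z"]


def unit_base_vector(unit):
    """Return a speaker- and attribute-independent vector for one unit."""
    code = ord(unit) - ord("a")
    return [code / 25.0, 1.0 - code / 25.0]


def synthesize(text, speaker, attribute):
    """Convert text to a sequence of speech vectors for the chosen voice and attribute."""
    voice = SPEAKER_PARAMS[speaker]        # selected from the first set only
    style = ATTRIBUTE_PARAMS[attribute]    # selected from the second set only
    vectors = []
    for unit in text_to_units(text):
        base = unit_base_vector(unit)
        # Assumed combination rule: element-wise sum of the three contributions.
        vectors.append([b + v + s for b, v, s in zip(base, voice, style)])
    return vectors


if __name__ == "__main__":
    for vec in synthesize("Hi", "speaker_b", "happy"):
        print(vec)
```

Because the two tables are looked up independently, any speaker can be paired with any attribute, including pairings for which no combined parameters were ever stored. A real system would pass the resulting speech vectors to a vocoder to produce audio, a step omitted here.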