Principal claim |
1. A computer-implemented method for creating a text-to-speech voice, the method comprising:
obtaining voice data of a voice recipient, wherein the text-to-speech voice is being created for the voice recipient;
determining a voice characteristic of the voice recipient by processing the voice data of the voice recipient;
selecting a voice donor from a plurality of voice donors using the voice characteristic by:
determining a voice characteristic for each voice donor of the plurality of voice donors by processing voice data of each voice donor, and
comparing the voice characteristic of the voice recipient with the voice characteristic for each voice donor of the plurality of voice donors;
obtaining a first age corresponding to the selected voice donor;
obtaining a second age corresponding to the voice recipient;
obtaining voice data of the selected voice donor;
encoding the voice data of the selected voice donor to obtain a plurality of voice parameter values, wherein the plurality of voice parameter values comprises at least one of vocal tract parameter values, vocal source parameter values, or prosodic parameter values;
obtaining a voice-aging model, wherein:
the voice-aging model receives as input (i) input voice parameter values, (ii) an input age corresponding to the input voice parameter values, and (iii) an output age corresponding to output voice parameter values, and
the voice-aging model generates output voice parameter values by transforming the input voice parameter values using the input age and the output age;
transforming the plurality of voice parameter values using the voice-aging model, the first age, and the second age to obtain a plurality of transformed voice parameter values;
synthesizing transformed voice data using the plurality of transformed parameter values; and
creating a text-to-speech voice using the transformed voice data. |
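
The donor-selection step and the input/output shape of the voice-aging model recited above can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the claim: the `Speaker` type, the two-element characteristic vector (mean pitch in Hz, speaking rate), the Euclidean distance used for the comparison, and the `toy_voice_aging_model` function. In particular, the 1% pitch change per year is an arbitrary placeholder chosen only so the function has the claimed interface (input parameter values, input age, output age); it is not the claimed model.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Speaker:
    """Hypothetical record for a voice recipient or a voice donor."""
    name: str
    age: float
    # Assumed characteristic vector, e.g. [mean pitch in Hz, speaking rate].
    characteristic: Sequence[float]


def select_donor(recipient: Speaker, donors: List[Speaker]) -> Speaker:
    """Return the donor whose characteristic vector is closest to the recipient's.

    Euclidean distance is an assumption; the claim only requires a comparison.
    """
    def distance(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    return min(
        donors,
        key=lambda d: distance(d.characteristic, recipient.characteristic),
    )


def toy_voice_aging_model(
    params: Dict[str, float], input_age: float, output_age: float
) -> Dict[str, float]:
    """Placeholder with the claimed interface: (parameters, input age, output age).

    Scales the pitch parameter by an arbitrary 1% per year of age difference
    and leaves every other parameter unchanged. Illustrative only.
    """
    factor = 1.0 - 0.01 * (output_age - input_age)
    out = dict(params)
    if "f0_hz" in out:
        out["f0_hz"] = out["f0_hz"] * factor
    return out


if __name__ == "__main__":
    recipient = Speaker("recipient", age=12, characteristic=[220.0, 4.0])
    donors = [
        Speaker("donor_a", age=30, characteristic=[120.0, 4.5]),
        Speaker("donor_b", age=25, characteristic=[210.0, 4.1]),
    ]
    donor = select_donor(recipient, donors)
    # First age = donor's age, second age = recipient's age, as in the claim.
    transformed = toy_voice_aging_model(
        {"f0_hz": 210.0, "rate": 4.1}, donor.age, recipient.age
    )
    print(donor.name, transformed)
```

In a real system the characteristic features would likely need normalizing before comparison, since raw pitch values in Hz would otherwise dominate the distance. The encoding, synthesis, and text-to-speech voice creation steps are omitted from the sketch.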