Independent claim |
1. A method of speech synthesis, comprising the steps of:
(a) receiving speech input from a sender;
(b) obtaining at least one distinguishing characteristic of the sender from the speech input, wherein the at least one distinguishing characteristic includes conversational information or textual information of the speech input;
(c) obtaining baseline characteristics, wherein the baseline characteristics include articulation rate, courteousness, formants, or pitch frequency that a recipient user of the system is accustomed to hearing;
(d) selecting a default text-to-speech model based on the at least one distinguishing characteristic of the sender;
(e) modifying the selected default text-to-speech model using the received speech input;
(f) receiving, at a text-to-speech system, a text input sent by the sender;
(g) processing, via a processor of the system and the text-to-speech model, the text input responsive to the at least one distinguishing characteristic of the sender to produce synthesized speech that is representative of a voice of the sender;
(h) identifying baseline characteristics of the synthesized speech;
(i) applying an acoustic feature filter to the synthesized speech, wherein the acoustic feature filter is adjusted using the baseline characteristics obtained from the received speech; and
(j) communicating the synthesized speech to the recipient user of the system. |
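The order of steps (a) through (j) can be illustrated with a toy sketch. This is not the claimed implementation, and it synthesizes no audio: it only tracks two of the characteristics named in step (c), articulation rate and pitch frequency, as plain numbers. Every name here (`Characteristics`, `SynthesizedSpeech`, `blend`, `DEFAULT_MODELS`, `select_default_model`, `speak`), the model table, the units, and the 0.5 blend weights are assumptions introduced for illustration. The sketch also assumes the filter in step (i) moves the synthesized speech toward the baseline that the recipient is accustomed to hearing from step (c).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Characteristics:
    """Hypothetical stand-in for two acoustic features named in the claim."""
    articulation_rate: float  # syllables per second (assumed unit)
    pitch_hz: float           # pitch frequency in Hz


@dataclass(frozen=True)
class SynthesizedSpeech:
    """Placeholder for synthesized audio: the text plus its characteristics."""
    text: str
    characteristics: Characteristics


def blend(a: Characteristics, b: Characteristics, weight: float) -> Characteristics:
    """Move `a` toward `b` by `weight` (0 keeps `a`, 1 yields `b`)."""
    return Characteristics(
        a.articulation_rate + weight * (b.articulation_rate - a.articulation_rate),
        a.pitch_hz + weight * (b.pitch_hz - a.pitch_hz),
    )


# Hypothetical default models keyed by a distinguishing characteristic.
DEFAULT_MODELS = {
    "formal": Characteristics(4.0, 120.0),
    "casual": Characteristics(5.5, 180.0),
}


def select_default_model(distinguishing: str) -> Characteristics:
    """Step (d): pick a default model, with an assumed fallback."""
    return DEFAULT_MODELS.get(distinguishing, Characteristics(5.0, 150.0))


def speak(sender_speech: Characteristics, distinguishing: str,
          baseline: Characteristics, text: str) -> SynthesizedSpeech:
    # Steps (a)-(c) are represented by the arguments: features measured from
    # the sender's speech, a distinguishing characteristic, and the baseline.
    model = select_default_model(distinguishing)            # step (d)
    model = blend(model, sender_speech, 0.5)                # step (e)
    speech = SynthesizedSpeech(text, model)                 # steps (f)-(g)
    identified = speech.characteristics                     # step (h)
    filtered = blend(identified, baseline, 0.5)             # step (i)
    return SynthesizedSpeech(text, filtered)                # step (j)


result = speak(
    sender_speech=Characteristics(6.5, 200.0),
    distinguishing="casual",
    baseline=Characteristics(4.0, 150.0),
    text="hello",
)
print(result.characteristics)
# Characteristics(articulation_rate=5.0, pitch_hz=170.0)
```

In this sketch the adapted model sits halfway between the "casual" default and the sender's measured speech, and the filter then pulls the output halfway toward the recipient's baseline. A real system would replace `blend` with model adaptation and signal processing on actual audio.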