摘要 |
A method and system are disclosed that train a text-to-speech synthesis system for use in speech synthesis. The method includes generating a speech database of audio files comprising domain-specific voices having various prosodies, and training a text-to-speech synthesis system using the speech database by selecting audio segments having a prosody based on at least one dialog state. The system includes a processor, a speech database of audio files, and modules for implementing the method. |