摘要 |
A client/server text-to-speech synthesis system and method divides the method optimally between client and server. The server stores large databases for pronunciation analysis, prosody generation, and acoustic unit selection corresponding to a normalized text, while the client performs computationally intensive decompression and concatenation of selected acoustic units to generate speech. The units are transmitted from the client to the server in a highly compressed format, with a compression method selected based on the predetermined set of potential acoustic units. This compression method allows for very high-quality and natural-sounding speech to be output at the client machine.
|