摘要 |
A system and method for improving the response time of text-to-speech synthesis using triphone contexts. The method includes identifying a set of triphone sequences, tabulating the set of triphone sequences using a plurality of contexts, where each context specific triphone sequence of the plurality of context specific triphone sequences has a top N triphone units made of the triphone units having lowest target costs when each triphone unit is individually combined into a 5-phoneme combination. Input texts having one of the contexts are received, and one of the context specific triphone sequences is selected based on the context. Input text is then synthesized using the context specific triphone sequence.
|