Abstract
A method of performing speech synthesis includes comparing a text segment (120) with an utterance waveform corpus (60) that contains numerous speech samples (140). The method determines whether there is a contextual best match between the text segment (120) and one of the speech samples (140). If there is no contextual best match, the method determines whether there is a contextual phonetic hybrid match between the text segment (120) and a speech sample (140). A contextual phonetic hybrid match requires a match of all implicit prosodic features (210) in a defined prosodic feature group (220). If a match is still not found, the prosodic feature group (220) is redefined by deleting one of the implicit prosodic features (210) from it. The group (220) is successively redefined in this way, one implicit prosodic feature (210) at a time, until a match is found between the text segment (120) and a speech sample (140). The matched speech sample (140) is then used to generate concatenative speech (110).
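The selection cascade described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation. The abstract does not define a "contextual best match" precisely, so the sketch assumes it means a sample with the same text recorded in the same surrounding context. The abstract also does not say which implicit prosodic feature is deleted first, so the sketch assumes the feature group is ordered by priority and the last-listed feature is dropped on each pass. The names `SpeechSample`, `select_sample`, the `context` field, and the feature names (`stress`, `position`, `tone`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class SpeechSample:
    """One entry of the utterance waveform corpus (hypothetical structure)."""
    sample_id: str
    text: str                # phonetic unit the sample realizes
    context: str             # assumption: surrounding text the sample was recorded in
    prosody: Dict[str, str]  # implicit prosodic features of the sample


def contextual_best_match(segment: str, context: str,
                          corpus: Sequence[SpeechSample]) -> Optional[SpeechSample]:
    # Assumed definition: same text AND same recording context.
    for s in corpus:
        if s.text == segment and s.context == context:
            return s
    return None


def hybrid_match(segment: str, target: Dict[str, str], group: List[str],
                 corpus: Sequence[SpeechSample]) -> Optional[SpeechSample]:
    # Same text, and every feature in the current group must match the target.
    for s in corpus:
        if s.text == segment and all(s.prosody.get(f) == target.get(f) for f in group):
            return s
    return None


def select_sample(segment: str, context: str, target: Dict[str, str],
                  feature_group: Sequence[str],
                  corpus: Sequence[SpeechSample]) -> Optional[SpeechSample]:
    best = contextual_best_match(segment, context, corpus)
    if best is not None:
        return best
    group = list(feature_group)  # copy so the caller's group is not mutated
    while True:
        match = hybrid_match(segment, target, group, corpus)
        if match is not None:
            return match
        if not group:
            return None  # no sample with this text exists at all
        group.pop()  # assumption: drop the lowest-priority (last-listed) feature


# Demo data (hypothetical)
CORPUS = [
    SpeechSample("s1", "ka", "a ka b", {"stress": "primary", "position": "final", "tone": "high"}),
    SpeechSample("s2", "ka", "x ka y", {"stress": "primary", "position": "initial", "tone": "low"}),
]
FEATURE_GROUP = ["stress", "position", "tone"]
TARGET = {"stress": "primary", "position": "initial", "tone": "high"}

if __name__ == "__main__":
    # No contextual best match and no full-group match; after "tone" is
    # deleted from the group, s2 matches on stress and position.
    print(select_sample("ka", "q ka r", TARGET, FEATURE_GROUP, CORPUS).sample_id)  # s2
```

Deleting one feature per pass guarantees termination: once the group is empty, any sample with the same text matches, and if none exists the function returns `None` rather than looping.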