Abstract
To modify the temporal scale of recorded speech, relative stress and relative speaking rate terms are computed for individual sections, or frames, of the speech. These terms are then combined into a single value denoted as audio tension. For a nominal time-scale modification rate, the audio tension is employed to adjust the modification rate of the individual frames of speech in a non-uniform manner, relative to one another. With this approach, compressed speech can be reproduced at a relatively fast rate, while remaining intelligible to the listener.
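The non-uniform adjustment described above can be illustrated with a minimal Python sketch. It is not the claimed method: the abstract does not state how the stress and speaking-rate terms are measured or combined, so the weighted sum, the `stress_weight` and `sensitivity` parameters, the function names, and the assumption that both terms are already normalized to the range 0 to 1 are all hypothetical. The sketch only shows one way a per-frame tension value could slow down high-tension frames and speed up low-tension frames while keeping the overall duration at the nominal rate.

```python
from typing import List, Sequence


def audio_tension(stress: Sequence[float], speaking_rate: Sequence[float],
                  stress_weight: float = 0.5) -> List[float]:
    """Combine per-frame relative stress and relative speaking rate.

    Assumption: a simple weighted sum of two terms already scaled to [0, 1].
    The abstract only says the terms are combined into a single value.
    """
    if len(stress) != len(speaking_rate):
        raise ValueError("need one stress and one speaking-rate term per frame")
    rate_weight = 1.0 - stress_weight
    return [stress_weight * s + rate_weight * r
            for s, r in zip(stress, speaking_rate)]


def frame_rates(tension: Sequence[float], nominal_rate: float,
                sensitivity: float = 0.5) -> List[float]:
    """Turn per-frame tension into per-frame time-scale modification rates.

    Assumption: each frame's output duration (1 / rate) deviates from the
    nominal duration in proportion to how far its tension lies from the mean.
    Because those deviations sum to zero, equal-length frames keep the total
    output duration implied by nominal_rate.
    """
    if not tension:
        raise ValueError("tension must contain at least one frame")
    if nominal_rate <= 0:
        raise ValueError("nominal_rate must be positive")
    mean_tension = sum(tension) / len(tension)
    rates = []
    for t in tension:
        # Relative output duration of this frame (1.0 means nominal).
        stretch = 1.0 + sensitivity * (t - mean_tension)
        if stretch <= 0:
            raise ValueError("sensitivity too large for this tension range")
        rates.append(nominal_rate / stretch)
    return rates


if __name__ == "__main__":
    # Hypothetical per-frame terms for four frames of speech.
    stress = [0.9, 0.1, 0.5, 0.1]
    speaking_rate = [0.8, 0.2, 0.5, 0.1]
    tension = audio_tension(stress, speaking_rate)
    for i, rate in enumerate(frame_rates(tension, nominal_rate=2.0)):
        print(f"frame {i}: tension={tension[i]:.2f} rate={rate:.2f}x")
```

In this sketch, a frame with above-average tension receives a rate below the nominal value (less compression), and a frame with below-average tension receives a rate above it, which mirrors the relative, frame-to-frame adjustment the abstract describes.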