Abstract |
<p>Complex amplitudes of sinusoidal components for a speech synthesiser (e.g. a text-to-speech vocoder) are extracted from an input audio signal by dividing the signal into frames (e.g. windowed, overlapping frames), Fourier transforming each frame into the frequency domain, identifying the peak in each frequency band (e.g. each of 21 critical bands), calculating the complex amplitude of a sinusoidal component at the peak frequency (e.g. via an MMSE method), and finally assigning this amplitude to a fixed frequency (e.g. the centre frequency) in each band (figs. 7A & B). An output signal may then be generated from the sum of these components.</p> |
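The steps in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation, and several details are assumptions made only for illustration: the function names, the periodic Hann window, the four band edges (the abstract mentions 21 critical bands but gives no edges), and above all the amplitude estimate, which simply scales the peak DFT bin by the window gain as a stand-in for the MMSE method the abstract mentions.

```python
import cmath
import math


def hann(n_len):
    # Periodic Hann window (an assumed choice; the abstract only says "windowed").
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_len) for n in range(n_len)]


def frames(signal, n_len, hop):
    # Split the signal into overlapping frames of n_len samples, hop samples apart.
    return [signal[i:i + n_len] for i in range(0, len(signal) - n_len + 1, hop)]


def dft(frame):
    # Naive O(N^2) DFT, returning bins 0..N/2 only (enough for a real input).
    n_len = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * n / n_len) for n, x in enumerate(frame))
        for k in range(n_len // 2 + 1)
    ]


def analyse_frame(frame, fs, bands):
    # For each band: find the peak bin, estimate a complex amplitude there,
    # and assign that amplitude to the band's centre frequency.
    # Each band is assumed to contain at least one DFT bin and to exclude DC.
    n_len = len(frame)
    win = hann(n_len)
    spec = dft([w * x for w, x in zip(win, frame)])
    gain = sum(win)
    components = []
    for lo, hi in bands:
        bins = [k for k in range(len(spec)) if lo <= k * fs / n_len < hi]
        peak = max(bins, key=lambda k: abs(spec[k]))
        # Simple estimate (NOT the MMSE method): scaled peak bin value.
        amp = 2 * spec[peak] / gain
        components.append(((lo + hi) / 2, peak * fs / n_len, amp))
    return components


def synthesise(components, fs, n_len):
    # Output frame as the sum of sinusoids at the fixed (centre) frequencies.
    return [
        sum(abs(a) * math.cos(2 * math.pi * fc * n / fs + cmath.phase(a))
            for fc, _, a in components)
        for n in range(n_len)
    ]


# Usage: a 1125 Hz tone falls in the hypothetical 500-1500 Hz band,
# so its amplitude is reassigned to that band's 1000 Hz centre frequency.
FS = 8000
N = 64
BANDS = [(100, 500), (500, 1500), (1500, 2500), (2500, 4000)]  # illustrative edges
signal = [0.5 * math.cos(2 * math.pi * 1125 * n / FS + 0.3) for n in range(2 * N)]
all_frames = frames(signal, N, N // 2)
first = analyse_frame(all_frames[0], FS, BANDS)
centre, peak_hz, amp = first[1]
output = synthesise(first, FS, N)
```

In this sketch the synthesised sinusoid keeps the magnitude and phase measured at the peak but is generated at the band's fixed frequency, which is the reassignment step the abstract describes.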