Abstract
<p>A time series of spectral parameters is extracted from training speech. For each voice interval, the time series is divided into a plurality of segments, and the segments are clustered into a plurality of clusters. For each cluster, an initial reference pattern representing that cluster is computed. The segment boundaries are then corrected using the computed reference patterns (a correcting step), the segments of the corrected spectral parameter time series are clustered (a clustering step), and for each cluster a reference pattern representing that cluster is computed (a reference pattern computing step). The correcting, clustering, and reference pattern computing steps are performed at least once, and the reference patterns obtained by the last reference pattern computing step are taken as the desired reference patterns.</p>
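<p>The iterative procedure above can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that each training utterance is one voice interval given as a list of spectral-parameter vectors, that the initial division is uniform, that a segment is represented by its mean vector, that clustering is k-means, and that boundary correction is a dynamic-programming search in which each segment is matched to a single reference pattern. All function names (<code>train_reference_patterns</code>, <code>resegment</code>, <code>kmeans</code>, <code>uniform_segments</code>) are hypothetical.</p>

```python
"""Minimal sketch of iterative segment clustering for reference-pattern training.

Hypothetical assumptions (not from the abstract): each utterance is one voice
interval given as a list of feature vectors; a segment is summarized by its mean
vector; clustering is k-means with farthest-point initialization; boundaries are
corrected by DP that matches each segment to one reference pattern.
"""


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def uniform_segments(length, n_seg):
    """Initial division: boundaries splitting frames [0, length) into n_seg parts."""
    return [i * length // n_seg for i in range(n_seg + 1)]


def kmeans(points, k, iters=20):
    """Cluster points into k groups; the centroids serve as reference patterns."""
    # Deterministic farthest-point initialization.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sqdist(p, centroids[c]))
            groups[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [mean(g) if g else centroids[j] for j, g in enumerate(groups)]
    return centroids


def resegment(frames, n_seg, patterns):
    """Correcting step: re-place the segment boundaries of one utterance by DP."""
    T = len(frames)
    # prefix[p][t]: summed squared distance of frames[0:t] to pattern p.
    prefix = []
    for pat in patterns:
        acc = [0.0]
        for f in frames:
            acc.append(acc[-1] + sqdist(f, pat))
        prefix.append(acc)

    def seg_cost(u, t):
        # Cost of frames[u:t] matched to the single best-fitting pattern.
        return min(row[t] - row[u] for row in prefix)

    inf = float("inf")
    # best[s][t]: minimal cost of splitting frames[0:t] into s non-empty segments.
    best = [[inf] * (T + 1) for _ in range(n_seg + 1)]
    back = [[0] * (T + 1) for _ in range(n_seg + 1)]
    best[0][0] = 0.0
    for s in range(1, n_seg + 1):
        for t in range(s, T + 1):
            for u in range(s - 1, t):
                c = best[s - 1][u] + seg_cost(u, t)
                if c < best[s][t]:
                    best[s][t], back[s][t] = c, u
    # Trace the optimal boundaries back from the end of the utterance.
    bounds = [T]
    for s in range(n_seg, 0, -1):
        bounds.append(back[s][bounds[-1]])
    return bounds[::-1]


def train_reference_patterns(utterances, n_seg, n_clusters, n_iters=3):
    """Return (reference patterns, final boundaries); n_iters must be >= 1."""
    if n_iters < 1:
        raise ValueError("the three steps must be performed at least once")

    def segment_means(all_bounds):
        return [mean(frames[b[i]:b[i + 1]])
                for frames, b in zip(utterances, all_bounds)
                for i in range(n_seg)]

    # Initial division, clustering, and initial reference patterns.
    bounds = [uniform_segments(len(u), n_seg) for u in utterances]
    patterns = kmeans(segment_means(bounds), n_clusters)
    for _ in range(n_iters):
        bounds = [resegment(u, n_seg, patterns) for u in utterances]  # correcting
        patterns = kmeans(segment_means(bounds), n_clusters)  # clustering + computing
    # Patterns from the last computing step are taken as the desired ones.
    return patterns, bounds
```

<p>For instance, with three utterances of one-dimensional frames whose values step through 0, 5, and 10 at different positions, calling <code>train_reference_patterns(utterances, n_seg=3, n_clusters=3)</code> moves the initially uniform boundaries onto the value changes and returns reference patterns at 0, 5, and 10. The exhaustive dynamic program costs on the order of n_seg &times; T&sup2; &times; K operations per utterance, which suits a sketch but not long recordings.</p>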