Abstract |
PROBLEM TO BE SOLVED: To stably extract a speaker's features without depending on the content of the utterance data. SOLUTION: A frequency function estimation part 5 comprises a phoneme boundary estimation part, a frequency measuring part, and a mode extracting part. During learning, the frequency function estimation part 5 estimates phoneme boundary information and, for each sample, performs maximum likelihood estimation of the coefficient α of the frequency warping function f over the phoneme sections selected on the basis of that information. The frequency function estimation part 5 then determines a distribution function H(α), which represents the frequency distribution of the per-sample coefficients α, and takes the coefficient α that gives the mode of H(α) as the optimal coefficient of the frequency warping function f. Consequently, a correct frequency warping function f can be estimated even when the frequency distribution has a plurality of peaks, and the speaker's features can be stably extracted without depending on the content of the utterance data. |
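The mode extraction step can be illustrated with a minimal sketch. The abstract does not say how H(α) is formed, so this assumes it is a fixed-width histogram of the per-sample coefficients; the function name `mode_warping_coefficient`, the `bin_width` parameter, and the sample values are hypothetical and used only for illustration. The per-sample maximum likelihood estimation and the form of the warping function f are not modelled here.

```python
from collections import Counter


def mode_warping_coefficient(alphas, bin_width=0.05):
    """Return the centre of the most populated histogram bin of `alphas`.

    Assumption: H(alpha) is approximated by a fixed-width histogram.
    `alphas` holds one estimated warping coefficient per sample.
    """
    if not alphas:
        raise ValueError("alphas must contain at least one coefficient")
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")

    # H(alpha): number of samples whose coefficient falls in each bin.
    # Bin k covers the interval centred on k * bin_width.
    histogram = Counter(round(a / bin_width) for a in alphas)

    # Mode: the bin with the highest count. Ties go to the lowest bin index
    # so that the result is deterministic.
    best_bin = min(histogram, key=lambda k: (-histogram[k], k))
    return best_bin * bin_width


if __name__ == "__main__":
    # Hypothetical per-sample estimates: a main peak near 0.2, a smaller
    # secondary peak near -0.1, and one outlier.
    alphas = [0.21, 0.19, 0.20, 0.22, -0.10, -0.11, 0.58]
    print("mode:", mode_warping_coefficient(alphas))
    print("mean:", sum(alphas) / len(alphas))
```

With a multi-peaked set of estimates such as the one above, the mean is pulled toward the secondary peak and the outlier, whereas the mode stays at the dominant peak. This is the property the abstract relies on when the distribution has a plurality of peaks.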