Abstract |
PURPOSE: To reduce unsupervised segmentation errors and to facilitate subsequent phone model adaptation by decomposing the sources of acoustic spectral variation and eliminating those that degrade recognition performance. CONSTITUTION: On the training side 10, a spectral bias (h) is subtracted from each speaker's training speech spectrum Xt in the logarithmic domain to generate a set of normalized spectra, which are modeled in a process 26 to produce normalized speaker-independent models M2, M3. The normalized phone models M2, M3 are supplied to a decoder 30 and used to decode the test speech of a speaker (q). Before sentences from the speaker (q) are recognized, a short calibration utterance Xc is supplied to an h-estimator 24, which generates an estimated spectral bias h^(q) for that speaker; this bias is then subtracted from the speaker's speech spectrum Xt. Subtracting the bias yields the normalized spectrum, which is supplied to the decoder 30 to produce a word string. |
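The log-domain bias subtraction described above can be illustrated with a minimal sketch. The abstract does not say how the h-estimator 24 computes the bias, so the estimator below (per-bin mean of the calibration log spectra minus a speaker-independent reference mean) is purely an assumption, and the names `estimate_bias`, `normalize`, and `reference_mean` are hypothetical.

```python
from statistics import fmean


def estimate_bias(calibration_frames, reference_mean):
    """Hypothetical h-estimator (an assumption, not the patent's method):
    per-bin mean of the calibration log spectra X_c minus a
    speaker-independent reference mean."""
    n_bins = len(reference_mean)
    return [
        fmean(frame[k] for frame in calibration_frames) - reference_mean[k]
        for k in range(n_bins)
    ]


def normalize(frames, bias):
    """Subtract the spectral bias h from every log-spectral frame."""
    return [[x - h for x, h in zip(frame, bias)] for frame in frames]


# Toy log spectra with two frequency bins per frame (made-up values).
reference_mean = [1.0, 2.0]              # assumed speaker-independent mean
calibration = [[1.5, 2.5], [2.5, 3.5]]   # calibration speech X_c of speaker q
speech = [[2.0, 3.0], [3.0, 5.0]]        # speech spectrum X_t of speaker q

h_q = estimate_bias(calibration, reference_mean)  # estimated bias h^(q)
normalized = normalize(speech, h_q)               # frames passed to the decoder
```

Because the subtraction happens in the logarithmic domain, it corresponds to dividing out a multiplicative, speaker-dependent spectral factor in the linear domain, which is why a single per-bin offset can serve as the normalization.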