Abstract |
PROBLEM TO BE SOLVED: To achieve high recognition performance when recognizing voice captured in a real environment. SOLUTION: A noise model is generated from the sound feature quantities of a noise-section signal, and a voice feature quantity average is computed from the sound feature quantities of a voice-section signal. A multiplicative noise feature quantity is then calculated from the voice feature quantity average, the clean voice feature quantity average of a clean voice model (a sound model built for each voice unit from voice picked up in a noise-free environment), and the noise feature quantity average of the noise model. The noise model is normalized with the multiplicative noise feature quantity to generate a normalized noise model. The normalized noise model and the clean voice model are then combined to generate a normalized noise-superposed voice model, which is in turn normalized to build a normalized noise-adapted model that serves as the sound model. This sound model is collated with the normalized voice component feature quantities, obtained by normalizing the sound feature quantities of the voice-section signal, to find likelihoods, and a voice recognition result is obtained based on those likelihoods. COPYRIGHT: (C)2007, JPO&INPIT
|
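The processing chain in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: features are taken to be log-spectral vectors, the observed voice is modeled as log(exp(h + s) + exp(n)) with h the multiplicative noise, normalization is plain mean subtraction, each voice unit is a single unit-variance Gaussian, and all function names are hypothetical.

```python
import numpy as np

def estimate_multiplicative_noise(speech_mean, clean_mean, noise_mean, floor=1e-3):
    # Assumed model: exp(speech_mean) ~= exp(h + clean_mean) + exp(noise_mean).
    # Solve for h; the floor keeps the logarithm defined when noise dominates.
    residual = np.maximum(np.exp(speech_mean) - np.exp(noise_mean), floor)
    return np.log(residual) - clean_mean

def build_adapted_model(clean_unit_means, noise_mean, h):
    # Normalize the noise model by the multiplicative noise (assumed: subtraction
    # in the log-spectral domain).
    normalized_noise = noise_mean - h
    # Combine clean voice and normalized noise: log(exp(s) + exp(n - h)).
    superposed = np.logaddexp(clean_unit_means, normalized_noise)
    # Normalize the superposed model (assumed: subtract the mean over units,
    # which presumes all units occur equally often).
    return superposed - superposed.mean(axis=0)

def recognize(speech_feats, noise_feats, clean_unit_means):
    # Noise model and averages from the noise and voice sections.
    noise_mean = noise_feats.mean(axis=0)
    speech_mean = speech_feats.mean(axis=0)
    clean_mean = clean_unit_means.mean(axis=0)
    h = estimate_multiplicative_noise(speech_mean, clean_mean, noise_mean)
    adapted = build_adapted_model(clean_unit_means, noise_mean, h)
    # Normalize the observed voice features the same way (mean subtraction).
    normalized_speech = speech_feats - speech_mean
    # Unit-variance Gaussian log-likelihood, up to an additive constant,
    # for every (frame, unit) pair.
    diff = normalized_speech[:, None, :] - adapted[None, :, :]
    loglik = -0.5 * np.sum(diff ** 2, axis=2)
    return loglik.argmax(axis=1), loglik
```

A call takes a `(frames, dims)` array for each section and a `(units, dims)` array of clean unit means, and returns the best-matching unit per frame together with the full likelihood table. A real recognizer would use HMM or GMM acoustic models with variances and typically cepstral features; the sketch only mirrors the order of the steps.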