Abstract |
A system and method for recognizing a spoken utterance, in which each reference pattern stored in a dictionary consists of the series of phonemes of a word to be recognized. Each phoneme has a predetermined duration and spans a series of frames, and each lattice point (i, j), for the j-th phoneme at the i-th frame, carries a discriminating score derived from neural networks for the corresponding phoneme. When the input series of phonemes recognized by a phoneme recognition block is compared with each reference pattern k, a matching score gk(i, j) is calculated at each lattice point: <IMAGE> where ak(i, j) denotes the output score of the neural networks for the j-th phoneme at the i-th frame of the reference pattern, and p denotes a penalty constant that prevents extreme shrinkage of the phonemes. The total matching score is gk(I, J), where I denotes the number of frames of the input series of phonemes and J denotes the number of phonemes of reference pattern k. The reference pattern that gives the maximum total matching score is output as the word recognition result.
|
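The final selection step of the abstract can be illustrated with a minimal sketch. It assumes the total matching scores gk(I, J) have already been computed for every reference pattern; the recurrence for gk(i, j) is not reproduced in the text, so it is deliberately left out here. The function name `select_word`, the dictionary layout, and the score values are hypothetical and chosen only for illustration.

```python
from typing import Dict


def select_word(total_scores: Dict[str, float]) -> str:
    """Return the word whose reference pattern has the maximum total score.

    total_scores maps each dictionary word k to its total matching score
    gk(I, J). How those scores are computed is outside this sketch.
    """
    if not total_scores:
        raise ValueError("no reference patterns to compare")
    # Pick the key (word) with the largest associated score.
    return max(total_scores, key=lambda word: total_scores[word])


# Hypothetical total matching scores, one per reference pattern.
example_scores = {"hello": 12.5, "yellow": 9.0, "mellow": 10.25}
print(select_word(example_scores))  # prints "hello"
```

If two reference patterns tie, this sketch returns whichever was inserted first; the abstract does not say how ties are resolved.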