摘要 |
Frames making up an input speech are each collated with a string of phonemes representing speech candidates to be recognized, whereby evaluation values regarding the phonemes are computed. The frames are each compared with part of the phoneme string so as to reduce computations and memory capacity required in recognizing the input speech based on the evaluation values. That is, each frame is compared with a portion of the phoneme string to acquire an evaluation value for each phoneme. If the acquired evaluation value meets a predetermined condition, part of the phonemes to be collated with the next frame are changed. Illustratively, if the evaluation value for the phoneme heading a given portion of collated phonemes is smaller than the evaluation value of the phoneme which terminates that phoneme portion, then the head phoneme is replaced by the next phoneme. The new portion of phonemes obtained by the replacement is used for collation with the next frame.
|