Abstract |
A method for word spotting in a speech recognition system without predetermining the endpoints of the input speech. The invention is intended to be implemented in a system that has word templates stored in template memory and that is capable of accumulating distance measures for states within each word template. The following steps generate a measure of similarity between a subset of the input frames and a word template: a) recording a beginning input frame number for each state to identify the potential beginning of the word; b) accumulating distance measures for at least one state for each input frame; c) normalizing the distance measures by subtracting a normalization amount from each distance measure; d) recording normalization information corresponding to the normalization amount for each input frame; and e) determining a similarity measure between the word template and a subset of input frames after a given input frame has been processed. The subset runs from the beginning input frame number recorded for an end state of the template through the given input frame number. The similarity measure is based on the normalized distance measure recorded for the end state and on the normalization information.
|
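Steps a) through e) can be illustrated with a minimal sketch. Everything in it beyond the five steps themselves is an assumption made for illustration and is not taken from the abstract: frames and template states are plain numbers compared by absolute difference, each state may be reached from itself or from the preceding state, the first state may be entered afresh at any frame, the normalization amount is the smallest accumulated distance in the current frame, the normalization information is kept as a running sum, and the similarity measure is the average distance per frame. The function name `spot_word` is hypothetical.

```python
import math


def spot_word(frames, template):
    """Return (end_frame, begin_frame, score) for every frame in which
    the end state of the template is reachable. Lower score = better match."""
    n_states = len(template)
    acc = [math.inf] * n_states   # normalized accumulated distance per state
    begin = [-1] * n_states       # step a: beginning frame number per state
    cum_norm = [0.0]              # step d: cum_norm[t + 1] = total subtracted through frame t
    results = []

    for t, x in enumerate(frames):
        new_acc = [math.inf] * n_states
        new_begin = [-1] * n_states
        for s in range(n_states):
            # Best predecessor: stay in state s, or advance from state s - 1.
            best, b = acc[s], begin[s]
            if s > 0 and acc[s - 1] < best:
                best, b = acc[s - 1], begin[s - 1]
            # Assumption: a new candidate word may start in state 0 at any frame.
            if s == 0 and 0.0 <= best:
                best, b = 0.0, t
            # Step b: accumulate the local distance for this state and frame.
            new_acc[s] = best + abs(x - template[s])
            new_begin[s] = b

        # Step c: subtract a normalization amount (here, the frame minimum).
        norm = min(new_acc)
        acc = [a - norm for a in new_acc]
        begin = new_begin
        cum_norm.append(cum_norm[-1] + norm)

        # Step e: similarity for frames begin[-1]..t from the end state's
        # normalized distance plus the normalization removed over that span.
        if acc[-1] < math.inf:
            b = begin[-1]
            total = acc[-1] + cum_norm[t + 1] - cum_norm[b]
            results.append((t, b, total / (t - b + 1)))
    return results


# Usage: the pattern 1, 2, 3 is embedded in frames 2..4 of the input.
hits = spot_word([9, 9, 1, 2, 3, 9, 9], [1, 2, 3])
best = min(hits, key=lambda r: r[2])
print(best)
```

Because every state in a frame has the same amount subtracted, adding back the running sum between the recorded beginning frame and the current frame recovers the unnormalized path distance, which is why the normalization information must be stored per frame.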