Abstract |
<p>PROBLEM TO BE SOLVED: To accurately model a word containing at least one triphone that did not occur during the training period. SOLUTION: To recognize a spoken text, the text must be made available in the form of a series of reference values. The reference values are determined from a known text during the training period, with characteristic values extracted at regular intervals. For use in recognition, these characteristic values are arranged according to triphones 43 to form groups 40 to 42, or clusters. The groups 40 to 42 form the basis of the reference values. When a recognition system includes an extremely large vocabulary, not all triphones 43 occur during the training period unless the training text is prohibitively long. To determine a reference value for a word that contains an unseen triphone 43, the unseen triphone 43 must be assigned to one of the groups 40 to 42. All groups are tested to determine whether they contain the same center phoneme as the triphone 43 to be assigned, together with its left-side and right-side phonemes.</p> |
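The assignment step described in the abstract can be illustrated with a minimal sketch. The abstract only states that every group is tested for the same center phoneme together with the left-side and right-side phonemes; the scoring rule below (one point for a matching center, plus one for each matching context phoneme), the tie-breaking by first group, and the names `Triphone` and `assign_unseen_triphone` are assumptions made for illustration, not the method claimed in the patent.

```python
from typing import Dict, List, Optional, Tuple

# A triphone as (left phoneme, center phoneme, right phoneme).
Triphone = Tuple[str, str, str]


def assign_unseen_triphone(
    unseen: Triphone, groups: Dict[int, List[Triphone]]
) -> Optional[int]:
    """Return the id of the group that best matches an unseen triphone.

    Assumed scoring (not taken from the source): a group member only
    counts if its center phoneme equals the unseen center phoneme; it
    then scores 1, plus 1 for a matching left phoneme and 1 for a
    matching right phoneme. A group's score is its best member score.
    Returns None when no group contains the center phoneme.
    """
    left, center, right = unseen
    best_id: Optional[int] = None
    best_score = 0
    for group_id, members in groups.items():
        group_score = 0
        for m_left, m_center, m_right in members:
            if m_center != center:
                continue  # the center phoneme must match
            member_score = 1 + int(m_left == left) + int(m_right == right)
            group_score = max(group_score, member_score)
        # Strict comparison: on a tie, the first group tested is kept.
        if group_score > best_score:
            best_id, best_score = group_id, group_score
    return best_id


# Hypothetical groups, keyed by the reference numerals 40 to 42.
example_groups: Dict[int, List[Triphone]] = {
    40: [("a", "t", "o"), ("e", "t", "o")],
    41: [("s", "t", "a")],
    42: [("a", "n", "o")],
}
```

With these hypothetical groups, the unseen triphone `("a", "t", "i")` would be assigned to group 40, because that group holds a member with the same center and the same left-side phoneme. A triphone whose center phoneme appears in no group yields `None`, which a real system would need to handle with some fallback.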