Title of invention: Information processing device, large vocabulary continuous speech recognition method and program including hypothesis ranking
Abstract: A system and method for large vocabulary continuous speech recognition using acoustic invariant structure. An information processing device receives sound as input and performs speech recognition. The information processing device includes: a speech recognition processing unit that outputs a speech recognition score for each of multiple hypotheses; a structure score calculation unit that calculates, for each hypothesis, a structure score obtained by applying a pair-specific weight to the likelihood of the inter-distribution distance of every phoneme pair in the hypothesis and summing the results; and a ranking unit that ranks the multiple hypotheses by the sum of the speech recognition score and the structure score.
Publication number: US9165553(B2) Publication date: 2015.10.20
Application number: US201313744963 Filing date: 2013.01.18
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors: Kurata Gakuto; Suzuki Masayuki; Nishimura Masafumi
Classification: G10L15/04;G10L15/10;G10L15/02 Main classification: G10L15/04
Agency: Scully, Scott, Murphy & Presser, P.C. Agent: Scully, Scott, Murphy & Presser, P.C.; Zarick, Esq. Gail H.
Main claim: 1. A large vocabulary continuous speech recognition method executed by a computer; the method comprises the steps of: (a) acquiring by said computer a speech data as input; (b) performing by said computer speech recognition with respect to said acquired speech data, and outputting a plurality of hypotheses that are a recognition result with a plurality of speech recognition scores, each speech recognition score being a score indicating apparent correctness of a speech recognition result for each hypothesis; (c) calculating by said computer a structure score for each hypothesis, the structure score being obtained by, for all pairs of phonemes consisting of the hypothesis, multiplying a likelihood of inter-distribution distance of a pair of phonemes by weighting for said pair of phonemes and performing summation; and (d) determining by said computer a total value of said structure score and said speech recognition score for each hypothesis, and based on said total value, ranking said plurality of hypotheses; wherein, in said step (c), said pair-by-pair weightings of said phoneme pairs are set such that weightings between pairs of vowel sounds and weightings of pairs relating to silence are set higher than weightings concerning pairs of other phonemes.
Address: Armonk NY US
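
Illustrative sketch (not part of the patent record): steps (c) and (d) of the main claim can be read as a weighted sum over phoneme pairs followed by a re-ranking of the hypotheses. The Python below is a minimal sketch under several assumptions that do not come from the record: the names `Hypothesis`, `pair_weight`, `structure_score` and `rank_hypotheses` are hypothetical, the vowel set and the weight values 2.0 and 1.0 are placeholders, the per-pair likelihoods are taken as precomputed log-likelihoods, "pairs relating to silence" is interpreted as any pair containing a silence phoneme, and a higher total is treated as better.

```python
from dataclasses import dataclass
from itertools import combinations

# Hypothetical phoneme inventory and weight values, chosen only for illustration.
VOWELS = {"a", "i", "u", "e", "o"}
SILENCE = "sil"
HIGH_WEIGHT = 2.0  # assumed weight for vowel-vowel pairs and pairs containing silence
LOW_WEIGHT = 1.0   # assumed weight for all other pairs


@dataclass
class Hypothesis:
    text: str
    asr_score: float   # speech recognition score from step (b)
    phonemes: list     # phoneme sequence of the hypothesis
    pair_loglik: dict  # (i, j) position pair -> assumed precomputed log-likelihood
                       # of that pair's inter-distribution distance


def pair_weight(p, q):
    """Weight vowel-vowel pairs and pairs containing silence higher than the rest."""
    if SILENCE in (p, q) or (p in VOWELS and q in VOWELS):
        return HIGH_WEIGHT
    return LOW_WEIGHT


def structure_score(hyp):
    """Step (c): sum weight * likelihood over all phoneme pairs of one hypothesis."""
    total = 0.0
    for i, j in combinations(range(len(hyp.phonemes)), 2):
        weight = pair_weight(hyp.phonemes[i], hyp.phonemes[j])
        total += weight * hyp.pair_loglik[(i, j)]
    return total


def rank_hypotheses(hyps):
    """Step (d): rank by recognition score plus structure score, best first."""
    return sorted(hyps, key=lambda h: h.asr_score + structure_score(h), reverse=True)
```

In this sketch a hypothesis with a slightly worse recognition score can still move to the top if its phoneme-pair structure fits better, which is the re-ranking effect the claim describes. How the inter-distribution distances and their likelihoods are actually computed is not stated in this record and is left outside the sketch.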