Title of invention: Efficient empirical determination, computation, and use of acoustic confusability measures
Abstract: Efficient empirical determination, computation, and use of an acoustic confusability measure comprises: (1) an empirically derived acoustic confusability measure, comprising a means for determining the acoustic confusability between any two textual phrases in a given language, where the measure of acoustic confusability is empirically derived from examples of the application of a specific speech recognition technology, where the procedure does not require access to the internal computational models of the speech recognition technology and does not depend upon any particular internal structure or modeling technique, and where the procedure is based upon iterative improvement from an initial estimate; (2) techniques for efficient computation of the empirically derived acoustic confusability measure, comprising means for efficient application of an acoustic confusability score, allowing practical application to very large-scale problems; and (3) a method for using acoustic confusability measures to make principled choices about which specific phrases to make recognizable by a speech recognition application.
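The third element of the abstract uses a confusability measure to decide which phrases a recognizer should accept. That idea can be illustrated with the minimal sketch below, under assumptions the abstract does not state. Phrases are given as phoneme lists. The confusability of two phrases is taken to be the minimum total cost of aligning their phoneme sequences, where a lower value means more confusable. Phrases are selected greedily against a threshold. The names `confusability_distance`, `choose_phrases`, `min_distance`, and `EPS` are hypothetical, and so are the unit costs in the usage example. Any phoneme-pair cost function could be passed in as `cost`, including one estimated by the procedure of claim 1.

```python
EPS = "-"  # hypothetical symbol for the empty phoneme (insertion/deletion)


def confusability_distance(a, b, cost):
    """Minimum total cost of aligning phoneme sequences a and b.
    A lower value means the two phrases are more acoustically confusable."""
    prev = [0.0]
    for q in b:                      # first row: insert every phoneme of b
        prev.append(prev[-1] + cost(EPS, q))
    for p in a:
        cur = [prev[0] + cost(p, EPS)]   # delete phonemes of a
        for j, q in enumerate(b, start=1):
            cur.append(min(prev[j - 1] + cost(p, q),   # substitution/match
                           prev[j] + cost(p, EPS),     # deletion
                           cur[j - 1] + cost(EPS, q))) # insertion
        prev = cur
    return prev[-1]


def choose_phrases(candidates, cost, min_distance):
    """Greedily keep each (phrase, phonemes) candidate whose distance to every
    phrase already kept is at least min_distance (a hypothetical policy)."""
    kept = []
    for phrase, phones in candidates:
        if all(confusability_distance(phones, other, cost) >= min_distance
               for _, other in kept):
            kept.append((phrase, phones))
    return [phrase for phrase, _ in kept]


def unit_cost(p, q):
    """Illustrative costs only: 0 for a match, 1 for any other arc."""
    return 0.0 if p == q else 1.0


candidates = [
    ("cat", ["k", "ae", "t"]),
    ("bat", ["b", "ae", "t"]),
    ("pat", ["p", "ae", "t"]),
    ("dog", ["d", "ao", "g"]),
]
print(choose_phrases(candidates, unit_cost, min_distance=2.0))  # ['cat', 'dog']
```

Greedy selection depends on the order of the candidates. With asymmetric learned costs the distance is also not symmetric, so a real application might compare in both directions.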
Publication number: US8959019 (B2)
Publication date: 2015.02.17
Application number: US200711932122
Filing date: 2007.10.31
Applicant: Promptu Systems Corporation
Inventors: Printz Harry; Chittar Narren
Classification: G10L15/06; G10L15/08; G10L17/26; G06Q30/02; G06F17/30
Main classification: G10L15/06
Agency: Perkins Coie LLP
Agents: Glenn Michael A.; Perkins Coie LLP
Main claim:
1. A method for determining an empirically derived acoustic confusability measure, comprising the steps of:
  using a computer for performing corpus processing by initially processing an original corpus, comprising both audio information and a true transcription thereof, with an automatic speech recognition system of interest once, one utterance at a time, to produce a recognized corpus comprising a machine transcription of audio information; and
  developing a family of phoneme confusability models by repeatedly processing said recognized corpus with said computer, after the corpus is initially processed by said automatic speech recognition system once, wherein each repetition comprises the steps of:
    setting all phoneme pair counts to zero; and
    analyzing each pair of phoneme sequences in said recognized corpus to collect information regarding the confusability of any two phonemes, wherein said information is collected by:
      constructing a lattice from each said pair of phoneme sequences;
      labeling each arc of the lattice with the appropriate value from the current family of decoding costs;
      computing the minimum cost path through this lattice; and
      traversing said minimum cost path and incrementing the phoneme pair count for each arc that is traversed; and
    upon completion, for each said pair of phoneme sequences, of said minimum cost path traversal and the associated incrementing of phoneme pair counts, using said accumulated phoneme pair counts to deliver a family of phoneme confusability models.
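The repeated re-estimation loop of claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation, and it rests on these assumptions:

- Each pair of phoneme sequences is a pair of lists of strings.
- The lattice is the usual edit-distance grid, with substitution, deletion, and insertion arcs.
- The initial decoding costs are unit edit-distance costs.
- Counts are turned into the next family of costs as add-one-smoothed negative log conditional probabilities.

The names `align`, `costs_from_counts`, `unit_cost`, `estimate_costs`, and the empty-phoneme symbol `EPS` are hypothetical.

```python
import math
from collections import Counter

EPS = "-"  # hypothetical symbol for the empty phoneme (insertion/deletion arcs)


def align(ref, hyp, cost):
    """Return the arcs (true_phone, recognized_phone) on a minimum-cost path
    through the edit lattice of ref (true) against hyp (recognized)."""
    n, m = len(ref), len(hyp)
    dist = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    dist[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            # Arcs entering node (i, j): substitution or match, deletion
            # (true phone recognized as nothing), insertion (extra phone).
            moves = []
            if i > 0 and j > 0:
                moves.append((i - 1, j - 1, (ref[i - 1], hyp[j - 1])))
            if i > 0:
                moves.append((i - 1, j, (ref[i - 1], EPS)))
            if j > 0:
                moves.append((i, j - 1, (EPS, hyp[j - 1])))
            for pi, pj, arc in moves:
                total = dist[pi][pj] + cost(*arc)  # arc labeled with current cost
                if total < dist[i][j]:
                    dist[i][j], back[i][j] = total, (pi, pj, arc)
    # Traverse the minimum-cost path backwards from the final node.
    arcs, i, j = [], n, m
    while (i, j) != (0, 0):
        i, j, arc = back[i][j]
        arcs.append(arc)
    return arcs[::-1]


def costs_from_counts(counts, symbols):
    """Turn phoneme pair counts into costs -log P(recognized | true), with
    add-one smoothing (an assumption; the claim does not fix this step)."""
    totals = Counter()
    for (p, _), c in counts.items():
        totals[p] += c
    table = {}
    for p in symbols:
        outcomes = [q for q in symbols if not (p == EPS and q == EPS)]
        for q in outcomes:
            prob = (counts[(p, q)] + 1) / (totals[p] + len(outcomes))
            table[(p, q)] = -math.log(prob)
    return lambda p, q: table[(p, q)]


def unit_cost(p, q):
    """Initial estimate (assumption): plain edit-distance costs."""
    return 0.0 if p == q else 1.0


def estimate_costs(corpus, iterations=3):
    """corpus: (true_phonemes, recognized_phonemes) pairs, i.e. the recognized
    corpus produced by running the recognizer once."""
    symbols = sorted({p for ref, hyp in corpus for p in (*ref, *hyp)}) + [EPS]
    cost = unit_cost
    for _ in range(iterations):
        counts = Counter()  # set all phoneme pair counts to zero
        for ref, hyp in corpus:
            for arc in align(ref, hyp, cost):
                counts[arc] += 1  # increment the count of each traversed arc
        cost = costs_from_counts(counts, symbols)
    return cost


corpus = [
    (["k", "ae", "t"], ["k", "ae", "t"]),
    (["b", "ae", "t"], ["p", "ae", "t"]),
    (["b", "ih", "t"], ["p", "ih", "t"]),
]
cost = estimate_costs(corpus)
print(cost("b", "p") < cost("b", "t"))  # True: b is often recognized as p
```

The claim runs the recognizer only once. Each later iteration therefore works purely on phoneme strings, so repeated re-estimation stays cheap. The sketch reflects this by taking the already-recognized (true, recognized) pairs as its only input.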
Address: Menlo Park, CA, US