Title of invention |
System and methods for acoustic and language modeling for automatic speech recognition with large vocabularies |
Abstract |
A method is disclosed for generating a language component vocabulary VC for a speech recognition system having a language vocabulary V of a plurality of word forms. The method includes: partitioning the language vocabulary V into subsets of word forms based on the frequencies of occurrence of the respective word forms; and, in at least one of the subsets, splitting word forms having frequencies less than a threshold, thereby generating word form components. Also disclosed is a method for use in speech recognition that includes: splitting an acoustic vocabulary comprising baseforms into baseform components and storing the baseform components; and performing sound-to-spelling mapping on the baseform components to generate a baseform-components-to-word-parts table for use in subsequent decoding of speech. A method for decoding a speech utterance using language model components and acoustic components includes the steps of: generating from the utterance a stack of baseform component paths; concatenating baseform components in a path to generate concatenated baseforms when the concatenated baseform components correspond to a baseform found in an acoustic vocabulary; mapping the concatenated baseforms into words; computing language model (LM) scores associated with the words using a language model; and performing further decoding of the utterance based thereupon. |
|
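The vocabulary-building step in the abstract (partition V by frequency, then split low-frequency word forms into components) can be illustrated with a minimal Python sketch. It is not the patented procedure: the abstract does not say how a word form is split, so the stem/ending rule in `split_word`, the `+` marker on endings, and every function and parameter name here (`partition_by_frequency`, `build_component_vocabulary`, `boundaries`, `thresholds`, `endings`) are assumptions made only for illustration.

```python
def partition_by_frequency(counts, boundaries):
    """Partition word forms into subsets using descending frequency boundaries.

    counts: dict mapping word form -> frequency of occurrence.
    boundaries: descending list of cut-offs, e.g. [100, 10] gives three
    subsets: freq >= 100, 10 <= freq < 100, and freq < 10.
    """
    subsets = [[] for _ in range(len(boundaries) + 1)]
    for word, freq in counts.items():
        # First boundary that the frequency reaches; otherwise the last subset.
        idx = next((i for i, b in enumerate(boundaries) if freq >= b),
                   len(boundaries))
        subsets[idx].append(word)
    return subsets


def split_word(word, endings):
    """Hypothetical splitting rule: cut off the longest matching ending."""
    for ending in sorted(endings, key=len, reverse=True):
        if word.endswith(ending) and len(word) > len(ending):
            return [word[:-len(ending)], "+" + ending]
    return [word]  # no known ending: keep the word form whole


def build_component_vocabulary(counts, boundaries, thresholds, endings):
    """Build the component vocabulary VC from the vocabulary V.

    thresholds: one value per subset; a word form in subset i is split
    only if its frequency is below thresholds[i] (0 disables splitting).
    """
    components = set()
    subsets = partition_by_frequency(counts, boundaries)
    for subset, threshold in zip(subsets, thresholds):
        for word in subset:
            if counts[word] < threshold:
                components.update(split_word(word, endings))
            else:
                components.add(word)
    return components
```

With per-subset thresholds, splitting can be limited to the rare-word subset only, which matches the "in at least one of the subsets" wording; frequent word forms stay whole in the component vocabulary.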
Publication number |
US2005143972(A1) |
Publication date |
2005.06.30 |
Application number |
US20050064643 |
Filing date |
2005.02.24 |
Applicant |
GOPALAKRISHNAN PONANI;KANEVSKY DIMITRI;MONKOWSKI MICHAEL D.;SEDIVY JAN |
Inventor |
GOPALAKRISHNAN PONANI;KANEVSKY DIMITRI;MONKOWSKI MICHAEL D.;SEDIVY JAN |
Classification |
G06F17/27;G10L15/18;(IPC1-7):G06F17/21 |
Main classification |
G06F17/27 |
Agency |
|
Agent |
|
Principal claim |
|
Address |
|