Title of invention Sampling Training Data for an Automatic Speech Recognition System Based on a Benchmark Classification Distribution
Abstract A set of benchmark text strings may be classified to provide a set of benchmark classifications. The benchmark text strings in the set may correspond to a benchmark corpus of benchmark utterances in a particular language. A benchmark classification distribution of the set of benchmark classifications may be determined. A respective classification for each text string in a corpus of text strings may also be determined. Text strings from the corpus of text strings may be sampled to form a training corpus of training text strings such that the classifications of the training text strings have a training text string classification distribution that is based on the benchmark classification distribution. The training corpus of training text strings may be used to train an automatic speech recognition (ASR) system.
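The sampling step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`classification_distribution`, `sample_training_corpus`), the keyword-based `toy_classify` classifier, the per-class quota rule, and the example strings are all assumptions introduced here for illustration. The record does not specify how text strings are classified or how the sampling is carried out.

```python
import random
from collections import Counter, defaultdict


def classification_distribution(labels):
    """Return the fraction of items that fall into each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def sample_training_corpus(corpus, classify, benchmark_dist, size, seed=0):
    """Sample about `size` strings so class shares follow `benchmark_dist`.

    Assumption: each class gets a quota of round(share * size) strings,
    capped by how many strings of that class the corpus actually contains.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for text in corpus:
        by_class[classify(text)].append(text)

    training = []
    for cls, share in benchmark_dist.items():
        pool = by_class.get(cls, [])
        quota = min(round(share * size), len(pool))
        training.extend(rng.sample(pool, quota))
    return training


def toy_classify(text):
    """Hypothetical stand-in classifier: a single keyword test."""
    return "weather" if "weather" in text else "music"


# Benchmark text strings (e.g. transcripts of benchmark utterances).
benchmark = ["weather in paris", "weather tomorrow", "weather now", "play some jazz"]
benchmark_dist = classification_distribution([toy_classify(t) for t in benchmark])

# A larger corpus whose class mix differs from the benchmark.
corpus = [
    "weather today", "weather in rome", "weather this week", "weather alert",
    "play rock", "play a song", "next track", "pause music", "louder", "shuffle",
]

training = sample_training_corpus(corpus, toy_classify, benchmark_dist, size=4)
training_dist = classification_distribution([toy_classify(t) for t in training])
```

In this sketch the corpus is 40% weather strings, while the sampled training corpus follows the benchmark's three-to-one weather-to-music mix. A real system would replace `toy_classify` with whatever classifier produced the benchmark classifications and would then train the ASR language model on `training`.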
Publication No. US2013289989(A1) Publication date 2013.10.31
Application No. US201313745295 Filing date 2013.01.18
Applicant BIADSY FADI;MORENO MENGIBAR PEDRO J.;NAKAJIMA KAISUKE;BIKEL DANIEL MARTIN Inventor BIADSY FADI;MORENO MENGIBAR PEDRO J.;NAKAJIMA KAISUKE;BIKEL DANIEL MARTIN
Classification G10L15/06 Main classification G10L15/06
Agency Agent
Main claim
Address