Title of invention: Speech detection fusing multi-class acoustic-phonetic, and energy features
Abstract: A speech detection system extracts a plurality of features from multiple input streams. In the acoustic model space, the tree of Gaussians in the model is pruned to include the active states. The Gaussians are mapped to Hidden Markov Model states for Viterbi phoneme alignment. Another feature space, such as the energy feature space, is combined with the acoustic feature space. In the combined feature space, principal component analysis decorrelates the features and reduces them to fewer dimensions. The Gaussians are also mapped to one of three classes: silence, disfluent phoneme, or voiced phoneme. The silence class is true silence and the voiced phoneme class is speech. The disfluent class may be speech or non-speech. If a frame is classified as disfluent, that frame is re-classified as the silence class or the voiced phoneme class based on adjacent frames.
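The two steps of the abstract that lend themselves to illustration, fusing the feature streams with principal component analysis and re-classifying disfluent frames from their neighbours, can be sketched as follows. This is a minimal sketch under stated assumptions, not the method claimed in the patent: the function names `pca_reduce` and `resolve_disfluent`, the integer class codes, the feature dimensions, and above all the neighbour rule (a disfluent frame becomes voiced if the nearest non-disfluent frame on either side is voiced, otherwise silence) are hypothetical choices made for illustration. The abstract says only that the decision is "based on adjacent frames".

```python
import numpy as np

# Hypothetical integer codes for the three classes named in the abstract.
SILENCE, DISFLUENT, VOICED = 0, 1, 2


def pca_reduce(features, n_components):
    """Decorrelate fused features and keep the top n_components directions.

    features: array of shape (n_frames, n_dims), e.g. acoustic and energy
    features stacked column-wise (an assumed layout).
    """
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return centered @ eigvecs[:, order]


def resolve_disfluent(labels):
    """Re-classify each DISFLUENT frame as SILENCE or VOICED.

    Assumed rule (not taken from the source): look at the nearest
    non-disfluent frame on each side; if either is VOICED the frame
    becomes VOICED, otherwise it becomes SILENCE.
    """
    labels = list(labels)
    n = len(labels)
    resolved = labels[:]
    for i, label in enumerate(labels):
        if label != DISFLUENT:
            continue
        left = next((labels[j] for j in range(i - 1, -1, -1)
                     if labels[j] != DISFLUENT), None)
        right = next((labels[j] for j in range(i + 1, n)
                      if labels[j] != DISFLUENT), None)
        resolved[i] = VOICED if VOICED in (left, right) else SILENCE
    return resolved


# Usage with synthetic data (dimensions are arbitrary assumptions).
rng = np.random.default_rng(0)
acoustic = rng.normal(size=(200, 3))
energy = rng.normal(size=(200, 2))
fused = np.hstack([acoustic, energy])
reduced = pca_reduce(fused, n_components=2)
print(reduced.shape)  # (200, 2)

frame_labels = [SILENCE, DISFLUENT, DISFLUENT, VOICED,
                DISFLUENT, SILENCE, DISFLUENT, SILENCE]
print(resolve_disfluent(frame_labels))  # [0, 2, 2, 2, 2, 0, 0, 0]
```

In this sketch a disfluent run that touches speech on either side is absorbed into the speech segment, while a run surrounded by silence is treated as silence. Other neighbour rules, such as a majority vote over a fixed window, would fit the abstract's wording equally well.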
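Publication number: US2007033042(A1) Publication date: 2007.02.08
Application number: US20050196698 Filing date: 2005.08.03
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors: MARCHERET ETIENNE; VISWESWARIAH KARTHIK
Classification: G10L15/00 Main classification: G10L15/00
Agency: Agent:
Principal claim:
Address: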