Title of invention: Classifier-based non-linear projection for continuous speech segmentation
Abstract: A method segments an audio signal including frames into non-speech and speech segments. First, high-dimensional spectral features are extracted from the audio signal. The high-dimensional features are then projected non-linearly to low-dimensional features, which are subsequently averaged using a sliding window and weighted averages. A linear discriminant is applied to the averaged low-dimensional features to determine a threshold separating the low-dimensional features. The linear discriminant can be determined from a Gaussian mixture or a polynomial applied to a bimodal histogram distribution of the low-dimensional features. The threshold can then be used to classify the frames into either non-speech or speech segments. Speech segments having a very short duration can be discarded, and the longer speech segments can be further extended. In batch mode or in real time, the threshold can be updated continuously.
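The later stages of the abstract (smoothing, thresholding, frame classification, and segment clean-up) can be illustrated with a minimal sketch. It is not the patented method: it assumes the non-linear projection has already produced one scalar feature per frame, that larger values indicate speech, and it substitutes a simple two-means iteration for the Gaussian-mixture or polynomial fit to the bimodal histogram. The function names and the parameters `half_width`, `min_frames`, and `pad` are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def smooth(values: List[float], half_width: int = 2) -> List[float]:
    """Sliding-window mean around each frame (uniform weights, an assumption)."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out


def two_means_threshold(values: List[float], iterations: int = 50) -> float:
    """Midpoint between two cluster means.

    Stand-in for fitting a Gaussian mixture or polynomial to the
    bimodal histogram; not the technique named in the abstract.
    """
    threshold = (min(values) + max(values)) / 2.0
    for _ in range(iterations):
        low = [v for v in values if v <= threshold]
        high = [v for v in values if v > threshold]
        if not low or not high:
            break
        updated = (sum(low) / len(low) + sum(high) / len(high)) / 2.0
        if abs(updated - threshold) < 1e-9:
            break
        threshold = updated
    return threshold


def segment(values: List[float], min_frames: int = 3, pad: int = 1) -> List[Tuple[int, int]]:
    """Return speech segments as (start, end_exclusive) frame indices."""
    if not values:
        return []
    smoothed = smooth(values)
    threshold = two_means_threshold(smoothed)
    # Assumption: frames whose smoothed feature exceeds the threshold are speech.
    flags = [v > threshold for v in smoothed]
    segments: List[Tuple[int, int]] = []
    start = None
    for i, is_speech in enumerate(flags + [False]):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            if i - start >= min_frames:  # discard very short speech runs
                # Extend the surviving segment by `pad` frames on each side.
                padded = (max(0, start - pad), min(len(values), i + pad))
                if segments and padded[0] <= segments[-1][1]:
                    segments[-1] = (segments[-1][0], padded[1])  # merge overlap
                else:
                    segments.append(padded)
            start = None
    return segments
```

For a continuously updated threshold, as the abstract mentions for batch or real-time use, `two_means_threshold` could be re-run over a growing or recent buffer of smoothed features; that variant is not shown here.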
Publication number: US2004015352(A1)
Publication date: 2004.01.22
Application number: US20020196768
Filing date: 2002.07.17
Applicant: RAMAKRISHNAN BHIKSHA;SINGH RITA
Inventor: RAMAKRISHNAN BHIKSHA;SINGH RITA
Classification: G10L11/02;(IPC1-7):G10L15/12
Main classification: G10L11/02
Agency:
Agent:
Main claim:
Address: