Title of invention |
Classifier-based non-linear projection for continuous speech segmentation |
Abstract |
A method segments an audio signal composed of frames into non-speech and speech segments. First, high-dimensional spectral features are extracted from the audio signal. The high-dimensional features are then projected non-linearly to low-dimensional features, which are subsequently averaged using a sliding window and weighted averages. A linear discriminant is applied to the averaged low-dimensional features to determine a threshold separating the low-dimensional features. The linear discriminant can be determined from a Gaussian mixture or a polynomial fitted to a bimodal histogram distribution of the low-dimensional features. The threshold can then be used to classify the frames into either non-speech or speech segments. Speech segments having a very short duration can be discarded, and the longer speech segments can be further extended. In batch mode or in real time, the threshold can be updated continuously.
|
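The later stages named in the abstract (sliding-window averaging, thresholding, discarding short speech segments, and extending the rest) can be illustrated with a minimal sketch. It assumes each frame has already been reduced to a single low-dimensional score, so the spectral feature extraction and the non-linear projection are not shown. The function names, the parameters `half_width`, `min_len` and `pad`, and the unweighted mean are hypothetical choices for illustration. The iterative two-cluster midpoint threshold is a simple stand-in, not the Gaussian-mixture or polynomial fit that the abstract describes.

```python
from statistics import mean


def smooth(scores, half_width=2):
    """Sliding-window mean of per-frame scores; the window is clipped at the edges."""
    n = len(scores)
    return [mean(scores[max(0, i - half_width):min(n, i + half_width + 1)])
            for i in range(n)]


def two_cluster_threshold(values, iterations=50):
    """Iterative midpoint between the means of the low and high groups.

    Assumption: a stand-in for fitting a bimodal histogram with a
    Gaussian mixture or polynomial, as the abstract describes.
    """
    t = (min(values) + max(values)) / 2
    for _ in range(iterations):
        low = [v for v in values if v <= t]
        high = [v for v in values if v > t]
        if not low or not high:
            break
        new_t = (mean(low) + mean(high)) / 2
        if abs(new_t - t) < 1e-9:
            break
        t = new_t
    return t


def runs(flags):
    """Return (start, end) index pairs, end exclusive, for each run of True."""
    out, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(flags)))
    return out


def segment_speech(scores, half_width=2, min_len=3, pad=1):
    """Classify frames, drop short speech runs, extend and merge the rest."""
    smoothed = smooth(scores, half_width)
    threshold = two_cluster_threshold(smoothed)
    flags = [s > threshold for s in smoothed]
    # Discard speech segments shorter than min_len frames.
    kept = [(a, b) for a, b in runs(flags) if b - a >= min_len]
    # Extend each surviving segment by pad frames on both sides.
    n = len(scores)
    extended = [(max(0, a - pad), min(n, b + pad)) for a, b in kept]
    # Merge segments that overlap or touch after extension.
    merged = []
    for a, b in extended:
        if merged and a <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged
```

For example, `segment_speech([0.0] * 10 + [1.0] * 10 + [0.0] * 10)` returns `[(9, 21)]`: the ten high-scoring frames are kept and extended by one frame on each side. An isolated single-frame burst is discarded when `min_len` is 3.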
Publication number |
US2004015352(A1) |
Publication date |
2004.01.22 |
Application number |
US20020196768 |
Filing date |
2002.07.17 |
Applicant |
RAMAKRISHNAN BHIKSHA;SINGH RITA |
Inventor |
RAMAKRISHNAN BHIKSHA;SINGH RITA |
Classification |
G10L11/02;(IPC1-7):G10L15/12 |
Main classification |
G10L11/02 |
Agency |
|
Agent |
|
Main claim |
|
Address |
|