Title of invention: Hierarchical active voice detection
Abstract: One or more audio signals are processed using a multi-stage (hierarchical) voice and/or signal activity detector (VAD/SAD). A first stage is capable of reducing the workload bandwidth by employing an inexpensive VAD/SAD processor. One or more subsequent stages may further process the audio signals from the first stage. Other implementations may include a first stage that also performs continuity preservation between the last blocks of the audio signal and the first blocks of audio after it is detected that relevant audio signals have resumed. In yet other implementations, the first stage may extract features from audio signals when they are presented in their coded domain, possibly with little or no decoding of the audio signal.
Publication number: US9064503(B2) Publication date: 2015.06.23
Application number: US201314386304 Filing date: 2013.03.21
Applicant: Dolby Laboratories Licensing Corporation Inventors: Dickins Glenn N.; Neal Timothy J.; Shue Yen-Liang
Classification: G10L25/78; G10L19/16 Main classification: G10L25/78
Agency: Agent:
Independent claim: 1. A system for processing audio signals, said system comprising: a first stage processor, said first stage processor inputting an audio signal from at least one audio source, wherein said first stage processor is capable of performing preliminary voice or signal activity detection (VAD/SAD) processing upon said audio signal and capable of outputting a first intermediate set of audio signals; wherein said first stage processor is capable of eliminating at least some of the audio signal; and a second stage processor, said second stage processor inputting said first intermediate set of audio signals from said first stage processor, wherein said second stage processor is capable of performing audio processing upon said first intermediate set of audio signals; wherein said second stage processor is capable of performing voice or signal activity detection (VAD/SAD) processing upon said first intermediate set of audio signals; wherein an accuracy for estimating periods of speech or signal activity is higher for the second stage processor than for the first stage processor; wherein said first stage processor is capable of achieving a reduction in bandwidth for the first intermediate set of audio signals which is sent to said second stage processor; wherein said second stage processor is capable of sending a control signal to said first stage processor and wherein said first stage processor is capable of dynamically changing processing according to said control signal; and wherein said control signal indicates to said first stage processor to remain open until said second stage processor detects the end of desired signal activity.
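The two-stage arrangement recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation: the class names (FirstStage, SecondStage), the energy and zero-crossing features, and every threshold value are assumptions chosen for illustration only. The sketch shows a cheap first-stage gate that discards low-energy blocks, a stricter second-stage detector, and a control signal that holds the first stage open until the second stage sees the end of activity.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Block = Sequence[float]  # one block of audio samples (assumed to hold >= 2 samples)


def energy(block: Block) -> float:
    """Mean squared amplitude of a block."""
    return sum(x * x for x in block) / len(block)


def zero_crossing_rate(block: Block) -> float:
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(block, block[1:]) if (a < 0) != (b < 0))
    return crossings / max(len(block) - 1, 1)


@dataclass
class FirstStage:
    """Inexpensive gate (hypothetical): a plain energy threshold."""
    threshold: float = 0.01   # illustrative value, not from the patent
    hold_open: bool = False   # state set by the second stage's control signal

    def set_control(self, hold_open: bool) -> None:
        self.hold_open = hold_open

    def passes(self, block: Block) -> bool:
        # While held open, forward everything regardless of energy.
        return self.hold_open or energy(block) >= self.threshold


@dataclass
class SecondStage:
    """Stricter detector (hypothetical): energy plus zero-crossing rate."""
    min_energy: float = 0.001  # illustrative value
    max_zcr: float = 0.3       # illustrative value

    def is_active(self, block: Block) -> bool:
        return energy(block) >= self.min_energy and zero_crossing_rate(block) <= self.max_zcr


def run_pipeline(blocks: Sequence[Block], first: FirstStage,
                 second: SecondStage) -> Tuple[List[bool], int]:
    """Return per-block activity decisions and the number of blocks forwarded."""
    decisions: List[bool] = []
    forwarded = 0
    for block in blocks:
        if not first.passes(block):
            decisions.append(False)   # eliminated by the first stage
            continue
        forwarded += 1                # only these blocks cost bandwidth
        active = second.is_active(block)
        decisions.append(active)
        # Control signal: stay open while activity continues, release at its end.
        first.set_control(hold_open=active)
    return decisions, forwarded


blocks = [
    [0.0] * 8,         # silence: dropped by the first stage
    [0.5] * 8,         # loud, speech-like: opens the gate
    [0.05] * 8,        # quiet tail: forwarded only because the gate is held open
    [0.0] * 8,         # silence: forwarded once, second stage detects the end
    [0.0] * 8,         # silence: dropped again
    [0.5, -0.5] * 4,   # loud noise: passes stage one, rejected by stage two
]
decisions, forwarded = run_pipeline(blocks, FirstStage(), SecondStage())
print(decisions, forwarded)
```

In this sketch the quiet tail block would have been discarded by the energy gate alone; it reaches the second stage only because the control signal keeps the first stage open, which is the behaviour the final limitation of the claim describes.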
Address: San Francisco CA US