Title of invention: DETECTING A USER'S VOICE ACTIVITY USING DYNAMIC PROBABILISTIC MODELS OF SPEECH FEATURES
Abstract: A method of detecting voice activity starts by generating probabilistic models that respectively model features of speech dynamically over time. The probabilistic models may model each feature dependent on a past feature and a current state. The features of speech may include a nonstationary signal presence feature, a periodicity feature, and a sparsity feature. A noise suppressor may then perform noise suppression on an acoustic signal to generate a nonstationary signal presence signal and a noise suppressed acoustic signal. An LPC module may then perform residual analysis on the noise suppressed data signal to generate a periodicity signal and a sparsity signal. An inference generator receives the probabilistic models and receives, in real time, the nonstationary signal presence signal, the periodicity signal, and the sparsity signal. The inference generator may then generate, in real time, an estimate of voice activity based on the probabilistic models, the nonstationary signal presence signal, the periodicity signal, and the sparsity signal. Other embodiments are also described.
Publication number: US2015348572(A1) Publication date: 2015.12.03
Application number: US201414502795 Filing date: 2014.09.30
Applicant: Apple Inc. Inventors: Thornburg Harvey D.;Clark Charles P.
Classification: G10L25/84;G10L19/04;G10L15/20 Main classification: G10L25/84
Agency: Agent:
Main claim: 1. A method of detecting a user's voice activity comprising: generating by a speech features model generator probabilistic models that respectively model features of speech dynamically over time, wherein the probabilistic models model each feature dependent on a past feature and a current state, wherein the features of speech include a nonstationary signal presence feature, a periodicity feature, and a sparsity feature; performing noise suppression by a noise suppressor on an acoustic signal to generate a nonstationary signal presence signal and a noise suppressed acoustic signal; performing by a Linear Predictive Coding (LPC) module residual analysis on the noise suppressed data signal to generate a periodicity signal and a sparsity signal; receiving by an inference generator the probabilistic models and in real-time, the nonstationary signal presence signal, the periodicity signal, and the sparsity signal; and generating by the inference generator in real time an estimate of voice activity based on the probabilistic models, the nonstationary signal presence signal, the periodicity signal, and the sparsity signal.
Address: Cupertino, CA, US
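
Illustrative sketch (not part of the patent record): the claim describes an inference generator that combines probabilistic models, in which each feature depends on a past feature and a current state, with three feature signals to estimate voice activity in real time. One way this kind of inference could be realized is forward filtering over a two-state model with first-order autoregressive Gaussian feature models. Everything specific below is an assumption made for illustration: the two states, the Gaussian form, the names TRANSITION, PRIOR, FEATURE_MODELS and voice_activity_estimates, and all numeric parameters. The record does not specify any of them.

```python
import math

# Assumed two-state model: index 0 = no voice, index 1 = voice.
# TRANSITION[i][j] is an assumed probability of moving from state i to state j.
TRANSITION = [[0.95, 0.05], [0.10, 0.90]]
PRIOR = [0.5, 0.5]

# Assumed per-feature, per-state parameters (a, b, sigma) such that
# x_t ~ Normal(a * x_{t-1} + b, sigma^2): each feature depends on its
# past value and on the current state, as the claim describes in general terms.
FEATURE_MODELS = {
    "presence":    [(0.5, 0.1, 0.2), (0.5, 0.4, 0.2)],
    "periodicity": [(0.5, 0.1, 0.2), (0.5, 0.4, 0.2)],
    "sparsity":    [(0.5, 0.1, 0.2), (0.5, 0.4, 0.2)],
}


def gaussian_pdf(x, mean, sigma):
    """Density of a normal distribution with the given mean and sigma."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def frame_likelihood(state, current, previous):
    """Likelihood of the current feature frame given state and past frame."""
    likelihood = 1.0
    for name, params in FEATURE_MODELS.items():
        a, b, sigma = params[state]
        likelihood *= gaussian_pdf(current[name], a * previous[name] + b, sigma)
    return likelihood


def voice_activity_estimates(frames):
    """Return P(voice | frames seen so far) for each frame, in order."""
    belief = list(PRIOR)
    # Assumption: the first frame is used as its own "past" frame.
    previous = frames[0]
    estimates = []
    for current in frames:
        # Predict: push the previous belief through the transition model.
        predicted = [
            sum(belief[i] * TRANSITION[i][j] for i in range(2))
            for j in range(2)
        ]
        # Update: weight each state by how well it explains the new frame.
        weighted = [
            predicted[j] * frame_likelihood(j, current, previous)
            for j in range(2)
        ]
        total = sum(weighted)
        # Fall back to the prediction if both likelihoods underflow to zero.
        belief = [w / total for w in weighted] if total > 0 else predicted
        estimates.append(belief[1])
        previous = current
    return estimates


if __name__ == "__main__":
    quiet = {"presence": 0.2, "periodicity": 0.2, "sparsity": 0.2}
    voiced = {"presence": 0.8, "periodicity": 0.8, "sparsity": 0.8}
    for p in voice_activity_estimates([quiet] * 3 + [voiced] * 3):
        print(round(p, 3))
```

Because each step uses only the previous belief and the previous frame, the estimate can be updated frame by frame as the signals arrive, which is consistent with the real-time operation the claim describes. In practice the model parameters would come from the speech features model generator rather than being hard-coded as they are here.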