Title of Invention: Identifying speech portions of a sound model using various statistics thereof
Abstract: Speech portions of a sound model may be identified using various statistics associated with the sound model for voice enhancement of noisy audio signals. A spectral motion transform may be performed on an input signal to obtain a linear fit in time of a sound model of the input signal. Statistics may be extracted from the linear fit of the sound model of the input signal. Speech portions of the linear fit of the sound model of the input signal may be identified by detecting a presence of harmonics as a function of time in the linear fit of the sound model of the input signal based on individual ones of the extracted statistics. An output signal may be provided that conveys audio comprising a reconstructed speech component of the input signal with a noise component of the input signal being suppressed.
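The abstract's idea of detecting the presence of harmonics from a per-window statistic can be illustrated with a small sketch. The code below does not implement the spectral motion transform or the linear-fit sound model, neither of which is specified in this record. It substitutes a plain normalized autocorrelation peak as a stand-in statistic, and the function name `harmonicity` and its lag parameters are assumptions made for illustration only.

```python
def harmonicity(frame, min_lag, max_lag):
    """Return the peak normalized autocorrelation over lags min_lag..max_lag.

    Values near 1 suggest a strongly periodic (harmonic) frame; values near 0
    suggest noise. This is a stand-in statistic (an assumption), not the
    transform described in the patent.
    """
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0  # a silent frame carries no harmonic evidence
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        acf = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        best = max(best, acf / energy)
    return best
```

A frame holding a pure tone should score much higher than a frame of random noise, so thresholding this value over successive frames gives a rough picture of harmonic presence as a function of time.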
Publication No.: US9058820(B1)  Publication Date: 2015.06.16
Application No.: US201313899264  Filing Date: 2013.05.21
Applicant: The Intellisis Corporation  Inventors: Mascaro Massimo; Bradley David C.
Classification: G10L19/00; G10L21/0272; G10L15/187; G10L15/26  Main Classification: G10L19/00
Agency: Arnold & Porter LLP  Agent: Arnold & Porter LLP
Principal Claim: 1. A system configured to perform voice enhancement on noisy audio signals, the system comprising: one or more processors configured to execute computer program modules, the computer program modules comprising:
a preprocessing module configured to segment an input signal into discrete successive time windows, the input signal conveying audio comprising a speech component superimposed on a noise component, the time windows including a first time window;
a transform module configured to perform a transform on individual time windows of the input signal to obtain corresponding sound models of the input signal in the individual time windows, a first sound model being a mathematical representation of harmonics in the first time window of the input signal;
a statistics extraction module configured to determine statistics associated with the sound models of individual time windows of the input signal, the statistics including a statistic for the first time window extracted from the first sound model;
a time-striping module configured to determine probabilities that the portions of the speech component represented by the input signal in the individual time windows are vocalized portions or non-vocalized portions based on the statistics determined for the sound models of the individual time windows;
a vocalized speech module configured to process sound models of time windows in which the speech component represented by the input signal is classified as vocalized portions based on the probabilities, the first sound model being processed by the vocalized speech module responsive to the speech component represented by the input signal during the first time window being classified as a vocalized portion based on a corresponding probability; and
a non-vocalized speech module configured to process sound models of time windows in which the speech component represented by the input signal is classified as non-vocalized portions based on the probabilities, the first sound model being processed by the non-vocalized speech module responsive to the speech component represented by the input signal during the first time window being classified as a non-vocalized portion based on the corresponding probability.
Address: San Diego, CA, US
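The module chain recited in the principal claim (segment into windows, derive a statistic per window, turn it into a probability, route the window to vocalized or non-vocalized processing) can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the claimed implementation: the record does not say how probabilities are computed, so a logistic curve is assumed here, and the names `segment`, `vocalized_probability`, `classify_windows`, `threshold`, `sharpness`, and `cutoff` are all hypothetical. The per-window statistic is supplied by the caller as `statistic_fn`.

```python
import math

def segment(signal, window_len):
    """Split a signal into discrete, successive, non-overlapping windows.

    Trailing samples that do not fill a whole window are dropped.
    """
    return [signal[i:i + window_len]
            for i in range(0, len(signal) - window_len + 1, window_len)]

def vocalized_probability(statistic, threshold=0.5, sharpness=10.0):
    """Map a statistic to a probability with a logistic curve (an assumption)."""
    return 1.0 / (1.0 + math.exp(-sharpness * (statistic - threshold)))

def classify_windows(signal, window_len, statistic_fn, cutoff=0.5):
    """Label each window 'vocalized' or 'non-vocalized' from its probability.

    `statistic_fn` stands in for the transform and statistics extraction
    steps; it receives one window and returns a single number.
    """
    labels = []
    for window in segment(signal, window_len):
        p = vocalized_probability(statistic_fn(window))
        labels.append("vocalized" if p >= cutoff else "non-vocalized")
    return labels
```

In a fuller system each label would select which downstream routine processes that window's sound model. Any per-window statistic that grows with harmonic content can be passed as `statistic_fn`.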