Title of invention: Audio signal processing apparatus and audio signal processing method
Abstract: Likelihood calculation means extracts audio features expressing features of a voice signal and a non-voice signal from an acquired audio signal, and uses the audio features to calculate a likelihood expressing the probability that the voice signal is included in the audio signal. Spectral feature extraction means performs a frequency analysis on the audio signal to extract a spectral feature. Using the spectral feature, first basis matrix producing means produces a first basis matrix expressing the feature of the non-voice signal. Second basis matrix producing means uses the likelihood to specify a component of the first basis matrix that has a high association with the voice signal, and excludes that component to produce a second basis matrix. Spectral feature estimation means estimates a spectral feature of the voice signal or a spectral feature of the non-voice signal by performing nonnegative matrix factorization on the spectral feature using the second basis matrix.
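The factorization described in the abstract can be written compactly as follows. The symbols are assumptions introduced here for illustration (the record itself defines none): X is the nonnegative spectral feature matrix of the audio signal, W_2 is the second basis matrix held fixed, and W_v, H_v, H_2 are the nonnegative matrices estimated during the factorization. The squared Euclidean cost is also an assumption; the record does not name a cost function.

```latex
\begin{aligned}
X &\approx W_2 H_2 + W_v H_v,
  \qquad W_v,\, H_v,\, H_2 \ge 0,\quad W_2 \text{ fixed},\\
(\hat W_v, \hat H_v, \hat H_2)
  &= \arg\min_{W_v, H_v, H_2 \ge 0}
     \bigl\lVert X - W_2 H_2 - W_v H_v \bigr\rVert_F^2,\\
\hat X_{\text{voice}} &= \hat W_v \hat H_v,
  \qquad
\hat X_{\text{non-voice}} = W_2 \hat H_2 .
\end{aligned}
```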
Publication No.: US9224392(B2)  Publication date: 2015.12.29
Application No.: US201213420912  Filing date: 2012.03.15
Applicant: Kabushiki Kaisha Toshiba  Inventor: Hirohata Makoto
Classification: G10L21/00;G10L25/90;G10L25/93;G10L21/02;G10L15/20;G10L21/0216  Main classification: G10L21/00
Agency: Nixon & Vanderhye, P.C.  Agent: Nixon & Vanderhye, P.C.
Principal claim: 1. An audio signal processing apparatus utilized in pre-processing of speech recognition, the apparatus comprising: a likelihood calculation unit, executed by a processor using a program stored in a storage, configured to extract audio features expressing features of a voice signal and a non-voice signal from an audio signal including the voice signal and the non-voice signal, and calculate a likelihood expressing a probability that the voice signal is included in the audio signal; a spectral feature extraction unit, executed by the processor, configured to perform a frequency analysis to the audio signal to extract a spectral feature; a first basis matrix producing unit, executed by the processor, configured to produce a first basis matrix expressing the feature of the non-voice signal using the spectral feature; a second basis matrix producing unit, executed by the processor, configured to specify a component having a high association with the voice signal in the first basis matrix using the likelihood, and exclude the component from the first basis matrix to produce a second basis matrix; and a spectral feature estimation unit, executed by the processor, configured to estimate a spectral feature of the voice signal or a spectral feature of the non-voice signal by performing nonnegative matrix factorization to the spectral feature of the audio signal using the second basis matrix, wherein the spectral feature estimation unit produces a third basis matrix and a first coefficient matrix, which express the feature of the voice signal, by nonnegative matrix factorization in which the second basis matrix is used, and estimates the spectral feature of the voice signal included in the audio signal by a product of the third basis matrix and the first coefficient matrix, and wherein the product is used to separate the voice signal from the audio signal.
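The processing chain recited in the claim can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`nmf`, `separate_voice`), the parameters (`n_noise`, `n_voice`, `corr_threshold`), the use of Euclidean multiplicative updates, learning the first basis matrix from the whole spectrogram, and measuring "association with the voice signal" as the Pearson correlation between a component's activation and the per-frame likelihood are all hypothetical choices that the record does not specify.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero in the updates


def nmf(X, rank, n_iter=200, W_fixed=None, seed=0):
    """Euclidean NMF via multiplicative updates.

    Columns supplied in W_fixed are kept frozen; `rank` further
    columns are learned. Returns (W, H) with X ~= W @ H.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = X.shape
    k_fixed = 0 if W_fixed is None else W_fixed.shape[1]
    W = rng.random((n_freq, k_fixed + rank)) + EPS
    if k_fixed:
        W[:, :k_fixed] = W_fixed
    H = rng.random((k_fixed + rank, n_frames)) + EPS
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + EPS)
        W_new = W * (X @ H.T) / (W @ H @ H.T + EPS)
        W[:, k_fixed:] = W_new[:, k_fixed:]  # only free columns change
    return W, H


def separate_voice(X, likelihood, n_noise=8, n_voice=8, corr_threshold=0.5):
    """Estimate the voice spectrogram from X (freq x frames).

    `likelihood` holds one voice-likelihood value per frame.
    """
    # First basis matrix (assumption: learned from all frames).
    W1, H1 = nmf(X, n_noise)
    # Assumed association measure: correlation of each component's
    # activation with the voice likelihood over time.
    corr = np.array([np.corrcoef(h, likelihood)[0, 1] for h in H1])
    keep = ~(corr > corr_threshold)  # drop voice-like components
    W2 = W1[:, keep]                 # second basis matrix
    # NMF with W2 frozen; the free columns form the third basis matrix.
    W, H = nmf(X, n_voice, W_fixed=W2)
    k = W2.shape[1]
    W3, H_voice = W[:, k:], H[k:, :]
    return W3 @ H_voice, W2
```

Calling `separate_voice(X, likelihood)` on a magnitude spectrogram returns the product of the third basis matrix and its coefficient matrix, which corresponds to the voice estimate in the claim, together with the second basis matrix that was held fixed.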
Address: Tokyo, JP