Main claim
1. An audio signal processing apparatus used in preprocessing for speech recognition, the apparatus comprising:
a likelihood calculation unit, executed by a processor using a program stored in a storage, configured to extract, from an audio signal including a voice signal and a non-voice signal, audio features expressing features of the voice signal and the non-voice signal, and to calculate a likelihood expressing a probability that the voice signal is included in the audio signal;
a spectral feature extraction unit, executed by the processor, configured to perform frequency analysis on the audio signal to extract a spectral feature;
a first basis matrix producing unit, executed by the processor, configured to produce, using the spectral feature, a first basis matrix expressing the feature of the non-voice signal;
a second basis matrix producing unit, executed by the processor, configured to identify, using the likelihood, a component of the first basis matrix having a high association with the voice signal, and to exclude the component from the first basis matrix to produce a second basis matrix; and
a spectral feature estimation unit, executed by the processor, configured to estimate a spectral feature of the voice signal or a spectral feature of the non-voice signal by performing nonnegative matrix factorization on the spectral feature of the audio signal using the second basis matrix,
wherein the spectral feature estimation unit produces, by nonnegative matrix factorization in which the second basis matrix is used, a third basis matrix and a first coefficient matrix that express the feature of the voice signal, and estimates the spectral feature of the voice signal included in the audio signal as the product of the third basis matrix and the first coefficient matrix, and
wherein the product is used to separate the voice signal from the audio signal.
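The processing chain recited in the claim can be illustrated with a minimal NumPy sketch. It is not the patented implementation, and several choices are assumptions not stated in the claim: the per-frame voice likelihood is taken as a given array (for example, from a separate voice activity detector); the first basis matrix comes from plain NMF over the whole magnitude spectrogram; a component counts as "highly associated with the voice signal" when the correlation between its activation and the likelihood exceeds a hypothetical `threshold`; both factorizations use Euclidean multiplicative updates (Lee and Seung); and separation uses a simple ratio mask. The function names `nmf`, `prune_voice_components` and `semi_supervised_nmf` are illustrative only.

```python
import numpy as np

EPS = 1e-12


def nmf(V, n_components, n_iter=200, seed=0):
    """Plain NMF with Euclidean multiplicative updates: V ~= W @ H."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], n_components)) + EPS
    H = rng.random((n_components, V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def prune_voice_components(W1, H1, likelihood, threshold=0.5):
    """Assumed rule: drop basis columns whose activation over time
    correlates with the per-frame voice likelihood above `threshold`."""
    keep = []
    for k in range(W1.shape[1]):
        r = np.nan_to_num(np.corrcoef(H1[k], likelihood)[0, 1])
        if r <= threshold:
            keep.append(k)
    return W1[:, keep]  # second basis matrix (non-voice only)


def semi_supervised_nmf(V, W2, n_voice, n_iter=200, seed=0):
    """Keep the non-voice basis W2 fixed; learn a voice basis W3
    and the activations of both bases."""
    rng = np.random.default_rng(seed)
    n_noise = W2.shape[1]
    W3 = rng.random((V.shape[0], n_voice)) + EPS
    H = rng.random((n_noise + n_voice, V.shape[1])) + EPS
    for _ in range(n_iter):
        W = np.hstack([W2, W3])
        H *= (W.T @ V) / (W.T @ W @ H + EPS)   # update all activations
        Hs = H[n_noise:]                       # voice activations
        WH = W @ H                             # full reconstruction
        W3 *= (V @ Hs.T) / (WH @ Hs.T + EPS)   # update only the voice basis
    return W3, H[n_noise:]  # third basis matrix, first coefficient matrix


# Toy demo on a synthetic magnitude spectrogram (F bins x T frames).
rng = np.random.default_rng(1)
F, T = 64, 100
noise = rng.random((F, 3)) @ rng.random((3, T))
voice_on = (np.arange(T) // 20) % 2 == 1           # voice in alternate blocks
voice = np.outer(rng.random(F), voice_on.astype(float))
V = noise + voice                                   # spectral feature of the audio
likelihood = voice_on.astype(float)                 # stand-in for the likelihood

W1, H1 = nmf(V, n_components=5)                     # first basis matrix
W2 = prune_voice_components(W1, H1, likelihood)     # second basis matrix
W3, H3 = semi_supervised_nmf(V, W2, n_voice=2)      # third basis, first coefficients
voice_est = W3 @ H3                                 # estimated voice spectral feature

# Ratio mask on the magnitude; in practice it would be applied to the complex STFT.
mask = np.clip(voice_est / (V + EPS), 0.0, 1.0)
separated = mask * V
```

Keeping W2 fixed while only W3 is learned forces the factorization to explain whatever the pruned non-voice basis cannot reconstruct with the new voice basis. This is why excluding voice-correlated components from the first basis matrix matters: otherwise those components would absorb part of the voice energy.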