Title of invention: Computer-readable medium for recording audio signal processing estimating a selected frequency by comparison of voice and noise frame levels
Abstract: A computer-implemented method comprising: setting a plurality of frames on a time axis between a first waveform of an input to audio processing and a second waveform of an output from the audio processing; detecting a voice frame and a noise frame in the first and second waveforms; calculating a first and a second spectrum from the first and second waveforms; adjusting the level of the first or second spectrum of the noise frame; setting the adjusted first and second spectra of the noise frame as a third and a fourth spectrum; calculating a distortion amount of the noise frame from the third and fourth spectra; estimating a noise model spectrum from the first or second spectrum; determining a selected frequency by comparing the spectrum levels of the voice frame and the noise model; and calculating a distortion amount of the voice frame from the first and second spectra of the voice frame at the selected frequency.
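Illustrative note (not part of the patent record): the noise-frame steps in the abstract can be sketched roughly as follows. The abstract only says the level is adjusted and a distortion amount is calculated; matching the mean power and using a mean absolute log-spectral difference are assumptions made here for illustration, and the function names `adjust_level` and `noise_distortion_db` are hypothetical.

```python
import math

def adjust_level(spec_in, spec_out):
    """Scale the output power spectrum so its mean power equals the input's.

    Returns (third, fourth): the level-adjusted input and output spectra.
    Assumption: "level" is taken to mean the mean power over all bins.
    """
    mean_in = sum(spec_in) / len(spec_in)
    mean_out = sum(spec_out) / len(spec_out)
    gain = mean_in / mean_out if mean_out > 0 else 1.0
    return list(spec_in), [p * gain for p in spec_out]

def noise_distortion_db(third, fourth, eps=1e-12):
    """Mean absolute difference between two power spectra, in dB.

    Assumption: this distance measure is a stand-in; the record does not
    say which measure the patent uses.
    """
    diffs = [abs(10 * math.log10((a + eps) / (b + eps)))
             for a, b in zip(third, fourth)]
    return sum(diffs) / len(diffs)

# Noise frame whose output is a uniformly attenuated copy of the input:
# after level adjustment the spectra coincide, so the distortion is zero.
third, fourth = adjust_level([4.0, 2.0, 1.0, 1.0], [2.0, 1.0, 0.5, 0.5])
same_shape = noise_distortion_db(third, fourth)

# Noise frame whose spectral shape was changed by the processing:
# a nonzero distortion remains even after level adjustment.
third2, fourth2 = adjust_level([4.0, 2.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0])
changed_shape = noise_distortion_db(third2, fourth2)
```

Adjusting the level first means that a plain gain change applied by the audio processing is not counted as distortion; only changes in spectral shape contribute.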
Publication number: US9058821(B2)  Publication date: 2015.06.16
Application number: US200912621918  Filing date: 2009.11.19
Applicant: FUJITSU LIMITED  Inventors: Matsumoto Chikako; Matsuo Naoshi
Classification: G10L21/00; G10L25/69; G10L21/0208; G10L19/005; G10L19/12; G10L21/02  Main classification: G10L21/00
Agency: Fujitsu Patent Center  Agent: Fujitsu Patent Center
Main claim: 1. A non-transitory computer-readable medium for recording an audio signal processing estimating program allowing a computer to execute estimation of audio signal processing, the audio signal processing estimating program allowing the computer to execute:
setting a plurality of frames each of which has a specific period of time on a common time axis between a first waveform as a time waveform of an input to the audio signal processing and a second waveform as a time waveform of an output from the audio signal processing;
detecting, from the plurality of frames, a voice frame as a frame in which a specific voice exists in both of the first waveform and the second waveform, and a noise frame as a frame in which the specific voice does not exist in the first waveform nor the second waveform;
calculating a first spectrum corresponding to a spectrum of the first waveform and a second spectrum corresponding to a spectrum of the second waveform for the voice frame and the noise frame;
adjusting a level of the first spectrum of the noise frame or the second spectrum of the noise frame so that the level of the first spectrum and the level of the second spectrum in the noise frame are substantially equal to each other, and setting the first spectrum of the noise frame after the level adjustment as a third spectrum of the noise frame while setting the second spectrum of the noise frame after the level adjustment as a fourth spectrum of the noise frame;
calculating a distortion amount of the noise frame based on the third spectrum of the noise frame and the fourth spectrum of the noise frame;
setting the first spectrum or the second spectrum to a fifth spectrum, and estimating a noise model spectrum as the spectrum of a noise model based on the fifth spectrum of the noise frame;
selecting a frequency as a selected frequency based on a comparison between a level of the fifth spectrum of the voice frame and a level of the noise model spectrum; and
calculating a distortion amount of the voice frame based on the first spectrum of the voice frame and the second spectrum of the voice frame at the selected frequency,
wherein the selecting, adds a noise model power spectrum including a specific margin, sets the addition of the noise model power spectrum as a threshold power spectrum, and selects a frequency in which a level of an original voice power spectrum is not less than the level of the threshold power spectrum.
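Illustrative note (not part of the patent record): the selection step in the claim can be read as a per-bin threshold test. The sketch below is one possible interpretation, not the patented implementation. It assumes spectra are given as per-bin levels in dB, that the noise model is a per-bin average over noise frames, that the "specific margin" is added in dB, and that the voice-frame distortion is a mean absolute dB difference. All function names and the 6 dB default are hypothetical.

```python
def estimate_noise_model(noise_frames_db):
    """Per-bin average of the noise-frame spectra (assumed noise model)."""
    n_frames = len(noise_frames_db)
    return [sum(bins) / n_frames for bins in zip(*noise_frames_db)]

def select_bins(voice_db, noise_model_db, margin_db=6.0):
    """Indices of bins where the voice level is not less than the threshold.

    Threshold = noise model level + margin (margin value is an assumption).
    """
    return [k for k, (v, n) in enumerate(zip(voice_db, noise_model_db))
            if v >= n + margin_db]

def voice_distortion_db(in_db, out_db, bins):
    """Mean absolute input/output difference over the selected bins only."""
    if not bins:
        return 0.0
    return sum(abs(in_db[k] - out_db[k]) for k in bins) / len(bins)

# Two noise frames averaged into a flat 10 dB noise model.
noise_model = estimate_noise_model([[9.0, 11.0, 10.0, 12.0],
                                    [11.0, 9.0, 10.0, 8.0]])

# Voice frame: bins 0 and 2 rise above the 16 dB threshold; bins 1 and 3
# are dominated by noise and are excluded from the distortion measure.
voice_in = [30.0, 12.0, 25.0, 8.0]
voice_out = [28.0, 20.0, 25.0, 0.0]
selected = select_bins(voice_in, noise_model)
distortion = voice_distortion_db(voice_in, voice_out, selected)
print(selected, distortion)  # [0, 2] 1.0
```

Restricting the comparison to the selected bins keeps noise-dominated frequencies, where input and output may differ greatly because of noise alone, from inflating the voice distortion estimate.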
Address: Kawasaki JP