Title of invention: SPEECH SIGNAL ENHANCEMENT USING VISUAL INFORMATION
Abstract: Visual information is used to alter or set an operating parameter of an audio signal processor, other than a beamformer. A digital camera captures visual information about a scene that includes a human speaker and/or a listener. The visual information is analyzed to ascertain information about the acoustics of a room. A distance between the speaker and a microphone may be estimated, and this distance estimate may be used to adjust an overall gain of the system. Distances among, and locations of, the speaker, the listener, the microphone, a loudspeaker and/or a sound-reflecting surface may be estimated. These estimates may be used to estimate reverberations within the room and to adjust the aggressiveness of an anti-reverberation filter, based on an estimated ratio of direct to indirect (reverberated) sound energy expected to reach the microphone. In addition, the orientation of the speaker or the listener, relative to the microphone or the loudspeaker, can be estimated, and this estimate may be used to adjust frequency-dependent filter weights to compensate for uneven frequency propagation of acoustic signals from a mouth, or to a human ear, about a human head.
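Illustrative sketch (not part of the patent record): the abstract describes deriving an overall gain and an anti-reverberation aggressiveness from a camera-based distance estimate. The Python below shows one way such a mapping might look. Everything in it is an assumption made for illustration: the function names, the 1 m reference distance, the gain clamp, the free-field 1/r level law, the diffuse-field critical-distance model for the direct-to-reverberant ratio, and the linear mapping to an aggressiveness value between 0 and 1. The source does not specify any of these.

```python
import math


def gain_db_for_distance(distance_m, reference_m=1.0, max_gain_db=20.0):
    """Gain (dB) that offsets level loss relative to a reference distance.

    Assumes free-field 1/r decay (about 6 dB per doubling of distance).
    The result is clamped to +/- max_gain_db; the clamp is hypothetical.
    """
    if distance_m <= 0 or reference_m <= 0:
        raise ValueError("distances must be positive")
    gain = 20.0 * math.log10(distance_m / reference_m)
    return max(-max_gain_db, min(max_gain_db, gain))


def direct_to_reverberant_db(distance_m, critical_distance_m):
    """Estimated direct-to-reverberant energy ratio in dB.

    Assumes direct energy falls as 1/r^2 while reverberant energy is
    uniform in the room, so the ratio is 0 dB at the critical distance.
    """
    if distance_m <= 0 or critical_distance_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(critical_distance_m / distance_m)


def dereverb_aggressiveness(drr_db, low_db=-10.0, high_db=10.0):
    """Map the ratio to a 0..1 aggressiveness value (hypothetical mapping).

    Returns 1.0 when reverberation dominates (ratio <= low_db), 0.0 when
    direct sound dominates (ratio >= high_db), and is linear in between.
    """
    t = (high_db - drr_db) / (high_db - low_db)
    return max(0.0, min(1.0, t))


if __name__ == "__main__":
    # Example: speaker estimated at 3 m, assumed critical distance 1.5 m.
    d = 3.0
    drr = direct_to_reverberant_db(d, critical_distance_m=1.5)
    print(gain_db_for_distance(d), drr, dereverb_aggressiveness(drr))
```

In a real system the distance and the room's critical distance would come from the image analysis the abstract mentions; here they are plain numeric inputs.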
Publication number: WO2013058728(A1) Publication date: 2013.04.25
Application number: WO2011US56552 Filing date: 2011.10.17
Applicant: NUANCE COMMUNICATIONS, INC.;HERBIG, TOBIAS;WOLFF, TOBIAS;BUCK, MARKUS Inventor: HERBIG, TOBIAS;WOLFF, TOBIAS;BUCK, MARKUS
Classification: G10L21/02;H04M3/56;H04N1/40;H04N7/15;H04R3/00;H04R3/04 Main classification: G10L21/02
Agency: Agent:
Main claim:
Address: