Title of invention: Audiovisual information processing in videoconferencing
Abstract: Embodiments of the present invention relate to audiovisual stream processing in videoconferences. For each audiovisual stream in a videoconference, a sound level of the audiovisual stream is detected. If the sound level exceeds a predefined threshold level, the audiovisual stream is processed with a first configuration. If the sound level is below the predefined threshold level, the audiovisual stream is processed with a second configuration. The second configuration is more resource-effective than the first configuration.
Publication number: US9560319(B1); Publication date: 2017.01.31
Application number: US201615098449; Filing date: 2016.04.14
Applicant: International Business Machines Corporation; Inventors: Pan Yang; Su Wei; Zhang Yi; Zhang Yi Jian
Classification: H04N7/15; Main classification: H04N7/15
Agency: (not listed); Agent: Harmon, Jr. Gilbert
Principal claim: 1. A method for processing a plurality of audiovisual streams in a videoconference, the method comprising:
detecting a sound level of an audiovisual stream in a videoconference based on determining an average sound level of the audiovisual stream over a predefined time period, decomposing the audiovisual stream into an audio component and a video component and analyzing the audio component to determine the sound level wherein analyzing the audio component is based on at least one of sound intensity, sound pressure, sound power, sound energy density and sound loudness;
in response to the sound level exceeding a first predefined threshold level, processing the audiovisual stream with a first configuration based on a first quality level wherein exceeding the first predefined threshold level comprises determining that the sound level of the audiovisual stream does not fall below the first predefined threshold level for a sequential time period greater than a predefined threshold time period;
in response to the sound level being below the first predefined threshold level and above a second predefined sound level, processing the audiovisual stream with a second configuration based on a second quality level, wherein the second configuration is more resource-effective than the first configuration and the second quality level is lower than the first quality level wherein the first quality level and the second quality level are based on signal-to-noise ratio and at least one of frequency response, stereo crosstalk, or output power;
in response to the sound level being below the second predefined sound level, discarding the audiovisual stream;
superimposing the audio component of the audiovisual stream with audio components of further audiovisual streams associated with the videoconference wherein the further audiovisual streams are processed with the first configuration;
combining the video component of the audiovisual stream with video components of the further audiovisual streams; and
rendering the audiovisual stream in a display area, wherein an appearance of the display area is determined based on the sound level of the audiovisual stream.
Address: Armonk NY US
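
Illustrative sketch (not part of the patent record): the three-way decision recited in the principal claim can be pictured as a small per-stream classifier. The Python below is one possible reading, not the patented implementation. The names `Action` and `StreamClassifier`, the parameters `window` and `max_quiet`, the use of unitless per-frame levels, and the treatment of a level exactly equal to a threshold are all assumptions made for illustration. In particular, the "sequential time period" condition is interpreted here as tolerating short dips below the first threshold before a stream is downgraded.

```python
from collections import deque
from enum import Enum


class Action(Enum):
    FIRST_CONFIG = "first"    # higher quality level
    SECOND_CONFIG = "second"  # lower quality level, more resource-effective
    DISCARD = "discard"       # stream is dropped


class StreamClassifier:
    """Hypothetical per-stream classifier following the claim's thresholds."""

    def __init__(self, first_threshold, second_threshold, window, max_quiet):
        if second_threshold >= first_threshold:
            raise ValueError("second threshold must be below the first")
        self.first_threshold = first_threshold
        self.second_threshold = second_threshold
        # Keeps the most recent `window` levels (the "predefined time period").
        self.recent = deque(maxlen=window)
        # Number of consecutive quiet updates tolerated (assumed to model the
        # "predefined threshold time period").
        self.max_quiet = max_quiet
        self.quiet_run = 0
        self.active = False

    def update(self, level):
        """Record one measured sound level and return the resulting Action."""
        self.recent.append(level)
        average = sum(self.recent) / len(self.recent)

        # Assumption: a level exactly equal to a threshold does not exceed it.
        if average > self.first_threshold:
            self.active = True
            self.quiet_run = 0
            return Action.FIRST_CONFIG

        if self.active:
            # A recently loud stream keeps the first configuration until it
            # has stayed below the threshold for more than `max_quiet` updates.
            self.quiet_run += 1
            if self.quiet_run <= self.max_quiet:
                return Action.FIRST_CONFIG
            self.active = False

        if average > self.second_threshold:
            return Action.SECOND_CONFIG
        return Action.DISCARD
```

With `StreamClassifier(0.5, 0.1, window=1, max_quiet=2)`, a stream that drops from 0.8 to 0.3 keeps the first configuration for two further updates, then moves to the second configuration, and is discarded once its level falls to 0.05.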