Title: Method and apparatus for predicting events in video conferencing and other applications
Abstract: Methods and apparatus are disclosed for predicting events using acoustic and visual cues. The present invention processes audio and video information to identify one or more (i) acoustic cues, such as intonation patterns, pitch and loudness, (ii) visual cues, such as gaze, facial pose, body postures, hand gestures and facial expressions, or (iii) a combination of the foregoing, that are typically associated with an event, such as the behavior a video conference participant exhibits before he or she speaks. In this manner, the present invention allows a video processing system to predict events, such as the identity of the next speaker. In a learning mode, the predictive speaker identifier learns a characteristic profile for each participant, capturing whether that participant "will speak" or "will not speak" in the presence or absence of one or more predefined visual or acoustic cues. In a predictive mode, the predictive speaker identifier compares the learned characteristics embodied in each characteristic profile against the incoming audio and video information and thereby predicts the next speaker.
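The abstract does not say how a characteristic profile is represented or learned. As a minimal sketch, assuming each cue is reduced to a binary present/absent flag and each participant's profile is a per-participant naive Bayes model (a stand-in technique chosen here for illustration, not necessarily the method claimed in the patent), the learning and predictive modes could look like the following. The class name, method names, and cue labels are all hypothetical.

```python
from collections import defaultdict

# Hypothetical binary cue labels; the abstract names these cue types
# (gaze, hand gestures, pitch, loudness) but defines no feature encoding.
CUES = ("gaze_at_speaker", "raised_hand", "rising_pitch", "increased_loudness")


class PredictiveSpeakerIdentifier:
    """Hypothetical sketch: one naive Bayes profile per participant over binary cues."""

    def __init__(self, cues=CUES, alpha=1.0):
        self.cues = cues
        self.alpha = alpha  # Laplace smoothing so unseen cues never yield zero probability
        # label_counts[participant][label]: observations where the participant
        # did (True) or did not (False) speak next
        self.label_counts = defaultdict(lambda: {True: 0, False: 0})
        # cue_counts[participant][label][cue]: how often the cue was present under that label
        self.cue_counts = defaultdict(
            lambda: {True: defaultdict(int), False: defaultdict(int)}
        )

    def learn(self, participant, observed_cues, spoke_next):
        """Learning mode: record which cues preceded speaking / not speaking."""
        self.label_counts[participant][spoke_next] += 1
        for cue in self.cues:
            if cue in observed_cues:
                self.cue_counts[participant][spoke_next][cue] += 1

    def speak_probability(self, participant, observed_cues):
        """Estimate P(will speak | cues), assuming cues are conditionally independent."""
        counts = self.label_counts[participant]
        total = counts[True] + counts[False]
        scores = {}
        for label in (True, False):
            # Smoothed prior for "will speak" / "will not speak"
            score = (counts[label] + self.alpha) / (total + 2 * self.alpha)
            for cue in self.cues:
                # Smoothed probability that this cue is present given the label
                p_present = (self.cue_counts[participant][label][cue] + self.alpha) / (
                    counts[label] + 2 * self.alpha
                )
                score *= p_present if cue in observed_cues else 1.0 - p_present
            scores[label] = score
        return scores[True] / (scores[True] + scores[False])

    def predict_next_speaker(self, current_cues):
        """Predictive mode: current_cues maps participant -> set of cues now observed."""
        return max(current_cues, key=lambda p: self.speak_probability(p, current_cues[p]))


# Usage with made-up training observations
ident = PredictiveSpeakerIdentifier()
for _ in range(5):
    ident.learn("alice", {"raised_hand", "rising_pitch"}, True)
    ident.learn("alice", set(), False)
    ident.learn("bob", {"gaze_at_speaker"}, False)
    ident.learn("bob", {"raised_hand"}, True)

now = {"alice": {"raised_hand", "rising_pitch"}, "bob": {"gaze_at_speaker"}}
print(ident.predict_next_speaker(now))  # alice
```

Keeping a separate profile per participant mirrors the abstract's "characteristic profile of each participant": the same cue (for example, looking at the current speaker) may signal an upcoming turn for one person and passive listening for another.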
Publication No.: US6894714 (B2)   Publication date: 2005.05.17
Application No.: US20000730204   Filing date: 2000.12.05
Applicant: KONINKLIJKE PHILIPS ELECTRONICS N.V.   Inventors: GUTTA SRINIVAS; STRUBBE HUGO; COLMENAREZ ANTONIO
Classification: H04N7/15; (IPC1-7): H04N7/14   Main classification: H04N7/15