Abstract
The present invention relates generally to the field of noise reduction systems equipped with an audio-visual user interface. In particular, it relates to an audio-visual speech activity recognition system (200b/c) of a video-enabled telecommunication device that runs a real-time lip-tracking application. This application can advantageously be used for a near-speaker detection algorithm in an environment where the voice of a speaker $S_i$ is interfered with by statistically distributed background noise $n'(t)$, which includes both environmental noise $n(t)$ and the voices of surrounding persons, $\sum_{j \neq i} a_j \, s_j(t - T_j)$. The real-time lip-tracking application combines a visual feature vector $o_{\nu,nT}$ with an audio feature vector $o_{a,nT}$. The visual feature vector comprises features extracted from a digital video sequence $\nu(nT)$ showing the speaker's face, obtained by detecting and analyzing the lip movements and facial expressions of the speaker $S_i$. The audio feature vector comprises features extracted from a recorded analog audio sequence $s(t)$ representing the voice of the speaker $S_i$ interfered with by the background noise $n'(t)$.
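To make the notation concrete, the following Python sketch is a minimal illustration under stated assumptions, not the claimed implementation. It assumes a discrete-time additive model in which the recorded signal is the target speaker's voice $s_i$ plus the background noise $n'(t)$, with non-negative integer sample delays $T_j$. It also assumes that the audio and visual feature vectors are combined by simple concatenation (early fusion). The abstract does not specify the fusion rule, and the function names `mix_signal` and `fuse_features` are hypothetical.

```python
from typing import List, Sequence


def mix_signal(sources: Sequence[Sequence[float]], i: int,
               gains: Sequence[float], delays: Sequence[int],
               noise: Sequence[float]) -> List[float]:
    """Hypothetical discrete model (an assumption, not from the source):
    s[k] = s_i[k] + n[k] + sum_{j != i} a_j * s_j[k - T_j]."""
    num_samples = len(noise)
    # Target speaker S_i plus environmental noise n(t).
    out = [float(sources[i][k]) + float(noise[k]) for k in range(num_samples)]
    for j, s_j in enumerate(sources):
        if j == i:
            continue  # only surrounding persons (j != i) contribute to n'(t)
        T = delays[j]
        if T < 0:
            raise ValueError("delays are assumed to be non-negative sample counts")
        # Add the delayed, attenuated voice of speaker j.
        for k in range(T, num_samples):
            out[k] += gains[j] * float(s_j[k - T])
    return out


def fuse_features(o_audio: Sequence[float],
                  o_visual: Sequence[float]) -> List[float]:
    """Assumed early fusion: concatenate the audio feature vector o_a
    and the visual feature vector o_nu for the same frame nT."""
    return [float(x) for x in o_audio] + [float(x) for x in o_visual]


if __name__ == "__main__":
    # Speaker 0 is the target; speaker 1 interferes with gain 0.5 and delay 1.
    mixed = mix_signal([[1, 1, 1, 1], [1, 2, 3, 4]], 0, [0.0, 0.5], [0, 1],
                       [0, 0, 0, 0])
    print(mixed)  # [1.0, 1.5, 2.0, 2.5]
    print(fuse_features([1, 2], [3]))  # [1.0, 2.0, 3.0]
```

Concatenation is chosen here only because it is the simplest way to pair per-frame audio and visual features. A real system might weight or normalize the two streams differently.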