Abstract |
<p>Methods and apparatus are disclosed for tracking an object of interest in a video processing system using clustering techniques. An area is partitioned into approximate regions, referred to as clusters, each associated with an object of interest. Each cluster has associated average pan, tilt and zoom values. Audio information, video information, or both are used to identify the cluster associated with a speaker (or another object of interest). Once the cluster of interest is identified, the camera is focused on that cluster using its recorded pan, tilt and zoom values, if available. An event accumulator initially accumulates audio (and optionally video) events for a specified time, to allow several speakers to speak. A cluster generator then uses the accumulated audio events to generate clusters associated with the various objects of interest. After the clusters are initialized, the illustrative event accumulator gathers events at periodic intervals.</p> |
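<p>The flow described above (accumulate events, generate clusters with average pan, tilt and zoom values, then look up the cluster of the current speaker) can be sketched as follows. This is only a minimal illustration under stated assumptions: the abstract does not name a clustering algorithm, so a simple greedy distance-threshold grouping stands in for the cluster generator, and the names <code>Cluster</code>, <code>generate_clusters</code>, <code>nearest_cluster</code>, the <code>radius</code> parameter, the use of degrees, and all event values are hypothetical.</p>

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Cluster:
    """Holds the running average pan, tilt and zoom of the events assigned to it."""
    pan: float
    tilt: float
    zoom: float
    count: int = 1

    def add(self, pan, tilt, zoom):
        # Incrementally update the running means with one more event.
        self.count += 1
        self.pan += (pan - self.pan) / self.count
        self.tilt += (tilt - self.tilt) / self.count
        self.zoom += (zoom - self.zoom) / self.count


def nearest_cluster(clusters, pan, tilt):
    """Return (cluster, distance) for the cluster closest in pan/tilt, or (None, inf)."""
    best, best_dist = None, float("inf")
    for c in clusters:
        d = hypot(c.pan - pan, c.tilt - tilt)
        if d < best_dist:
            best, best_dist = c, d
    return best, best_dist


def generate_clusters(events, radius=10.0):
    """Assumed stand-in for the cluster generator: an event joins the nearest
    cluster within `radius` (hypothetical threshold), else starts a new cluster."""
    clusters = []
    for pan, tilt, zoom in events:
        c, d = nearest_cluster(clusters, pan, tilt)
        if c is not None and d <= radius:
            c.add(pan, tilt, zoom)
        else:
            clusters.append(Cluster(pan, tilt, zoom))
    return clusters


# Events gathered by the accumulator during initialization: (pan, tilt, zoom).
# The values are made up for illustration.
events = [(-30.0, 5.0, 2.0), (-28.0, 7.0, 2.0), (40.0, 0.0, 3.0), (42.0, 2.0, 3.0)]
clusters = generate_clusters(events)

# A later audio event is localized near pan 41.5, tilt 1.0; find its cluster
# and reuse that cluster's recorded average values to aim the camera.
speaker, _ = nearest_cluster(clusters, 41.5, 1.0)
camera_target = (speaker.pan, speaker.tilt, speaker.zoom)
```

<p>With these sample events the sketch forms two clusters, and the later event maps to the second one, so the camera would be aimed with that cluster's averaged values. A real system would also need a way to age out or split clusters as people move, which this sketch does not attempt.</p>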