Abstract |
<p>The present invention relates generally to the field of video-camera systems, such as video conferencing systems, and more particularly to video camera targeting systems (100) that locate and acquire targets using an input characterizing a target (5) and a machine-classification system that assists in target acquisition in response to that input. In some embodiments, the characterization and classification are employed together with one or more inputs of other modalities, such as gesture control. In one example of the system in operation, an operator (4) makes a pointing gesture toward an object (5) and simultaneously speaks a sentence identifying the object being pointed at. At least one term of the sentence is presumed to be associated with a machine-sensible characteristic by which the object (5) can be identified. The system captures and processes the voice and gesture inputs and repositions a pan-tilt-zoom (PTZ) video camera (2) to focus on the object that best matches both the characteristics and the gesture. Thus, the PTZ camera (2) is aimed based on the inputs the system receives and on the system's ability to locate the target (5) using its sensors.</p> |
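The selection step described above, choosing the object that best matches both the spoken characteristics and the pointing gesture, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the disclosure: the `Candidate` record, the trait vocabulary, the representation of the gesture as a pan/tilt direction in degrees, the 30-degree falloff, and the choice to multiply the two scores so that a candidate must satisfy both cues.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for an object the sensors have detected."""
    name: str
    pan_deg: float        # direction of the object as seen from the camera
    tilt_deg: float
    traits: frozenset     # machine-sensible characteristics, e.g. {"red", "mug"}


def angular_error(c, pan_deg, tilt_deg):
    # Simple planar distance in degrees; pan wrap-around is ignored here.
    return math.hypot(c.pan_deg - pan_deg, c.tilt_deg - tilt_deg)


def score(c, spoken_terms, pan_deg, tilt_deg, falloff_deg=30.0):
    """Combine trait agreement and gesture agreement into one value in [0, 1]."""
    terms = set(spoken_terms)
    # Fraction of the spoken terms that match the candidate's traits.
    trait_score = len(terms & c.traits) / len(terms) if terms else 0.0
    # Linear falloff: 1.0 when pointed at exactly, 0.0 at falloff_deg or beyond.
    gesture_score = max(0.0, 1.0 - angular_error(c, pan_deg, tilt_deg) / falloff_deg)
    # Assumed fusion rule: the product requires both cues to agree.
    return trait_score * gesture_score


def select_target(candidates, spoken_terms, pan_deg, tilt_deg):
    """Return the best-scoring candidate, or None if nothing matches at all."""
    best = max(
        candidates,
        key=lambda c: score(c, spoken_terms, pan_deg, tilt_deg),
        default=None,
    )
    if best is None or score(best, spoken_terms, pan_deg, tilt_deg) == 0.0:
        return None
    return best


# Illustrative scene: the operator points near (11, -5) and says "red mug".
scene = [
    Candidate("red mug", 10.0, -5.0, frozenset({"red", "mug"})),
    Candidate("blue mug", 12.0, -4.0, frozenset({"blue", "mug"})),
    Candidate("red book", -40.0, 0.0, frozenset({"red", "book"})),
]
target = select_target(scene, ["red", "mug"], pan_deg=11.0, tilt_deg=-5.0)
```

In this scene the red mug is selected: it matches both spoken terms and lies about one degree from the pointing direction, whereas the blue mug matches only one term and the red book lies outside the assumed falloff. A real system would then drive the PTZ camera to the selected candidate's pan and tilt; that control step is omitted here.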