Abstract |
By recognizing visual trigger events to determine the start points and/or end points of voice data signals, the negative effects of noise on voice recognition may be significantly reduced. The visual trigger events may be predetermined gestures and/or predetermined postures of a user captured by a camera, which allow a system to focus attention on the user and thereby improve the receipt of a voice command in a noisy environment. This may be accomplished with visual feedback that complements the voice feedback the user provides to the system. Since the visual trigger events are predetermined gestures and/or postures, the system may be able to distinguish which sounds produced by the user are voice commands and which are noise unrelated to the operation of the system. |
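
The gating idea described above can be illustrated with a minimal sketch. It is not the claimed implementation: the `VisualTrigger` type and the `extract_command_spans` and `gate_audio` functions are hypothetical names introduced here, gesture recognition itself is assumed to have already happened upstream, and the sketch assumes every command has both a start trigger and an end trigger, although the abstract also allows only one of the two.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class VisualTrigger:
    """A recognized predetermined gesture or posture (hypothetical type)."""
    time_s: float  # time at which the camera-based recognizer fired
    kind: str      # "start" or "end" (assumed labels)


def extract_command_spans(triggers: Sequence[VisualTrigger]) -> List[Tuple[float, float]]:
    """Pair each start trigger with the next end trigger, in time order."""
    spans: List[Tuple[float, float]] = []
    start = None
    for trig in sorted(triggers, key=lambda t: t.time_s):
        if trig.kind == "start" and start is None:
            start = trig.time_s
        elif trig.kind == "end" and start is not None:
            spans.append((start, trig.time_s))
            start = None
        # An "end" with no open "start", or a repeated "start", is ignored.
    return spans


def gate_audio(samples: Sequence[float], sample_rate: int,
               spans: Sequence[Tuple[float, float]]) -> List[List[float]]:
    """Keep only audio between visual start and end points; drop the rest as noise."""
    segments: List[List[float]] = []
    for start_s, end_s in spans:
        lo = max(0, int(start_s * sample_rate))
        hi = min(len(samples), int(end_s * sample_rate))
        if hi > lo:
            segments.append(list(samples[lo:hi]))
    return segments
```

In this sketch only the gated segments would be passed on to a voice recognizer, so sounds made outside a start/end pair never reach it.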