Title: Depth based context identification
Abstract: A method or system for selecting or pruning applicable verbal commands associated with speech recognition based on a user's motions detected from a depth camera. Depending on the depth of the user's hand or arm, the context of the verbal command is determined and verbal commands corresponding to the determined context are selected. Speech recognition is then performed on an audio signal using the selected verbal commands. By using an appropriate set of verbal commands, the accuracy of the speech recognition is increased.
Publication number: US9092394(B2) Publication date: 2015.07.28
Application number: US201213524351 Filing date: 2012.06.15
Applicant: Honda Motor Co., Ltd. Inventors: Dokor Tarek El;Holmes James;Cluster Jordan;Yamamoto Stuart;Vaghefinazari Pedram
Classification: G10L21/00;G10L15/00;G06F17/20;G06F3/01;G10L15/24;G10L15/25;G09G5/08 Main classification: G10L21/00
Agency: Fenwick & West LLP Agent: Fenwick & West LLP
Main claim: 1. A computer-implemented method of recognizing verbal commands, comprising: capturing at least one depth image by a depth camera positioned in a vehicle, each of the depth image covering at least part of a user and comprising pixels representing distances from the depth camera to the at least part of the user; recognizing a pose or gesture of the user based on the captured depth image; generating gesture information based on the recognized pose or gesture, the gesture information indicating a direction pointed by the user outward of the vehicle towards a point-of-interest outside the vehicle; determining one or more devices among a plurality of devices that are likely to be targeted by the user for an operation by analyzing the gesture information and without performing speech recognition on an audio signal including an utterance by the user; selecting a plurality of verbal commands associated with the one or more devices determined as likely being targeted; receiving the audio signal including the utterance by the user at a time when the at least one depth image is being captured; and determining a device command for operating the one or more devices likely being targeted by performing speech recognition on the audio signal using the selected plurality of verbal commands, the determined device command representing an action associated with the point-of-interest.
Address: Tokyo JP
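
The flow described in the abstract and the main claim (gesture information selects the likely target devices, the devices select a reduced command vocabulary, and recognition runs only against that vocabulary) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the device names, the command lists, the 0.5 m depth threshold, and the `GestureInfo` fields are hypothetical. A plain string-similarity match from Python's `difflib` stands in for a real speech recognizer, which would operate on the audio signal rather than on a rough text hypothesis.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

# Hypothetical command vocabulary per device (not taken from the patent).
COMMANDS_BY_DEVICE = {
    "navigation": ["what is that building", "navigate there", "add to favorites"],
    "audio": ["volume up", "volume down", "next track"],
    "climate": ["temperature up", "temperature down", "fan off"],
}


@dataclass
class GestureInfo:
    """Gesture information derived from a depth image (hypothetical fields)."""
    hand_depth_m: float      # distance from the depth camera to the hand, in metres
    pointing_outward: bool   # True if the user points at something outside the vehicle


def likely_target_devices(gesture):
    """Pick likely target devices from the gesture alone, without using any audio."""
    if gesture.pointing_outward:
        return ["navigation"]
    # Assumed rule: a hand close to the camera means audio, farther away means climate.
    if gesture.hand_depth_m < 0.5:
        return ["audio"]
    return ["climate"]


def select_commands(devices):
    """Collect the verbal commands associated with the likely target devices."""
    return [cmd for device in devices for cmd in COMMANDS_BY_DEVICE[device]]


def recognize(hypothesis, commands):
    """Stand-in for speech recognition: return the closest command in the pruned set."""
    return max(commands, key=lambda cmd: SequenceMatcher(None, hypothesis, cmd).ratio())


def device_command(gesture, hypothesis):
    """Run the whole pipeline: gesture -> devices -> commands -> recognized command."""
    return recognize(hypothesis, select_commands(likely_target_devices(gesture)))


if __name__ == "__main__":
    outward = GestureInfo(hand_depth_m=0.8, pointing_outward=True)
    print(device_command(outward, "what is that bilding"))
    near = GestureInfo(hand_depth_m=0.3, pointing_outward=False)
    print(device_command(near, "up"))
```

In this sketch the same short utterance "up" resolves to a different command depending on the gesture context, which is the effect the abstract attributes to pruning the command set before recognition.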