A method or system for selecting or pruning applicable verbal commands associated with speech recognition based on a user's motions detected from a depth camera. Depending on the depth of the user's hand or arm, the context of the verbal command is determined and verbal commands corresponding to the determined context are selected. Speech recognition is then performed on an audio signal using the selected verbal commands. By using an appropriate set of verbal commands, the accuracy of the speech recognition is increased.