主权项 |
1. An interactive device of that recognizes input voice of a user and thereby contents of utterance of the user and performs a predetermined response action corresponding to the recognized contents, the interactive device comprising:
a recognition section setting means that sets a recognition starting point to an utterance starting end frame serving as a starting end of the user's utterance in the input voice and sets a recognition terminal point to a frame which is a predetermined length of time ahead of the recognition starting point to thereby set a recognition section throughout which voice recognition is performed, a voice recognition means that performs voice recognition for the recognition section, a response action determining means that, if a recognition result by the voice recognition means includes a key phrase, determines a response action associated with the key phrase, and a response action executing means that executes the response action determined by the response action determining means, the recognition section setting means repeatedly updating the frame set as the recognition terminal point to a frame which is the predetermined length of time ahead of the recognition terminal point, to thereby set a plurality of recognition sections having different recognition terminal points, and the voice recognition means performing voice recognition on each of the plurality of recognition sections having different recognition terminal points, wherein the recognition section setting means comprises: a recognition starting point setting unit that detects the utterance starting end frame and sets the recognition starting point at the detected utterance starting end frame, a recognition terminal point setting unit that sets the recognition terminal point at a frame which is the predetermined length of time ahead of the recognition starting point set by the recognition starting point setting unit; and a recognition terminal point updating unit that updates repeatedly the recognition terminal point set by the recognition terminal point setting unit to a frame which is the predetermined length of time ahead of the recognition terminal point, the recognition terminal point updating unit detects an utterance terminal end frame serving as a terminal end of the user's utterance in the input voice and updates the recognition terminal point to the detected utterance terminal end frame, said recognition terminal point being either one of the recognition terminal point set by the recognition terminal point setting unit and the recognition terminal point updated by the recognition terminal point updating unit, the voice recognition means comprises: a first-path searching unit that searches word candidates in the user's utterance in a direction from the utterance starting end frame to the utterance terminal end frame, and a second-path search unit that searches the word candidates in each of the plurality of recognition sections having different recognition terminal points in a direction from the recognition terminal point to the recognition starting point according to a search result produced by the first-path searching unit, and the response action determining means determines, when a search result produced by the second-path search unit includes the key phase, the response action corresponding to the key phrase. |