Title of invention: Interactive device that recognizes the input voice of a user and the contents of the user's utterance, and performs a response corresponding to the recognized contents
Abstract: The present invention provides an interactive device that recognizes utterances quickly, outputs the recognition results sequentially, and limits the decrease in recognition rate even when the user's utterance is divided into frames at short intervals for quick decision. The interactive device sets a recognition section for voice recognition; performs voice recognition on the recognition section; when the recognition result includes a key phrase, determines the response action corresponding to that key phrase; and executes the response action. The interactive device sets a plurality of recognition sections by repeatedly updating the recognition terminal point to a frame that is a predetermined length of time ahead of the currently set recognition terminal point, and performs voice recognition on each recognition section.
Publication No.: US9002705(B2) Publication date: 2015.04.07
Application No.: US201213450515 Filing date: 2012.04.19
Applicant: Honda Motor Co., Ltd. Inventors: Yoshida Yuichi; Osada Taku
Classification No.: G10L17/00;G10L15/22;G10L15/26;G06F17/27 Main classification No.: G10L17/00
Agency: Rankin, Hill & Clark LLP Agent: Rankin, Hill & Clark LLP
Principal claim: 1. An interactive device that recognizes input voice of a user and thereby contents of utterance of the user and performs a predetermined response action corresponding to the recognized contents, the interactive device comprising: a recognition section setting means that sets a recognition starting point to an utterance starting end frame serving as a starting end of the user's utterance in the input voice and sets a recognition terminal point to a frame which is a predetermined length of time ahead of the recognition starting point to thereby set a recognition section throughout which voice recognition is performed, a voice recognition means that performs voice recognition for the recognition section, a response action determining means that, if a recognition result by the voice recognition means includes a key phrase, determines a response action associated with the key phrase, and a response action executing means that executes the response action determined by the response action determining means, the recognition section setting means repeatedly updating the frame set as the recognition terminal point to a frame which is the predetermined length of time ahead of the recognition terminal point, to thereby set a plurality of recognition sections having different recognition terminal points, and the voice recognition means performing voice recognition on each of the plurality of recognition sections having different recognition terminal points, wherein the recognition section setting means comprises: a recognition starting point setting unit that detects the utterance starting end frame and sets the recognition starting point at the detected utterance starting end frame, a recognition terminal point setting unit that sets the recognition terminal point at a frame which is the predetermined length of time ahead of the recognition starting point set by the recognition starting point setting unit; and a recognition terminal point updating unit that updates repeatedly the recognition terminal point set by the recognition terminal point setting unit to a frame which is the predetermined length of time ahead of the recognition terminal point, the recognition terminal point updating unit detects an utterance terminal end frame serving as a terminal end of the user's utterance in the input voice and updates the recognition terminal point to the detected utterance terminal end frame, said recognition terminal point being either one of the recognition terminal point set by the recognition terminal point setting unit and the recognition terminal point updated by the recognition terminal point updating unit, the voice recognition means comprises: a first-path searching unit that searches word candidates in the user's utterance in a direction from the utterance starting end frame to the utterance terminal end frame, and a second-path search unit that searches the word candidates in each of the plurality of recognition sections having different recognition terminal points in a direction from the recognition terminal point to the recognition starting point according to a search result produced by the first-path searching unit, and the response action determining means determines, when a search result produced by the second-path search unit includes the key phrase, the response action corresponding to the key phrase.
Address: Tokyo JP
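The recognition-section scheme described in the abstract and claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. All names (`recognition_sections`, `respond_incrementally`, `recognize`, `actions`) are hypothetical; frames are assumed to be list elements indexed by integers, the terminal point is treated as an exclusive slice index, key phrases are matched by plain substring search, and the first-path/second-path word search of the claim is replaced by an arbitrary recognizer callable supplied by the caller.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def recognition_sections(start: int, end: int, step: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, terminal) pairs with a fixed start and a growing terminal point.

    start: utterance starting end frame (assumed already detected)
    end:   utterance terminal end frame (assumed already detected)
    step:  the predetermined length of time, expressed in frames
    """
    if step <= 0:
        raise ValueError("step must be a positive number of frames")
    terminal = start + step
    while terminal < end:
        yield (start, terminal)
        terminal += step  # move the terminal point ahead by the fixed step
    # Last section: the terminal point is set to the utterance terminal end frame.
    yield (start, end)


def respond_incrementally(
    frames: Sequence[str],
    start: int,
    end: int,
    step: int,
    recognize: Callable[[Sequence[str]], str],
    actions: Dict[str, str],
) -> List[str]:
    """Recognize each section and collect response actions in the order they fire.

    `recognize` is a hypothetical stand-in for the voice recognition means;
    `actions` maps each key phrase to the name of its response action.
    """
    executed: List[str] = []
    for s, t in recognition_sections(start, end, step):
        text = recognize(frames[s:t])
        for phrase, action in actions.items():
            # Substring matching is an assumption made for this sketch.
            if phrase in text and action not in executed:
                executed.append(action)
    return executed
```

With a toy recognizer that joins word-like frames (`lambda fr: " ".join(fr)`), the frames `["please", "turn", "left", "and", "then", "stop"]`, a step of 2, and the actions `{"turn left": "TURN_LEFT", "stop": "STOP"}`, the action `TURN_LEFT` fires on the second section, before the utterance has ended, and `STOP` fires on the final section.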