Title of invention |
METHOD AND SYSTEM FOR SPEECH COMMAND DETECTION, AND INFORMATION PROCESSING SYSTEM |
Abstract |
A method for speech command detection comprises extracting speech features from a speech signal inputted into a system, converting the speech features into a word sequence, obtaining time durations of speech segments corresponding to the respective non-command words and an acoustic score of each of the command word candidates, calculating rhythm features of the speech signal based on the time durations, and recognizing a speech corresponding to the at least one command word candidate as a speech command directed to the system or a speech not directed to the system based on the acoustic score and the rhythm features. The word sequence comprises at least two successive non-command words and at least one command word candidate. The rhythm features describe a similarity of time durations of speech segments corresponding to the respective non-command words, and/or a similarity of energy variations of the speech segments corresponding to the respective non-command words. |
Publication number |
US2014337024(A1) |
Publication date |
2014.11.13 |
Application number |
US201414274500 |
Filing date |
2014.05.09 |
Applicant |
CANON KABUSHIKI KAISHA |
Inventors |
Zuo Xiang;Hu Weixiang;Liu Hefei |
Classification |
G10L15/02 |
Main classification |
G10L15/02 |
Agency |
|
Agent |
|
Principal claim |
1. A method for speech command detection comprising:
feature extraction, for extracting speech features from a speech signal inputted into a system;
speech recognition, for converting the speech features into a word sequence, wherein the word sequence comprises at least two successive non-command words and at least one command word candidate, and obtaining time durations of speech segments corresponding to the respective non-command words and an acoustic score of each of the command word candidates;
rhythm analysis, for calculating rhythm features of the speech signal based on the time durations; and
classification, for recognizing a speech corresponding to the at least one command word candidate as a speech command directed to the system or a speech not directed to the system based on the acoustic score and the rhythm features,
wherein the rhythm features describe a similarity of time durations of speech segments corresponding to the respective non-command words, and/or a similarity of energy variations of the speech segments corresponding to the respective non-command words. |
Address |
Tokyo JP |
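
Illustrative sketch (not part of the patent record). The rhythm analysis and classification steps of claim 1 could be pictured roughly as below. The record does not give any formula, so everything here is an assumption: the similarity measure (one over one plus the coefficient of variation of the durations), the function names `duration_similarity` and `is_command`, the weights, the threshold, the 0-to-1 range of the acoustic score, and the idea that a more regular rhythm counts toward a command. Energy-variation similarity is left out.

```python
from statistics import mean, pstdev


def duration_similarity(durations):
    """Hypothetical rhythm feature: similarity of segment durations.

    Returns a value in (0, 1]; 1.0 means every non-command word segment
    has the same duration. The formula is an assumption, not the patent's.
    """
    if len(durations) < 2:
        raise ValueError("need durations of at least two non-command words")
    if any(d <= 0 for d in durations):
        raise ValueError("durations must be positive")
    # Coefficient of variation: spread of the durations relative to their mean.
    cv = pstdev(durations) / mean(durations)
    return 1.0 / (1.0 + cv)


def is_command(acoustic_score, durations,
               w_acoustic=0.5, w_rhythm=0.5, threshold=0.7):
    """Hypothetical classifier: weighted sum of the two cues vs. a threshold.

    Assumes acoustic_score is already normalised to [0, 1]; the weights and
    the threshold are placeholder values chosen only for illustration.
    """
    combined = (w_acoustic * acoustic_score
                + w_rhythm * duration_similarity(durations))
    return combined >= threshold


if __name__ == "__main__":
    # Evenly paced non-command words followed by a confident command candidate.
    print(is_command(0.9, [0.5, 0.5, 0.5]))
    # Unevenly paced words followed by a weak command candidate.
    print(is_command(0.2, [0.1, 1.0]))
```

A trained classifier over the acoustic score and the rhythm features would be the more likely choice in practice; the fixed linear rule above only shows how the two inputs named in the claim might be combined.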