Title: METHOD AND SYSTEM FOR SPEECH COMMAND DETECTION, AND INFORMATION PROCESSING SYSTEM
Abstract: A method for speech command detection comprises extracting speech features from a speech signal input into a system; converting the speech features into a word sequence that comprises at least two successive non-command words and at least one command word candidate; obtaining time durations of the speech segments corresponding to the respective non-command words and an acoustic score of each command word candidate; calculating rhythm features of the speech signal based on the time durations; and recognizing the speech corresponding to the at least one command word candidate as either a speech command directed to the system or speech not directed to the system, based on the acoustic score and the rhythm features. The rhythm features describe a similarity of the time durations of the speech segments corresponding to the respective non-command words, and/or a similarity of the energy variations of those speech segments.
Publication number: US2014337024(A1) Publication date: 2014.11.13
Application number: US201414274500 Filing date: 2014.05.09
Applicant: CANON KABUSHIKI KAISHA Inventors: Zuo Xiang; Hu Weixiang; Liu Hefei
Classification: G10L15/02 Main classification: G10L15/02
Agency: Agent:
Main claim: 1. A method for speech command detection comprising: feature extraction, for extracting speech features from a speech signal inputted into a system; speech recognition, for converting the speech features into a word sequence, wherein the word sequence comprises at least two successive non-command words and at least one command word candidate, and obtaining time durations of speech segments corresponding to the respective non-command words and an acoustic score of each of the command word candidates; rhythm analysis, for calculating rhythm features of the speech signal based on the time durations; and classification, for recognizing a speech corresponding to the at least one command word candidate as a speech command directed to the system or a speech not directed to the system based on the acoustic score and the rhythm features, wherein the rhythm features describe a similarity of time durations of speech segments corresponding to the respective non-command words, and/or a similarity of energy variations of the speech segments corresponding to the respective non-command words.
Address: Tokyo JP
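Illustrative sketch (not part of the patent record): the rhythm analysis and classification steps named in the abstract and main claim could be approximated as below. This is only a minimal sketch under stated assumptions. The record does not give a formula for "similarity of time durations", so the coefficient-of-variation measure here is a stand-in. The function names, the weights, the threshold, and the choice that higher similarity favors a command are all hypothetical; in practice a trained classifier would determine these.

```python
from statistics import mean, pstdev


def duration_similarity(durations):
    """Hypothetical rhythm feature: similarity of non-command word durations.

    Returns a value in [0, 1]; 1.0 means all durations are identical.
    The measure (1 minus the coefficient of variation, clipped at 0)
    is an assumption, not the formula used in the patent.
    """
    if len(durations) < 2:
        # The claim requires at least two successive non-command words.
        raise ValueError("need at least two non-command word durations")
    if any(d <= 0 for d in durations):
        raise ValueError("durations must be positive")
    cv = pstdev(durations) / mean(durations)  # coefficient of variation
    return max(0.0, 1.0 - cv)


def is_system_command(acoustic_score, rhythm_similarity,
                      w_acoustic=1.0, w_rhythm=1.0, threshold=1.0):
    """Hypothetical classifier: weighted sum of the two cues vs. a threshold.

    The weights, the threshold, and the positive sign on the rhythm
    feature are placeholders chosen for illustration only.
    """
    score = w_acoustic * acoustic_score + w_rhythm * rhythm_similarity
    return score >= threshold


# Example: durations (in seconds) of three successive non-command words,
# followed by a command word candidate with an assumed acoustic score of 0.7.
similarity = duration_similarity([0.5, 0.5, 0.5])
decision = is_system_command(acoustic_score=0.7, rhythm_similarity=similarity)
```

The two functions are kept separate to mirror the claim's division into a rhythm analysis step and a classification step; an energy-variation similarity feature, also mentioned in the claim, could be added as a further input in the same way.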