Title of invention |
Keyword detection for speech recognition |
Abstract |
This application discloses a method of recognizing a keyword in a speech that includes a sequence of audio frames, the sequence including a current frame and a subsequent frame. A candidate keyword is determined for the current frame using a decoding network that includes keywords and filler words of multiple languages, and the candidate keyword is used to determine a confidence score for the audio frame sequence. A word option is also determined for the subsequent frame based on the decoding network. When the candidate keyword and the word option are associated with two distinct types of languages, the confidence score of the audio frame sequence is updated based at least on a penalty factor associated with the two distinct types of languages. The audio frame sequence is then determined to include both the candidate keyword and the word option by evaluating the updated confidence score according to a keyword determination criterion. |
Publication number |
US9230541(B2) |
Publication date |
2016.01.05 |
Application number |
US201414567969 |
Filing date |
2014.12.11 |
Applicant |
TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED |
Inventors |
Li Lu; Lu Li; Ma Jianxiong; Kong Linghui; Rao Feng; Yue Shuai; Zhang Xiang; Liu Haibo; Wang Eryu; Chen Bo |
Classification |
G10L15/08 |
Main classification |
G10L15/08 |
Agency |
Morgan, Lewis & Bockius LLP |
Agent |
Morgan, Lewis & Bockius LLP |
Independent claim |
1. A method of recognizing a keyword in a speech, comprising:
on an electronic device:
receiving a sequence of audio frames comprising a current frame and a subsequent frame that follows the current frame;
determining a candidate keyword for the current frame using a predetermined decoding network that comprises keywords and filler words of multiple languages;
associating the audio frame sequence with a confidence score that is partially determined according to the candidate keyword;
identifying a word option for the subsequent frame using the candidate keyword and the predetermined decoding network;
when the candidate keyword and the word option are associated with two distinct types of languages, updating the confidence score of the audio frame sequence based on a penalty factor that is predetermined according to the two distinct types of languages, the word option and an acoustic model of the subsequent frame; and
determining that the audio frame sequence includes both the candidate keyword and the word option by evaluating the updated confidence score according to a keyword determination criterion. |
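The confidence update recited in the claim can be illustrated with a minimal Python sketch. Nothing here is taken from the patent beyond the order of the steps: the `Word` type, the `PENALTY` table and its values, the log-domain additive scoring, and the threshold used as the keyword determination criterion are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    language: str     # language tag, e.g. "zh" or "en" (assumed labels)
    is_keyword: bool  # False for filler words


# Hypothetical penalty factors, keyed by (language of candidate, language of option).
# Values are log-domain and non-positive; the patent does not give concrete numbers.
PENALTY = {("zh", "en"): -2.0, ("en", "zh"): -2.0}


def update_confidence(confidence, candidate, option, acoustic_log_score):
    """Update the sequence confidence after a word option is identified.

    acoustic_log_score stands in for the acoustic-model score of the word
    option on the subsequent frame (an assumed interface).
    """
    updated = confidence + acoustic_log_score
    # The penalty applies only when the two words belong to distinct languages.
    if candidate.language != option.language:
        updated += PENALTY.get((candidate.language, option.language), 0.0)
    return updated


def meets_criterion(confidence, threshold=-5.0):
    """Assumed keyword determination criterion: a simple score threshold."""
    return confidence >= threshold


candidate = Word("weixin", "zh", True)
option = Word("hello", "en", True)
score = update_confidence(-1.0, candidate, option, acoustic_log_score=-0.5)
print(score, meets_criterion(score))
```

In this sketch a language switch between consecutive words lowers the score, so a cross-language hypothesis is accepted only when the acoustic evidence is strong enough to offset the penalty.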
Address |
Shenzhen, Guangdong Province CN |