Title: SPEECH RECOGNIZER WITH MULTI-DIRECTIONAL DECODING
Abstract: In an automatic speech recognition (ASR) processing system, ASR processing may be configured to process speech based on multiple channels of audio received from a beamformer. The ASR processing system may include a microphone array and the beamformer, which outputs multiple channels of audio such that each channel isolates audio arriving from a particular direction. The multi-channel audio signals may include spoken utterances/speech from one or more speakers as well as undesired audio, such as noise from a household appliance. The ASR device may simultaneously perform speech recognition on the multi-channel audio to provide more accurate speech recognition results.
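The abstract does not say which beamforming method produces the directional channels. As a minimal sketch, assuming a delay-and-sum beamformer over a uniform linear microphone array, a far-field (plane-wave) source, integer-sample delays, and a speed of sound of 343 m/s, one output channel per look direction can be formed as below. The function names `steering_delays`, `delay_and_sum`, and `beamform_channels` are hypothetical and are not taken from the patent.

```python
import math
from typing import Dict, List, Sequence


def steering_delays(num_mics: int, spacing_m: float, angle_deg: float,
                    sample_rate: int, speed_of_sound: float = 343.0) -> List[int]:
    """Integer arrival delay (in samples) at each mic of a uniform linear
    array for a far-field source at angle_deg from broadside (assumption)."""
    step = spacing_m * math.sin(math.radians(angle_deg)) / speed_of_sound
    return [round(m * step * sample_rate) for m in range(num_mics)]


def delay_and_sum(mics: Sequence[Sequence[float]], delays: Sequence[int]) -> List[float]:
    """Advance each mic signal by its arrival delay and average, so sound
    from the steered direction adds coherently and other sound does not."""
    n = len(mics[0])
    out = [0.0] * n
    for signal, d in zip(mics, delays):
        for t in range(n):
            src = t + d  # sample at which the wavefront reached this mic
            if 0 <= src < n:
                out[t] += signal[src]
    return [x / len(mics) for x in out]


def beamform_channels(mics: Sequence[Sequence[float]], spacing_m: float,
                      sample_rate: int, angles_deg: Sequence[float]) -> Dict[float, List[float]]:
    """Return one beamformed output channel per look direction."""
    return {
        angle: delay_and_sum(mics, steering_delays(len(mics), spacing_m, angle, sample_rate))
        for angle in angles_deg
    }


if __name__ == "__main__":
    # Simulate an impulse from 60 degrees reaching a 4-mic array (5 cm spacing, 16 kHz).
    delays = steering_delays(4, 0.05, 60.0, 16000)
    mics = [[1.0 if t == 10 + d else 0.0 for t in range(64)] for d in delays]
    beams = beamform_channels(mics, 0.05, 16000, [0.0, 60.0])
    print(max(beams[60.0]), max(beams[0.0]))  # 1.0 0.25
```

The beam steered at the source direction sums the four copies coherently (peak 1.0), while the broadside beam smears them across four samples (peak 0.25). A production system would more likely use fractional delays or an adaptive beamformer; integer delays are chosen here only to keep the sketch short.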
Publication Number: US2015095026 (A1)
Publication Date: 2015.04.02
Application Number: US201314039383
Filing Date: 2013.09.27
Applicant: Amazon Technologies, Inc.
Inventors: Bisani Michael Maximilian Emanuel; Strom Nikko; Hoffmeister Bjorn; Thomas Ryan Paul
Classification: G10L15/00; G10L15/16
Main Classification: G10L15/00
Main Claim: 1. A method for performing speech recognition, the method comprising:
  receiving a multiple-channel audio signal comprising a first channel and a second channel, wherein the first channel and second channel are created using a beamformer and a microphone array, the first channel representing audio from a first direction, and the second channel representing audio from a second direction;
  creating a first sequence of feature vectors for the first channel and a second sequence of feature vectors for the second channel;
  performing speech recognition using the first sequence of feature vectors and the second sequence of feature vectors, wherein performing speech recognition comprises:
    generating a first hypothesis using a speech recognition model and a first feature vector of the first sequence of feature vectors;
    generating a second hypothesis using the speech recognition model and a second feature vector of the second sequence of feature vectors, wherein the second hypothesis is subsequent to the first hypothesis in a speech recognition result network.
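The claim leaves open how a hypothesis built from one channel's feature vector can be followed by one built from another channel's. One possible reading, sketched below, is a beam search in which every hypothesis may be extended with the next feature vector of any channel, so a single path through the result network can switch channels from frame to frame. Everything here is an assumption for illustration: the feature vectors are taken as already computed (e.g. log-filterbank frames, not shown), the "speech recognition model" is a toy left-to-right model whose states are unit-variance Gaussians, and the names `Hypothesis`, `log_likelihood`, and `decode_multichannel` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class Hypothesis:
    """One node of the result network: a model state reached at a frame,
    scored with one channel's feature vector, linked to its predecessor."""
    frame: int
    state: int
    channel: int
    score: float
    parent: Optional["Hypothesis"]


def log_likelihood(feature: Vector, mean: Vector) -> float:
    """Toy acoustic model: unit-variance Gaussian log-density, constant dropped."""
    return -0.5 * sum((f - m) ** 2 for f, m in zip(feature, mean))


def decode_multichannel(channels: Sequence[Sequence[Vector]],
                        state_means: Sequence[Vector],
                        beam: int = 8) -> List[Hypothesis]:
    """Beam search over a left-to-right model; each hypothesis may be
    extended with the next feature vector of ANY channel."""
    num_states = len(state_means)
    # Frame 0 starts in state 0, with one hypothesis per channel.
    active = [Hypothesis(0, 0, c, log_likelihood(feats[0], state_means[0]), None)
              for c, feats in enumerate(channels)]
    for t in range(1, len(channels[0])):
        best: Dict[Tuple[int, int], Hypothesis] = {}
        for hyp in active:
            for nxt in (hyp.state, hyp.state + 1):  # stay in state or advance
                if nxt >= num_states:
                    continue
                for c, feats in enumerate(channels):
                    score = hyp.score + log_likelihood(feats[t], state_means[nxt])
                    key = (nxt, c)  # recombine paths reaching the same (state, channel)
                    if key not in best or score > best[key].score:
                        best[key] = Hypothesis(t, nxt, c, score, hyp)
        active = sorted(best.values(), key=lambda h: h.score, reverse=True)[:beam]
    # Prefer hypotheses that reached the final state, then backtrace via parents.
    finals = [h for h in active if h.state == num_states - 1] or active
    node: Optional[Hypothesis] = max(finals, key=lambda h: h.score)
    path: List[Hypothesis] = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path[::-1]


if __name__ == "__main__":
    # Channel 0 is clean early and noisy late; channel 1 is the reverse.
    ch0 = [[0.0, 0.0], [0.1, 0.0], [-9.0, 9.0], [-9.0, 9.0]]
    ch1 = [[9.0, -9.0], [9.0, -9.0], [5.0, 5.0], [5.1, 5.0]]
    path = decode_multichannel([ch0, ch1], state_means=[[0.0, 0.0], [5.0, 5.0]])
    print([(h.frame, h.state, h.channel) for h in path])
    # [(0, 0, 0), (1, 0, 0), (2, 1, 1), (3, 1, 1)]
```

In the demo, the hypothesis at frame 2 is scored with a channel-1 feature vector and its parent at frame 1 was scored with a channel-0 feature vector, which mirrors the claimed "second hypothesis subsequent to the first hypothesis" across two channels. Whether the patented decoder recombines by state, by state and channel, or in some other way is not stated in this record.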
Address: Reno, NV, US