Title: Online maximum-likelihood mean and variance normalization for speech recognition
Abstract: A feature transform for speech recognition is described. An input speech utterance is processed to produce a sequence of representative speech vectors. A time-synchronous speech recognition pass is performed using a decoding search to determine a recognition output corresponding to the speech input. The decoding search includes, for each speech vector after some first threshold number of speech vectors, estimating a feature transform based on the preceding speech vectors in the utterance and partial decoding results of the decoding search. The current speech vector is then adjusted based on the current feature transform, and the adjusted speech vector is used in a current frame of the decoding search.
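The causal structure described in the abstract (each frame is adjusted using only what came before it) can be illustrated with a minimal sketch. This is not the patented maximum-likelihood estimate: it substitutes plain running mean and variance statistics over the preceding frames and does not use partial decoding results at all. The function name `causal_normalize` and the parameters `min_frames` and `eps` are hypothetical, chosen only for illustration.

```python
import numpy as np

def causal_normalize(frames, min_frames=10, eps=1e-8):
    """Normalize each frame using statistics of the preceding frames only.

    Assumption: simple mean/variance statistics stand in for the
    decoding-informed transform described in the abstract.
    """
    frames = np.asarray(frames, dtype=float)
    out = frames.copy()
    # Frames before the threshold are left unchanged in this sketch.
    for t in range(min_frames, len(frames)):
        history = frames[:t]              # preceding frames only, never frame t
        mean = history.mean(axis=0)       # per-dimension mean
        std = history.std(axis=0)         # per-dimension standard deviation
        out[t] = (frames[t] - mean) / (std + eps)
    return out
```

Because the statistics for frame `t` never include frame `t` or any later frame, the adjustment can run inside a single time-synchronous pass.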
Publication number: US9280979(B2) Publication date: 2016.03.08
Application number: US201514640912 Filing date: 2015.03.06
Applicant: Nuance Communications, Inc. Inventor: Willett Daniel
Classification: G10L19/02;G10L15/02;G10L15/08;G10L15/20;G10L19/00;G10L15/34 Main classification: G10L19/02
Agency: Banner & Witcoff, Ltd. Agent: Banner & Witcoff, Ltd.
Principal claim: 1. A computer-implemented method for speech recognition comprising: receiving, from an audio input device, an electric signal corresponding to speech received at the audio input device; producing a sequence of speech vectors by processing at least a portion of the electric signal that corresponds to a speech utterance of the speech; performing a single time-synchronous speech recognition pass to determine a recognition output for the speech utterance; wherein the speech recognition pass comprises, for each speech vector in the sequence of speech vectors, estimating a feature transform, adjusting the speech vector based on the feature transform to obtain an adjusted speech vector, and using the adjusted speech vector in a current frame of a decoding search; wherein, before a first threshold number of speech vectors, the feature transform is a first feature transform type that is estimated based on a conventional normalization feature transform; wherein, after a second threshold number of speech vectors, the feature transform is a second feature transform type that is estimated based on one or more preceding speech vectors of the sequence of speech vectors and partial decoding results of a decoding search; and wherein, between the first threshold number of speech vectors and the second threshold number of speech vectors, the feature transform is interpolated between the first feature transform type and the second feature transform type.
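The three-regime schedule in the claim (first transform type, interpolated region, second transform type) can be sketched as follows. The claim does not state the form of the interpolation or of the transform, so this sketch assumes a linear ramp and a per-dimension scale-and-offset transform. The names `interpolation_weight`, `interpolate_transform`, and `adjust` are hypothetical, and how each transform is estimated is outside the scope of the sketch.

```python
import numpy as np

def interpolation_weight(t, first_threshold, second_threshold):
    """Weight given to the second transform type at frame index t.

    Assumption: a linear ramp between the two thresholds, with
    second_threshold > first_threshold.
    """
    if t < first_threshold:
        return 0.0                        # first transform type only
    if t >= second_threshold:
        return 1.0                        # second transform type only
    return (t - first_threshold) / (second_threshold - first_threshold)

def interpolate_transform(first, second, w):
    """Blend two (scale, offset) transforms with weight w on the second."""
    scale = (1.0 - w) * first[0] + w * second[0]
    offset = (1.0 - w) * first[1] + w * second[1]
    return scale, offset

def adjust(x, transform):
    """Apply a per-dimension scale-and-offset transform to one speech vector."""
    scale, offset = transform
    return scale * np.asarray(x, dtype=float) + offset
```

With thresholds of 10 and 20 frames, frame 15 would receive an equal blend of the two transforms under this assumed linear schedule.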
Address: Burlington MA US