Main claim |
1. A computer-implemented method for speech recognition comprising:
receiving, from an audio input device, an electric signal corresponding to speech received at the audio input device;
producing a sequence of speech vectors by processing at least a portion of the electric signal that corresponds to a speech utterance of the speech;
performing a single time-synchronous speech recognition pass to determine a recognition output for the speech utterance;
wherein the speech recognition pass comprises, for each speech vector in the sequence of speech vectors,
estimating a feature transform,
adjusting the speech vector based on the feature transform to obtain an adjusted speech vector, and
using the adjusted speech vector in a current frame of a decoding search;
wherein, before a first threshold number of speech vectors, the feature transform is a first feature transform type that is estimated based on a conventional normalization feature transform;
wherein, after a second threshold number of speech vectors, the feature transform is a second feature transform type that is estimated based on one or more preceding speech vectors of the sequence of speech vectors and partial decoding results of a decoding search; and
wherein, between the first threshold number of speech vectors and the second threshold number of speech vectors, the feature transform is interpolated between the first feature transform type and the second feature transform type. |
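
The per-frame choice of transform recited in the claim can be illustrated with a minimal Python sketch. It is not the claimed implementation, and several details are assumptions the claim does not specify: the transform is represented as a per-dimension scale and offset, the interpolation is linear in the frame index, and frames exactly at a threshold are treated as the start or end of the ramp. All names (`interpolation_weight`, `blend`, `apply_transform`, `adjust_frame`, `n1`, `n2`) are hypothetical. How each transform type is actually estimated (from a conventional normalization, or from preceding speech vectors and partial decoding results) is left outside the sketch.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
# Assumed representation: a diagonal affine transform, (scale, offset) per dimension.
Transform = Tuple[Vector, Vector]


def interpolation_weight(t: int, n1: int, n2: int) -> float:
    """Weight given to the second transform type at frame index t.

    0.0 before the first threshold n1, 1.0 from the second threshold n2 on,
    and a linear ramp in between (linearity is an assumption of this sketch).
    """
    if n2 <= n1:
        raise ValueError("n2 must be greater than n1")
    if t < n1:
        return 0.0
    if t >= n2:
        return 1.0
    return (t - n1) / (n2 - n1)


def blend(first: Transform, second: Transform, w: float) -> Transform:
    """Interpolate element-wise between two transforms with weight w in [0, 1]."""
    scale = [(1.0 - w) * a + w * b for a, b in zip(first[0], second[0])]
    offset = [(1.0 - w) * a + w * b for a, b in zip(first[1], second[1])]
    return scale, offset


def apply_transform(transform: Transform, x: Sequence[float]) -> Vector:
    """Adjust one speech vector: y[i] = scale[i] * x[i] + offset[i]."""
    scale, offset = transform
    return [s * xi + o for s, xi, o in zip(scale, x, offset)]


def adjust_frame(
    t: int,
    x: Sequence[float],
    first: Transform,
    second: Transform,
    n1: int,
    n2: int,
) -> Vector:
    """Pick the transform for frame t by threshold and adjust the speech vector."""
    w = interpolation_weight(t, n1, n2)
    return apply_transform(blend(first, second, w), x)
```

With hypothetical thresholds `n1 = 10` and `n2 = 20`, frame 5 uses only the first transform, frame 25 uses only the second, and frame 15 uses an equal mix of the two. In a real decoder the `second` transform would be re-estimated as the partial decoding results evolve, and the adjusted vector would be passed to the current frame of the decoding search.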