Title of invention: Online Maximum-Likelihood Mean and Variance Normalization for Speech Recognition
Abstract: A feature transform for speech recognition is described. An input speech utterance is processed to produce a sequence of representative speech vectors. A time-synchronous speech recognition pass is performed using a decoding search to determine a recognition output corresponding to the speech input. The decoding search includes, for each speech vector after a first threshold number of speech vectors, estimating a feature transform based on the preceding speech vectors in the utterance and partial decoding results of the decoding search. The current speech vector is then adjusted based on the current feature transform, and the adjusted speech vector is used in a current frame of the decoding search.
Publication number: US2015221320(A1)  Publication date: 2015.08.06
Application number: US201514640912  Filing date: 2015.03.06
Applicant: Nuance Communications, Inc.  Inventor: Willett Daniel
Classification: G10L19/02; G10L19/00  Main classification: G10L19/02
Agency:  Agent:
Main claim: 1. A method comprising: in one or more computer processes functioning in at least one computer processor:
processing an input speech utterance to produce a sequence of representative speech vectors; and
performing a single time-synchronous speech recognition pass using a decoding search to determine a recognition output corresponding to the speech input, the decoding search including:
i. for each speech vector before some first threshold number of speech vectors, estimating a feature transform based on a conventional feature normalization transform,
ii. for each speech vector after the first threshold number of speech vectors, estimating the feature transform based on the preceding speech vectors in the utterance and partial decoding results of the decoding search,
iii. adjusting a current speech vector based on the feature transform, and
iv. using the adjusted current speech vector in a current frame of the decoding search;
wherein for each speech vector after the first threshold number of speech vectors and before a second threshold number of speech vectors, the feature transform is interpolated between the transform based on the conventional feature normalization and the transform based on the preceding speech vectors in the utterance and the partial decoding results of the decoding search.
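The thresholding and interpolation recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not say how the interpolation is performed, so a linear blend of per-dimension mean and scale is assumed here, and the function names (`interpolation_weight`, `blend_transform`, `apply_transform`) are hypothetical. Both transforms are taken as given inputs; estimating the decoder-informed transform from partial decoding results is outside the scope of the sketch.

```python
from typing import List, Sequence, Tuple

# A transform is modeled as (mean, scale) per feature dimension (an assumption).
Transform = Tuple[Sequence[float], Sequence[float]]


def interpolation_weight(frame_index: int, first_threshold: int,
                         second_threshold: int) -> float:
    """Weight placed on the decoder-informed transform.

    0.0 up to the first threshold (conventional normalization only),
    1.0 from the second threshold on (decoder-informed only),
    and a linear ramp in between (the linear ramp is an assumption).
    """
    if frame_index <= first_threshold:
        return 0.0
    if frame_index >= second_threshold:
        return 1.0
    return (frame_index - first_threshold) / (second_threshold - first_threshold)


def blend_transform(conventional: Transform, decoder_based: Transform,
                    weight: float) -> Tuple[List[float], List[float]]:
    """Linearly interpolate the two transforms, dimension by dimension."""
    conv_mean, conv_scale = conventional
    dec_mean, dec_scale = decoder_based
    mean = [(1.0 - weight) * c + weight * d for c, d in zip(conv_mean, dec_mean)]
    scale = [(1.0 - weight) * c + weight * d for c, d in zip(conv_scale, dec_scale)]
    return mean, scale


def apply_transform(vector: Sequence[float], transform: Transform) -> List[float]:
    """Adjust the current speech vector: subtract the mean, divide by the scale."""
    mean, scale = transform
    return [(x - m) / s for x, m, s in zip(vector, mean, scale)]
```

In a decoding loop, each frame would compute the weight from its index, blend the two transforms, and pass the adjusted vector to the current frame of the search.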
Address: Burlington MA US