Title: Adaptive online feature normalization for speech recognition
Abstract: A speech recognition system adaptively estimates a warping factor used to reduce speaker variability. The warping factor is estimated using a small window (e.g., 100 ms) of speech. The warping factor is adaptively adjusted as more speech is obtained until the warping factor converges or a pre-defined maximum number of adaptations is reached. The speaker may be placed into a group selected from two or more groups based on characteristics that are associated with the speaker's window of speech. Different step sizes may be used within the different groups when estimating the warping factor. Vocal tract length normalization (VTLN) is applied to the speech input using the estimated warping factor. A linear transformation, including a bias term, may also be computed to assist in normalizing the speech along with the application of the VTLN.
Publication number: US9263030(B2)  Publication date: 2016.02.16
Application number: US201313748411  Filing date: 2013.01.23
Applicant: Microsoft Technology Licensing, LLC  Inventors: Wang Shizhen; Gong Yifan; Alleva Fileno
Classification: G10L15/00; G10L15/02; G10L15/07; G10L25/75  Main classification: G10L15/00
Agency:  Agents: Holmes Danielle Johnston; Spellman Steven; Minhas Micky
Principal claim: 1. A computer-implemented method, performed by a processor, for reducing speaker variability in speech recognition, the computer-implemented method comprising: receiving a first portion of an utterance received from a speaker; based on the first portion of the utterance, placing the speaker into a group, wherein the group is one of a plurality of groups each defined by a range of warping factor values; based on the grouping of the speaker, estimating a warping factor; receiving additional portions of speech from the speaker; adaptively adjusting the estimated warping factor based on the additional portions of speech; and based on the adjusted estimated warping factor, adjusting the speech recognition for speaker variability associated with the speaker.
Address: Redmond, WA, US
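Illustrative sketch (not part of the patent record): the steps recited in the principal claim can be pictured as a small search loop. The Python below is a minimal sketch under stated assumptions, not the patented implementation. The names Group, place_in_group, adapt_step and estimate_warp are hypothetical, and so are the group ranges and step sizes. The score callable stands in for whatever acoustic likelihood a real recognizer would compute for a window warped by a candidate factor. The hill-climbing update is one simple choice of adaptation rule, swapped in for illustration because the record does not specify one.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Sequence, Tuple

Window = Sequence[float]                  # one short window of speech features
Score = Callable[[Window, float], float]  # hypothetical: higher means better fit


@dataclass(frozen=True)
class Group:
    """A speaker group defined by a range of warping factor values."""
    low: float
    high: float
    step: float  # group-specific step size (assumed value, for illustration)

    @property
    def center(self) -> float:
        return (self.low + self.high) / 2.0


def place_in_group(window: Window, groups: Sequence[Group], score: Score) -> Group:
    """Pick the group whose central warping factor scores best on the window."""
    return max(groups, key=lambda g: score(window, g.center))


def adapt_step(window: Window, alpha: float, group: Group, score: Score) -> float:
    """Try one step down and one step up inside the group's range; keep the best.

    The current value is listed first, so ties leave alpha unchanged.
    """
    candidates = [alpha]
    for cand in (alpha - group.step, alpha + group.step):
        if group.low <= cand <= group.high:
            candidates.append(cand)
    return max(candidates, key=lambda a: score(window, a))


def estimate_warp(windows: Iterable[Window], groups: Sequence[Group],
                  score: Score, max_adaptations: int = 20) -> Tuple[Group, float]:
    """Group the speaker on the first window, then refine on later windows.

    Stops when the factor no longer changes (convergence) or after
    max_adaptations refinements.
    """
    it = iter(windows)
    first = next(it)
    group = place_in_group(first, groups, score)
    alpha = adapt_step(first, group.center, group, score)  # initial estimate
    for count, window in enumerate(it, start=1):
        if count > max_adaptations:
            break
        new_alpha = adapt_step(window, alpha, group, score)
        if new_alpha == alpha:  # converged
            break
        alpha = new_alpha
    return group, alpha


if __name__ == "__main__":
    # Toy score that ignores the audio and simply peaks at 0.94.
    toy_score = lambda window, a: -(a - 0.94) ** 2
    demo_groups = [Group(0.80, 1.00, 0.02), Group(1.00, 1.20, 0.02)]
    group, alpha = estimate_warp([[0.0]] * 6, demo_groups, toy_score)
    print(group, round(alpha, 2))
```

In this sketch the group choice restricts the search to one range of warping factors, which is why the per-group step size matters: a narrower range can afford a finer step. The final factor would then be passed to the VTLN frequency warping, which is not shown here.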