Title of invention: Downsampling schemes in a hierarchical neural network structure for phoneme recognition
Abstract: An approach for phoneme recognition is described. A sequence of intermediate output posterior vectors is generated from an input sequence of cepstral features using a first layer perceptron. The intermediate output posterior vectors are then downsampled to form a reduced input set of intermediate posterior vectors for a second layer perceptron. A sequence of final posterior vectors is generated from the reduced input set of intermediate posterior vectors using the second layer perceptron. Then the final posterior vectors are decoded to determine an output recognized phoneme sequence representative of the input sequence of cepstral features.
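The data flow in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the abstract does not say how the downsampling is done, so a uniform scheme (keep every `rate`-th vector) is assumed here, and `first_layer`, `second_layer`, `decode`, `recognize`, and `downsample_uniform` are hypothetical names standing in for the two perceptrons, the decoder, and the glue code.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def downsample_uniform(posteriors: Sequence[Vector], rate: int) -> List[Vector]:
    """Keep every `rate`-th posterior vector (assumed uniform scheme)."""
    if rate < 1:
        raise ValueError("rate must be a positive integer")
    return [list(v) for v in posteriors[::rate]]


def recognize(
    cepstral_frames: Sequence[Vector],
    first_layer: Callable[[Sequence[Vector]], List[Vector]],
    second_layer: Callable[[Sequence[Vector]], List[Vector]],
    decode: Callable[[Sequence[Vector]], List[str]],
    rate: int = 3,
) -> List[str]:
    """Chain the four stages named in the abstract.

    All three callables are hypothetical stand-ins supplied by the caller.
    """
    intermediate = first_layer(cepstral_frames)       # intermediate posterior vectors
    reduced = downsample_uniform(intermediate, rate)  # reduced input set
    final = second_layer(reduced)                     # final posterior vectors
    return decode(final)                              # recognized phoneme sequence
```

Because the second layer sees only the reduced set, its input (and hence its cost) shrinks roughly by the factor `rate` under this assumed scheme.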
Publication number: US9595257(B2) Publication date: 2017.03.14
Application number: US200913497119 Filing date: 2009.09.28
Applicant: Nuance Communications, Inc. Inventors: Cano Daniel Andrés Vásquez; Aradilla Guillermo; Gruhn Rainer
Classification: G10L15/00; G10L15/04; G10L15/14; G10L15/16; G10L21/00; G10L25/00; G10L15/02 Main classification: G10L15/00
Agency: Sunstein Kann Murphy & Timbers LLP Agent: Sunstein Kann Murphy & Timbers LLP
Independent claim: 1. A computer based method implemented using at least one hardware implemented processor for phoneme recognition comprising: using the at least one hardware implemented processor to perform the steps of: generating a sequence of intermediate phoneme posterior vectors from an input sequence of cepstral features using a first layer perceptron that provides feature-level context-modeling; using an intermediate phoneme time boundary decoder to determine a sequence of sampling segments as a function of the intermediate phoneme posterior vectors, each sampling segment defined by a phoneme start boundary and a phoneme stop boundary; downsampling the intermediate phoneme posterior vectors by sampling the sequence of intermediate phoneme posterior vectors a fixed number of times in each sampling segment to form a reduced input set of intermediate phoneme posterior vectors for a second layer perceptron; generating a sequence of final phoneme posterior vectors from the reduced input set of intermediate phoneme posterior vectors using the second layer perceptron based on posterior-level context modeling of inter-phonetic information; and decoding the final phoneme posterior vectors using a final phoneme recognition decoder to determine an output recognized phoneme sequence representative of the input sequence of cepstral features.
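The segment-based downsampling step of claim 1 might look like the sketch below. The claim only requires a fixed number of samples per segment; where those samples fall inside the segment is not stated, so evenly spaced positions are an assumption here. The names `sample_indices` and `downsample_by_segments`, the exclusive stop boundary, and the default of three samples are likewise hypothetical. The segments are taken as given, as if already produced by the intermediate phoneme time boundary decoder.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Segment = Tuple[int, int]  # (start frame, stop frame); stop is exclusive (assumption)


def sample_indices(start: int, stop: int, n_samples: int) -> List[int]:
    """Pick n_samples frame indices spread evenly across [start, stop)."""
    if stop <= start:
        raise ValueError("segment must contain at least one frame")
    if n_samples < 1:
        raise ValueError("n_samples must be a positive integer")
    length = stop - start
    # Midpoint of each of n_samples equal sub-intervals, floored to a frame index.
    # Segments shorter than n_samples frames repeat some indices.
    return [start + ((2 * k + 1) * length) // (2 * n_samples) for k in range(n_samples)]


def downsample_by_segments(
    posteriors: Sequence[Vector],
    segments: Sequence[Segment],
    n_samples: int = 3,
) -> List[Vector]:
    """Sample each segment a fixed number of times to build the reduced input set."""
    reduced: List[Vector] = []
    for start, stop in segments:
        for i in sample_indices(start, stop, n_samples):
            reduced.append(list(posteriors[i]))
    return reduced
```

With this scheme the reduced set always holds `n_samples` vectors per segment, so its length depends on the number of detected phoneme segments rather than on the number of input frames.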
Address: Burlington MA US