Title of invention: STATISTICAL ACOUSTIC MODEL ADAPTATION METHOD, ACOUSTIC MODEL LEARNING METHOD SUITABLE FOR STATISTICAL ACOUSTIC MODEL ADAPTATION, STORAGE MEDIUM STORING PARAMETERS FOR BUILDING DEEP NEURAL NETWORK, AND COMPUTER PROGRAM FOR ADAPTING STATISTICAL ACOUSTIC MODEL
Abstract: [Object] An object is to provide a statistical acoustic model adaptation method capable of efficient adaptation of an acoustic model using DNN with training data under a specific condition and achieving higher accuracy. [Solution] A method of speaker adaptation of an acoustic model using DNN includes the steps of: storing speech data 90 to 98 of different speakers separately in a first storage device; preparing speaker-by-speaker hidden layer modules 112 to 120; performing preliminary learning of all layers 42, 44, 110, 48, 50, 52 and 54 of a DNN 80 by switching and selecting the speech data 90 to 98 while dynamically replacing a specific layer 110 with hidden layer modules 112 to 120 corresponding to the selected speech data; replacing the specific layer 110 of the DNN that has completed the preliminary learning with an initial hidden layer; and training the DNN with speech data of a specific speaker while fixing parameters of layers other than the initial hidden layer.
Publication number: US2016260428(A1) Publication date: 2016.09.08
Application number: US201415031449 Filing date: 2014.11.06
Applicant: National Institute of Information and Communications Technology Inventors: MATSUDA Shigeki;LU Xugang
Classification: G10L15/16;G06N3/08;G10L15/07;G06N3/04 Main classification: G10L15/16
Agency: Agent:
Main claim: 1. A statistical acoustic model adaptation method for speech recognition under a specific condition, wherein said acoustic model is an acoustic model using Deep Neural Network or DNN, said DNN including a plurality of layers of three or more; said method comprising the steps of: a computer readable first storage device separately storing speech data under a plurality of conditions; a computer preparing a plurality of hidden layer modules for respective conditions corresponding to said plurality of conditions; the computer, while switching and selecting speech data under said plurality of conditions, performing preliminary learning of all layers of said DNN while dynamically replacing a specific one of said plurality of layers with a hidden layer module corresponding to the selected speech data; the computer replacing said specific layer of said DNN that has completed the learning at said step of performing preliminary learning with an initial hidden layer prepared in advance; a second computer readable storage device storing speech data under a condition as an object of adaptation; and reading the speech data under the condition of said object of adaptation from said second storage device while fixing parameters of layers other than said initial hidden layer of said DNN produced at said step of replacing, and training said DNN.
Address: Tokyo JP
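
The two training phases recited in claim 1 (preliminary learning with a swappable condition-specific hidden layer, then adaptation with every other layer fixed) can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the layer sizes, the sigmoid/softmax activations, plain gradient descent on cross-entropy, the random toy data, the random initialization of the "initial hidden layer", and all function and variable names (`init_layer`, `sgd_step`, `preliminary_learning`, `adapt`, `slot`).

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    # One fully connected layer: small random weights, zero biases.
    return {"W": rng.normal(0.0, 0.1, (n_in, n_out)), "b": np.zeros(n_out)}

def forward(layers, x):
    # Sigmoid hidden layers, softmax output; returns all activations.
    acts = [x]
    for i, layer in enumerate(layers):
        z = acts[-1] @ layer["W"] + layer["b"]
        if i == len(layers) - 1:
            e = np.exp(z - z.max(axis=1, keepdims=True))
            acts.append(e / e.sum(axis=1, keepdims=True))
        else:
            acts.append(1.0 / (1.0 + np.exp(-z)))
    return acts

def sgd_step(layers, x, y, trainable, lr=0.5):
    # One cross-entropy backprop step. Only layers whose index is in
    # `trainable` are updated; all other parameters stay fixed.
    acts = forward(layers, x)
    delta = (acts[-1] - np.eye(acts[-1].shape[1])[y]) / len(x)
    for i in reversed(range(len(layers))):
        grad_W, grad_b = acts[i].T @ delta, delta.sum(axis=0)
        if i > 0:
            # Propagate the error through the not-yet-updated weights.
            delta = (delta @ layers[i]["W"].T) * acts[i] * (1.0 - acts[i])
        if i in trainable:
            layers[i]["W"] -= lr * grad_W
            layers[i]["b"] -= lr * grad_b

def preliminary_learning(dnn, modules, data, slot, epochs=20):
    # Cycle over conditions; before each step, swap the matching hidden
    # layer module into position `slot`, then train all layers.
    for _ in range(epochs):
        for cond, (x, y) in data.items():
            dnn[slot] = modules[cond]
            sgd_step(dnn, x, y, trainable=set(range(len(dnn))))

def adapt(dnn, initial_hidden, x, y, slot, epochs=20):
    # Replace the specific layer with the initial hidden layer and train
    # only that layer on the adaptation data.
    dnn[slot] = initial_hidden
    for _ in range(epochs):
        sgd_step(dnn, x, y, trainable={slot})

# Toy setup (hypothetical sizes and random data, for illustration only).
n_in, n_hid, n_cls, slot = 4, 8, 3, 1
speakers = ["spk_a", "spk_b", "spk_c"]
data = {s: (rng.normal(size=(32, n_in)), rng.integers(0, n_cls, 32))
        for s in speakers}
modules = {s: init_layer(n_hid, n_hid) for s in speakers}
dnn = [init_layer(n_in, n_hid), None, init_layer(n_hid, n_cls)]
preliminary_learning(dnn, modules, data, slot)

# Snapshot the shared layers, then adapt to a new speaker.
frozen = [None if i == slot else dnn[i]["W"].copy() for i in range(len(dnn))]
initial_hidden = init_layer(n_hid, n_hid)
start = initial_hidden["W"].copy()
new_x, new_y = rng.normal(size=(16, n_in)), rng.integers(0, n_cls, 16)
adapt(dnn, initial_hidden, new_x, new_y, slot)
```

In this sketch the shared layers receive updates from every speaker during preliminary learning, while each module in `modules` is updated only by its own speaker's data. During adaptation only the layer at `slot` changes, so a small amount of data from the new speaker has to fit far fewer parameters than the full network.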