Title of Invention: Systems and methods for combining stochastic average gradient and Hessian-free optimization for sequence training of deep neural networks
Abstract: A method for training a deep neural network (DNN) comprises receiving and formatting speech data for the training, performing Hessian-free sequence training (HFST) on a first subset of a plurality of subsets of the speech data, and iteratively performing the HFST on successive subsets of the plurality of subsets of the speech data, wherein iteratively performing the HFST comprises reusing information from at least one previous iteration.
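The control flow described in the abstract can be sketched briefly. The following Python outline is a hypothetical illustration only, not the patented implementation; the helper hf_step, the carried state, and the even split of the data are all assumptions:

def hfst_over_subsets(speech_data, num_subsets, model, hf_step):
    # Divide the formatted speech data into subsets (hypothetical even split).
    size = len(speech_data) // num_subsets
    subsets = [speech_data[i * size:(i + 1) * size] for i in range(num_subsets)]
    state = None  # information carried across iterations (e.g. past gradients)
    for subset in subsets:
        # hf_step performs Hessian-free sequence training on one subset,
        # reusing the state produced by the previous iteration.
        model, state = hf_step(model, subset, state)
    return model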
Publication Number: US9626621(B2)    Publication Date: 2017.04.18
Application Number: US201514793095    Filing Date: 2015.07.07
Applicant: International Business Machines Corporation    Inventors: Dognin Pierre; Goel Vaibhava
Classification Codes (IPC): G10L15/16; G06N3/08; G10L15/06; G06N3/04    Primary Classification: G10L15/16
Agency: Ryan, Mason & Lewis, LLP    Attorneys: Stock William; Ryan, Mason & Lewis, LLP
Principal Claim: 1. A system for training a deep neural network, comprising:
    a memory and at least one processor coupled to the memory;
    an input component, executed via the at least one processor, which receives and formats speech data for the training, and divides the speech data into a plurality of subsets;
    a training component, executed via the at least one processor, which performs Hessian-free sequence training on a first subset of the plurality of subsets of the speech data received from the input component, and iteratively performs the Hessian-free sequence training on successive subsets of the plurality of subsets of the speech data;
    wherein, when iteratively performing the Hessian-free sequence training, the training component:
        processes the first subset of the speech data to generate a first gradient of loss in a first iteration; and
        processes a successive subset of the speech data to generate a second gradient of loss in a second iteration;
    a weighting component operatively coupled to the training component and executed via the at least one processor, which dynamically computes weights for the first gradient of loss and for the second gradient of loss;
    wherein the training component reuses gradient information from at least one previous iteration, and wherein reusing the gradient information from the at least one previous iteration comprises integrating a weighted first gradient of loss and a weighted second gradient of loss to generate a solution to the second iteration; and
    an output component, executed via the at least one processor, which transmits a result of the iterative performance of the Hessian-free sequence training to the deep neural network.
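The claim's core mechanism, dynamically weighting the previous and current gradients of loss and feeding the combined gradient into a Hessian-free (truncated-Newton) update, can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the weighting formula, the damping value, the warm-started conjugate-gradient solver, and the names hfst_step, grad_fn, and hvp_fn are hypothetical stand-ins, not the patent's specification:

import numpy as np

def conjugate_gradient(hvp, b, x0, iters=50, tol=1e-10):
    # Standard CG for hvp(d) = b, where hvp(v) returns a damped
    # Hessian-vector product. Warm-starting from x0 (the previous
    # iteration's solution) is one common way HF methods reuse information.
    x = x0.copy()
    r = b - hvp(x)
    p = r.copy()
    rs = r @ r
    if rs < tol:
        return x
    for _ in range(iters):
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        x = x + alpha * p
        r = r - alpha * Hp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def hfst_step(params, subset, prev_grad, prev_dir, t, grad_fn, hvp_fn, damping=0.1):
    # One subset-wise HF sequence-training step with SAG-style gradient reuse.
    g_new = grad_fn(params, subset)
    # Dynamically weight the previous and current gradients of loss
    # (hypothetical scheme; the claim computes weights dynamically
    # without fixing a formula).
    w_prev = t / (t + 1.0)
    w_new = 1.0 - w_prev
    g = w_prev * prev_grad + w_new * g_new
    # Hessian-free update: CG solve of (H + damping*I) d = -g,
    # warm-started from the previous search direction.
    d = conjugate_gradient(lambda v: hvp_fn(params, subset, v) + damping * v,
                           -g, prev_dir)
    return params + d, g, d

# Toy usage on a quadratic stand-in loss f(x) = 0.5 x'Ax - b'x; a real
# system would supply DNN sequence-loss gradient and Hessian-vector callables.
A = np.array([[3.0, 0.5], [0.5, 2.0]])
b = np.array([1.0, -1.0])
grad_fn = lambda x, _subset: A @ x - b
hvp_fn = lambda x, _subset, v: A @ v
params, g, d = np.zeros(2), np.zeros(2), np.zeros(2)
for t in range(5):
    params, g, d = hfst_step(params, None, g, d, t, grad_fn, hvp_fn)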
Address: Armonk, NY, US