摘要 |
Methods of training neural networks ( 100, 600 ) that include one or more inputs (102-108) and a sequence of processing nodes ( 110, 112, 114, 116 ) in which each processing node may be coupled to one or more processing nodes that are closer to an output node are provided. The methods include establishing an objective function that preferably includes a term related to differences between actual and expected output for training data, and a term related to the number of weights of significant magnitude. Training involves optimizing the objective function in terms of weights that characterize directed edges of the neural network. The objective function is optimized using algorithms that employ derivatives of the objective function. Algorithms for accurately and efficiently estimating derivatives of the summed input going into output processing nodes of the neural network with respect to the weights of the neural network are provided.
|