Title: Training a model using parameter server shards
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a model using parameter server shards. One of the methods includes: receiving, at a parameter server shard configured to maintain values of a disjoint partition of the parameters of the model, a succession of requests for parameter values from each of a plurality of replicas of the model; in response to each request, downloading the current value of each requested parameter to the replica from which the request was received; receiving a succession of uploads, each upload including respective delta values for each of the parameters in the partition maintained by the shard; and repeatedly updating the values of the parameters in the partition maintained by the parameter server shard based on the uploads of delta values to generate current parameter values.
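The shard-side protocol the abstract describes (serve parameter downloads, then apply uploaded deltas to the maintained partition) can be sketched as follows. This is a minimal illustration, not code from the patent; the class and method names (`ParameterShard`, `download`, `upload`) are assumptions.

```python
class ParameterShard:
    """Maintains current values for one disjoint partition of the model's parameters.

    Hypothetical sketch of the shard behavior described in the abstract;
    all names here are illustrative, not taken from the patent.
    """

    def __init__(self, partition):
        # partition: parameter name -> initial value for this shard's partition
        self.values = dict(partition)

    def download(self, names):
        """Answer a replica's request with the current values of the requested parameters."""
        return {name: self.values[name] for name in names}

    def upload(self, deltas):
        """Apply a replica's uploaded delta values to the maintained parameters."""
        for name, delta in deltas.items():
            self.values[name] += delta


# Usage: a shard holding two parameters serves a download, then applies deltas.
shard = ParameterShard({"w1": 0.5, "w2": -1.0})
current = shard.download(["w1", "w2"])    # replica fetches current values
shard.upload({"w1": -0.125, "w2": 0.25})  # replica reports training-derived deltas
print(shard.values)                       # {'w1': 0.375, 'w2': -0.75}
```

Because each shard owns a disjoint partition, no two shards ever apply deltas to the same parameter, which is what lets the shards operate without coordinating with one another.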
Publication No.: US9218573(B1)  Publication date: 2015.12.22
Application No.: US201313826327  Filing date: 2013.03.14
Applicant: Google Inc.  Inventors: Corrado Gregory S.; Chen Kai; Dean Jeffrey A.; Bengio Samy; Monga Rajat; Devin Matthieu
Classification: G06F15/18; G06N99/00; G06N7/00; G06N5/02  Primary classification: G06F15/18
Agency: Fish & Richardson P.C.  Agent: Fish & Richardson P.C.
Principal claim: 1. A system for training a model having parameters by determining a respective parameter value for each of the parameters of the model, the system comprising: a plurality of identical model replicas, wherein each of the plurality of replicas is an identical instance of the model with possibly different parameter values for the parameters of the model, wherein each model replica executes on a respective computing unit, wherein each model replica is configured to operate independently of each other model replica, and wherein each model replica is further configured to perform repeatedly the following operations: receiving, from at least one of a plurality of parameter server shards, current values of one or more of the parameters of the model, wherein each parameter server shard is configured to maintain values of a respective disjoint partition of the parameters of the model; computing respective delta values for each of a plurality of the parameters of the model by performing one or more iterations of a training process; and providing, for each of the plurality of parameters, the delta value for the parameter to the parameter server shard that is configured to maintain the respective partition that includes the parameter.
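The replica-side operations of the claim (receive current values from the shards, compute deltas over one or more training iterations, route each delta to the shard whose partition includes that parameter) can be sketched as below. The hash-based partitioning and the toy quadratic-loss gradient are assumptions for illustration only; the claim does not prescribe either.

```python
# Hypothetical sketch of the replica loop in claim 1. Shards are modeled as
# plain dicts; owning_shard() is an assumed partitioning scheme (the claim
# requires only that the partitions be disjoint).

def owning_shard(name, num_shards):
    # Disjoint partition: each parameter name maps to exactly one shard.
    # (Python string hashing is randomized per process, but is consistent
    # within a single run, which is all this sketch needs.)
    return hash(name) % num_shards

def replica_step(shards, grad_fn, learning_rate=0.1, iterations=2):
    """One round of the claimed replica operations."""
    # 1) Receive current parameter values from the parameter server shards.
    params = {}
    for shard in shards:
        params.update(shard)

    # 2) Compute delta values by running training-process iterations locally.
    deltas = {name: 0.0 for name in params}
    for _ in range(iterations):
        grads = grad_fn(params)
        for name, g in grads.items():
            step = -learning_rate * g
            params[name] += step   # local update between iterations
            deltas[name] += step   # accumulated delta to report

    # 3) Provide each delta to the shard maintaining that parameter's partition.
    for name, delta in deltas.items():
        shards[owning_shard(name, len(shards))][name] += delta

# Usage: two shards; toy loss 0.5 * w**2 per parameter, whose gradient is w.
num_shards = 2
shards = [dict() for _ in range(num_shards)]
for name, value in [("w1", 1.0), ("w2", -2.0)]:
    shards[owning_shard(name, num_shards)][name] = value

replica_step(shards, grad_fn=lambda p: dict(p))
```

Because each replica runs this loop independently and asynchronously, the values a replica downloads may already be stale when its deltas arrive; tolerating that staleness is characteristic of this style of distributed training.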
Address: Mountain View, CA, US