ADAPTIVE LEARNING RATE SCHEDULE IN DISTRIBUTED STOCHASTIC GRADIENT DESCENT
Abstract:
A method for performing machine learning includes assigning processing jobs to a plurality of model learners using a central parameter server. The processing jobs include solving gradients based on a current set of parameters. As the results of the processing jobs are returned, the set of parameters is iterated. A degree of staleness of each solved gradient is determined based on a difference between the set of parameters when the job was assigned and the set of parameters when the job was returned. The learning rates used to iterate the parameters based on the solved gradients are proportional to the determined degrees of staleness.
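The flow described in the abstract can be illustrated with a minimal single-process sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the names `ParameterServer`, `assign_job`, `apply_result`, `gradient`, and `run_demo`, the use of an update counter to measure staleness, and the toy objective. The abstract says only that the learning rate is tied proportionally to staleness; this sketch assumes the rate shrinks as staleness grows (base rate divided by staleness), which is one common reading of staleness-aware asynchronous SGD.

```python
from dataclasses import dataclass


@dataclass
class ParameterServer:
    """Hypothetical central server; names and fields are illustrative only."""
    params: list
    base_lr: float = 0.5
    version: int = 0  # number of updates applied so far

    def assign_job(self):
        # Hand a learner a snapshot of the parameters plus the version it saw.
        return list(self.params), self.version

    def apply_result(self, grad, assigned_version):
        # Assumed staleness measure: updates applied since the job was assigned.
        staleness = self.version - assigned_version
        # Assumed rule: divide the base rate by staleness (floor of 1).
        lr = self.base_lr / max(1, staleness)
        self.params = [p - lr * g for p, g in zip(self.params, grad)]
        self.version += 1
        return staleness, lr


def gradient(params):
    # Toy objective f(w) = 0.5 * sum(w_i ** 2), whose gradient is w itself.
    return list(params)


def run_demo():
    server = ParameterServer(params=[1.0, -2.0])
    # Three learners receive jobs against the same parameter version.
    jobs = [server.assign_job() for _ in range(3)]
    log = []
    # Results come back one at a time, so later ones are increasingly stale.
    for snapshot, assigned_version in jobs:
        log.append(server.apply_result(gradient(snapshot), assigned_version))
    return log


if __name__ == "__main__":
    for staleness, lr in run_demo():
        print(f"staleness={staleness} lr={lr}")
```

In this sketch the third result is applied with a smaller step than the first two because two other updates landed after its job was assigned. A real system would return results asynchronously over a network; the sequential loop stands in for that only to keep the example deterministic.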