EFFICIENT CALCULATIONS OF NEGATIVE CURVATURE IN A HESSIAN FREE DEEP LEARNING FRAMEWORK
Abstract:
A method for training a deep learning network includes defining a loss function corresponding to the network. Training samples are received and current parameter values are set to initial parameter values. Then, a computing platform is used to perform an optimization method which iteratively minimizes the loss function. Each iteration comprises the following steps. An eigCG solver is applied to determine a descent direction by minimizing a local approximated quadratic model of the loss function with respect to the current parameter values and the training dataset. An approximate leftmost eigenvector and eigenvalue are determined while solving the Newton system. The approximate leftmost eigenvector is used as a negative curvature direction to prevent the optimization method from converging to saddle points. Curvilinear and adaptive line searches are used to guide the optimization method to a local minimum. At the end of the iteration, the current parameter values are updated based on the descent direction.
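The following is a minimal illustrative sketch, not the patented implementation: it shows how a single outer iteration of a Hessian-free scheme can run conjugate gradient on the Newton system H d = -g and, as a byproduct, extract an approximate leftmost eigenpair of the Hessian from the tridiagonal matrix that CG implicitly builds (the eigCG idea). The function and argument names (hessian_free_step, grad, hvp) are assumptions introduced here, and the curvilinear and adaptive line searches described in the abstract are omitted.

```python
import numpy as np

def hessian_free_step(grad, hvp, theta, cg_iters=50, tol=1e-6):
    """Sketch of one outer iteration (assumed names, simplified logic):
    run CG on H d = -g, collect Lanczos-style scalars, and recover an
    approximate leftmost eigenpair of H from the CG tridiagonal."""
    g = grad(theta)
    n = g.size
    d = np.zeros(n)
    r = -g.copy()
    p = r.copy()
    alphas, betas, V = [], [], []

    for i in range(cg_iters):
        rnorm = np.linalg.norm(r)
        if rnorm < tol:
            break
        V.append(((-1.0) ** i) * r / rnorm)      # Lanczos vector from the CG residual
        Hp = hvp(theta, p)                       # Hessian-vector product (no explicit Hessian)
        pHp = p @ Hp
        if abs(pHp) < 1e-12:                     # near-zero curvature along p: stop
            V.pop()
            break
        alpha = (r @ r) / pHp
        d += alpha * p
        r_new = r - alpha * Hp
        beta = (r_new @ r_new) / (r @ r)
        alphas.append(alpha)
        betas.append(beta)
        r, p = r_new, r_new + beta * p

    k = len(alphas)
    if k == 0:
        return d, None, None

    # Tridiagonal matrix T implicitly generated by CG (standard CG/Lanczos relations).
    T = np.zeros((k, k))
    for i in range(k):
        T[i, i] = 1.0 / alphas[i] + (betas[i - 1] / alphas[i - 1] if i > 0 else 0.0)
        if i + 1 < k:
            T[i, i + 1] = T[i + 1, i] = np.sqrt(betas[i]) / alphas[i]

    evals, evecs = np.linalg.eigh(T)
    leftmost_val = evals[0]
    leftmost_vec = np.column_stack(V[:k]) @ evecs[:, 0]   # lift back to parameter space
    return d, leftmost_val, leftmost_vec
```

In this sketch, a negative leftmost_val signals that the quadratic model has negative curvature at the current point; an outer loop would then combine the descent direction d with the negative curvature direction leftmost_vec (for example through a curvilinear search, as the abstract describes) before updating the parameters.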