Accelerating SGD for Distributed Deep-Learning Using Approximated Hessian Matrix

Chunming Wang; Sébastien M. R. Arnold

arXiv: 1709.05069 · v1 · pith:34GCKTOCnew · submitted 2017-09-15 · cs.LG

classification: cs.LG

keywords: distributed, approximation, gradients, Hessian, matrix, method, novel, able

We introduce a novel method to compute a rank-$m$ approximation of the inverse of the Hessian matrix in the distributed regime. By leveraging the differences in gradients and parameters of multiple workers, we are able to efficiently implement a distributed approximation of the Newton-Raphson method. We also present preliminary results that underline the advantages and challenges of second-order methods for large stochastic optimization problems. In particular, our work suggests that novel strategies for combining gradients provide further information on the loss surface.
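
The abstract does not state the update rule, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes a standard secant-style construction: with $S$ holding the parameter differences and $Y$ the gradient differences between $m$ workers and a reference worker, the matrix $B = S (S^\top Y)^{-1} S^\top$ has rank at most $m$ and satisfies $B Y = S$, so it can stand in for the inverse Hessian in a Newton-Raphson step. The names `secant_newton_step`, `solve`, and `grad`, and the toy quadratic loss, are hypothetical and chosen for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    A = [list(row) + [r] for row, r in zip(M, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("S^T Y is singular; worker differences are degenerate")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n + 1):
                A[r][k] -= f * A[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (A[i][n] - tail) / A[i][i]
    return x


def secant_newton_step(params, grads, ref=0):
    """One Newton-like step from worker `ref` using B = S (S^T Y)^{-1} S^T.

    Assumption (not from the paper): B is built from differences between
    each other worker and the reference worker.
    """
    theta0, g0 = params[ref], grads[ref]
    others = [i for i in range(len(params)) if i != ref]
    S = [[a - b for a, b in zip(params[i], theta0)] for i in others]
    Y = [[a - b for a, b in zip(grads[i], g0)] for i in others]
    StY = [[dot(s, y) for y in Y] for s in S]   # m x m matrix S^T Y
    Stg = [dot(s, g0) for s in S]               # m-vector S^T g0
    c = solve(StY, Stg)
    # B g0 = S c, a combination of the m parameter differences.
    step = [sum(cj * s[k] for cj, s in zip(c, S)) for k in range(len(theta0))]
    return [t - d for t, d in zip(theta0, step)]


# Toy quadratic loss 0.5 * x^T A x - b^T x, whose gradient is A x - b.
A_mat = [[3.0, 1.0], [1.0, 2.0]]
b_vec = [1.0, 1.0]


def grad(x):
    return [dot(row, x) - bi for row, bi in zip(A_mat, b_vec)]


# Three simulated workers at different parameter values.
workers = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_theta = secant_newton_step(workers, [grad(w) for w in workers])
```

On this two-dimensional quadratic the two difference vectors span the whole space, so `new_theta` lands on the exact minimizer (approximately `[0.2, 0.4]`) up to floating-point error. With noisy stochastic gradients or $m$ much smaller than the parameter count, the step is only an approximation restricted to the span of the worker differences.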
