arXiv: 1906.04787 · v1 · pith:2B5PPEONnew · submitted 2019-06-11 · 💻 cs.LG · cond-mat.stat-mech · cs.PF · stat.ML

Power Gradient Descent

classification 💻 cs.LG · cond-mat.stat-mech · cs.PF · stat.ML
keywords: gradient, power, descent, gradients, methods, accelerated, achieve, adam

The development of machine learning is driving the search for fast and stable minimization algorithms. To this end, we suggest a change to current gradient descent methods that should speed up the motion in flat regions and slow it down in steep directions of the function to minimize. It is based on a "power gradient", in which each component of the gradient is replaced by its sign-preserving $H$-th power, with $0<H<1$. We test three modern gradient descent methods fed by this variant and by standard gradients, finding that the new version achieves significantly better performance for the Nesterov accelerated gradient and AMSGrad. We also propose an effective new take on the ADAM algorithm, which includes power gradients with varying $H$.
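The component-wise transform described in the abstract can be sketched in a few lines. This is a minimal illustration, assuming the power is applied to the magnitude of each component while its sign is kept, i.e. $g_i \mapsto \operatorname{sign}(g_i)\,|g_i|^H$. The names `power_gradient` and `power_gd_step`, the default `H=0.5`, and the learning rate `lr` are hypothetical choices made for this example; the plain gradient descent step is shown only for illustration and is not one of the three methods tested in the paper.

```python
import numpy as np

def power_gradient(grad, H=0.5):
    """Replace each gradient component g by sign(g) * |g|**H.

    Assumes 0 < H < 1, so components with |g| < 1 (flat regions) are
    enlarged and components with |g| > 1 (steep directions) are reduced.
    """
    grad = np.asarray(grad, dtype=float)
    return np.sign(grad) * np.abs(grad) ** H

def power_gd_step(x, grad, lr=0.1, H=0.5):
    """One plain gradient descent step using the power gradient.

    Illustrative only: the abstract feeds power gradients into other
    optimizers (Nesterov accelerated gradient, AMSGrad, ADAM).
    """
    x = np.asarray(x, dtype=float)
    return x - lr * power_gradient(grad, H)

if __name__ == "__main__":
    g = np.array([4.0, -0.01, 0.0])
    print(power_gradient(g, H=0.5))
    print(power_gd_step(np.zeros(3), g, lr=0.1, H=0.5))
```

With `H = 0.5`, a steep component of 4.0 shrinks in magnitude and a flat component of -0.01 grows in magnitude, which matches the stated aim of slowing motion in steep directions and speeding it up in flat ones.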
