Aggregated Momentum: Stability Through Passive Damping

James Lucas; Richard Zemel; Roger Grosse; Shengyang Sun

arxiv: 1804.00325 · v3 · pith:ORPICM32new · submitted 2018-04-01 · cs.LG · cs.AI · math.OC · stat.ML

keywords: momentum · aggmo · beta · values · aggregated · convergence · damping · oscillations

Momentum is a simple and widely used trick which allows gradient-based optimizers to pick up speed along low curvature directions. Its performance depends crucially on a damping coefficient $\beta$. Large $\beta$ values can potentially deliver much larger speedups, but are prone to oscillations and instability; hence one typically resorts to small values such as 0.5 or 0.9. We propose Aggregated Momentum (AggMo), a variant of momentum which combines multiple velocity vectors with different $\beta$ parameters. AggMo is trivial to implement, but significantly dampens oscillations, enabling it to remain stable even for aggressive $\beta$ values such as 0.999. We reinterpret Nesterov's accelerated gradient descent as a special case of AggMo and analyze rates of convergence for quadratic objectives. Empirically, we find that AggMo is a suitable drop-in replacement for other momentum methods, and frequently delivers faster convergence.
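The abstract describes AggMo only at a high level: several velocity vectors, each with its own $\beta$, combined into one update. A minimal sketch of one way to realize that idea follows. The specific update form (each velocity accumulates the negative gradient, and the parameter step averages the velocities) is an assumption of this sketch rather than something stated in the abstract, and the names `aggmo_step`, `quadratic_grad`, and `run` are hypothetical helpers introduced for illustration.

```python
def aggmo_step(theta, velocities, grad, betas, lr):
    """One AggMo-style update on a parameter vector stored as a list of floats.

    Assumed update (not taken from the abstract):
        v_i <- beta_i * v_i - grad
        theta <- theta + (lr / K) * sum_i v_i
    """
    # Each velocity decays with its own beta and accumulates the negative gradient.
    new_velocities = [
        [beta * v_j - g_j for v_j, g_j in zip(v, grad)]
        for beta, v in zip(betas, velocities)
    ]
    k = len(betas)
    # The parameter step uses the average of all velocities.
    new_theta = [
        t_j + lr / k * sum(v[j] for v in new_velocities)
        for j, t_j in enumerate(theta)
    ]
    return new_theta, new_velocities


def quadratic_grad(theta, curvatures):
    # Gradient of the toy objective f(theta) = 0.5 * sum_j h_j * theta_j^2.
    return [h * t for h, t in zip(curvatures, theta)]


def run(betas, steps=500, lr=0.1, curvatures=(1.0, 0.01)):
    # Minimize the toy quadratic from a fixed start, one velocity per beta.
    theta = [1.0 for _ in curvatures]
    velocities = [[0.0 for _ in curvatures] for _ in betas]
    for _ in range(steps):
        grad = quadratic_grad(theta, curvatures)
        theta, velocities = aggmo_step(theta, velocities, grad, betas, lr)
    return theta


if __name__ == "__main__":
    # Compare a single small beta against a mix that includes a large beta.
    print("beta = [0.9]:          ", run([0.9]))
    print("betas = [0, 0.9, 0.99]:", run([0.0, 0.9, 0.99]))
```

With a single entry in `betas`, this sketch reduces to classical momentum, and with `betas = [0.0]` it reduces to plain gradient descent, which makes it easy to compare against a baseline on the same toy quadratic.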

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Accelerated Gradient Methods for Nonconvex Optimization: Escape Trajectories From Strict Saddle Points and Convergence to Local Minima
math.OC · 2023-07 · unverdicted · novelty 7.0

Theoretical analysis of accelerated gradient methods showing almost-sure escape from strict saddles and linear exit times, plus a subclass achieving near-optimal convergence to local minima in convex neighborhoods of ...