A geometric alternative to Nesterov's accelerated gradient descent
3 Pith papers cite this work.
abstract
We propose a new method for unconstrained optimization of a smooth and strongly convex function, which attains the optimal rate of convergence of Nesterov's accelerated gradient descent. The new algorithm has a simple geometric interpretation, loosely inspired by the ellipsoid method. We provide some numerical evidence that the new method can be superior to Nesterov's accelerated gradient descent.
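For context, the baseline the abstract compares against can be sketched as follows. This is a minimal illustration of the standard constant-momentum form of Nesterov's accelerated gradient descent for a smooth, strongly convex function, not the geometric method the paper proposes. The function name `nesterov_agd`, the diagonal quadratic test problem, and all parameter values are assumptions chosen for illustration.

```python
import math


def nesterov_agd(grad, x0, mu, L, steps):
    """Constant-momentum Nesterov AGD for a mu-strongly convex, L-smooth function.

    grad  : callable returning the gradient as a list of floats
    x0    : starting point (list of floats)
    mu, L : strong-convexity and smoothness constants (assumed known)
    """
    kappa = L / mu  # condition number
    beta = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)  # momentum weight
    x = list(x0)
    y = list(x0)
    for _ in range(steps):
        g = grad(y)
        # Gradient step from the extrapolated point y with step size 1/L.
        x_next = [yi - gi / L for yi, gi in zip(y, g)]
        # Extrapolate along the direction of the last move.
        y = [xn + beta * (xn - xi) for xn, xi in zip(x_next, x)]
        x = x_next
    return x


# Hypothetical test problem: f(x) = 0.5 * sum(a_i * x_i**2), minimized at 0.
CURVATURES = [1.0, 10.0, 100.0]  # so mu = 1 and L = 100


def quad_grad(x):
    return [a * xi for a, xi in zip(CURVATURES, x)]


x_final = nesterov_agd(quad_grad, [1.0, 1.0, 1.0], mu=1.0, L=100.0, steps=200)
```

On this example the iterates contract toward the minimizer at roughly the rate (1 - 1/sqrt(kappa)) per step, which is the optimal rate the abstract refers to.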
citation-role summary
background: 1
citation-polarity summary
unclear: 1
verdicts
UNVERDICTED: 3
citing papers explorer
- Perfect Parallelization in Mini-Batch SGD with Classical Momentum Acceleration
  Classical momentum acceleration in mini-batch SGD for quadratics is proportional to batch size up to saturation, enabling perfect parallelization under minimal noise assumptions.
- From Halpern's Fixed-Point Iterations to Nesterov's Accelerated Interpretations for Root-Finding Problems
  Halpern iteration equals Nesterov acceleration for root-finding; new variants for monotone inclusions use only monotonicity and Lipschitz continuity.
- Adaptive Federated Optimization
  Proposes federated adaptive optimizers (FedAdagrad, FedAdam, FedYogi) with convergence analysis for non-convex objectives under data heterogeneity and reports empirical gains over FedAvg.