Towards Weaker Variance Assumptions for Stochastic Optimization
read the original abstract
We revisit a classical assumption for analyzing stochastic gradient algorithms where the squared norm of the stochastic subgradient (or the variance for smooth problems) is allowed to grow as fast as the squared norm of the optimization variable. We contextualize this assumption in view of its inception in the 1960s, its seemingly independent appearance in the recent literature, its relationship to weakest-known variance assumptions for analyzing stochastic gradient algorithms, and its relevance in deterministic problems for non-Lipschitz nonsmooth convex optimization. We build on and extend a connection recently made between this assumption and the Halpern iteration. For convex nonsmooth, and potentially stochastic, optimization, we analyze horizon-free, anytime algorithms with last-iterate rates. For problems beyond simple constrained optimization, such as convex problems with functional constraints or regularized convex-concave min-max problems, we obtain rates for optimality measures that do not require boundedness of the feasible set.
This paper has not been read by Pith yet.
Forward citations
Cited by 4 Pith papers
-
The Dual Averaging Power-Prox Method with Application to Heavy-Tail Incremental Gradient
Dual Averaging Power-Prox method provides the first convergence analysis for incremental gradients with heavy-tailed noise and shows asymptotically better rates than i.i.d. SGD.
-
Unified High-Probability Analysis of Stochastic Variance-Reduced Estimation
A unified recursion framework for stochastic variance-reduced estimation yields high-probability bounds and the first Õ(ε^{-3}) oracle complexity for stochastic optimization with expectation constraints.
-
Beyond Bounded Variance: Variance-Reduced Normalized Methods for Nonconvex Optimization under Blum-Gladyshev Noise
Normalized momentum SGD and variance-reduced STORM achieve O(ε^{-6}) and O(ε^{-4}) oracle complexities respectively under quadratic distance-dependent noise in nonconvex stochastic optimization.
-
SGD for Variational Inference: Tackling Unbounded Variance via Preconditioning and Dynamic Batching
Proves ELBO solution existence for elliptic location-scale families and convergence guarantees for preconditioned dynamic-batched projected SGD under Blum-Gladyshev conditions in BBVI.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.