Towards Weaker Variance Assumptions for Stochastic Optimization

Ahmet Alacaoglu; Stephen J. Wright; Yura Malitsky

arxiv: 2504.09951 · v2 · pith:S4D5UJMVnew · submitted 2025-04-14 · 🧮 math.OC · cs.LG · stat.ML


classification: 🧮 math.OC · cs.LG · stat.ML

keywords: optimization, problems, stochastic, algorithms, assumption, convex, variance, analyzing


We revisit a classical assumption for analyzing stochastic gradient algorithms where the squared norm of the stochastic subgradient (or the variance for smooth problems) is allowed to grow as fast as the squared norm of the optimization variable. We contextualize this assumption in view of its inception in the 1960s, its seemingly independent appearance in the recent literature, its relationship to weakest-known variance assumptions for analyzing stochastic gradient algorithms, and its relevance in deterministic problems for non-Lipschitz nonsmooth convex optimization. We build on and extend a connection recently made between this assumption and the Halpern iteration. For convex nonsmooth, and potentially stochastic, optimization, we analyze horizon-free, anytime algorithms with last-iterate rates. For problems beyond simple constrained optimization, such as convex problems with functional constraints or regularized convex-concave min-max problems, we obtain rates for optimality measures that do not require boundedness of the feasible set.
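As a rough illustration of the growth condition described in the abstract (this example is not taken from the paper), consider least squares with standard Gaussian features: the stochastic gradient g(x) = a (a·x − b), with b = a·x* + noise, has second moment E‖g(x)‖² = d σ² + (d + 2) ‖x − x*‖², which is unbounded but grows no faster than the squared distance to the solution. The sketch below checks this by Monte Carlo. The function names, the Gaussian data model, and the parameter values are all assumptions made for illustration.

```python
import random


def second_moment_estimate(x, x_star, sigma, n_samples, rng):
    """Monte Carlo estimate of E||g(x)||^2 for g(x) = a * (a.x - b).

    Assumed data model (illustrative only): a ~ N(0, I_d) and
    b = a.x_star + noise, with noise ~ N(0, sigma^2).
    """
    d = len(x)
    total = 0.0
    for _ in range(n_samples):
        a = [rng.gauss(0.0, 1.0) for _ in range(d)]
        b = sum(ai * si for ai, si in zip(a, x_star)) + rng.gauss(0.0, sigma)
        residual = sum(ai * xi for ai, xi in zip(a, x)) - b
        # ||g||^2 = residual^2 * ||a||^2 because g is a scalar multiple of a.
        total += residual ** 2 * sum(ai * ai for ai in a)
    return total / n_samples


def closed_form(x, x_star, sigma):
    """Exact value d*sigma^2 + (d+2)*||x - x_star||^2 under the model above."""
    d = len(x)
    dist_sq = sum((xi - si) ** 2 for xi, si in zip(x, x_star))
    return d * sigma ** 2 + (d + 2) * dist_sq


if __name__ == "__main__":
    rng = random.Random(0)
    x_star = [0.0] * 5
    for radius in (0.0, 1.0, 2.0, 4.0):
        x = [radius] + [0.0] * 4
        est = second_moment_estimate(x, x_star, 0.5, 50_000, rng)
        print(f"||x - x*|| = {radius}: estimate {est:.2f}, "
              f"closed form {closed_form(x, x_star, 0.5):.2f}")
```

In this toy model no uniform bound on E‖g(x)‖² holds over an unbounded domain, which is the kind of situation a bounded-second-moment assumption excludes and a quadratic-growth assumption admits.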


Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The Dual Averaging Power-Prox Method with Application to Heavy-Tail Incremental Gradient
math.OC 2026-06 unverdicted novelty 7.0

Dual Averaging Power-Prox method provides the first convergence analysis for incremental gradients with heavy-tailed noise and shows asymptotically better rates than i.i.d. SGD.

Unified High-Probability Analysis of Stochastic Variance-Reduced Estimation
cs.LG 2026-05 unverdicted novelty 7.0

A unified recursion framework for stochastic variance-reduced estimation yields high-probability bounds and the first Õ(ε^{-3}) oracle complexity for stochastic optimization with expectation constraints.

Beyond Bounded Variance: Variance-Reduced Normalized Methods for Nonconvex Optimization under Blum-Gladyshev Noise
cs.LG 2026-05 unverdicted novelty 7.0

Normalized momentum SGD and variance-reduced STORM achieve O(ε^{-6}) and O(ε^{-4}) oracle complexities respectively under quadratic distance-dependent noise in nonconvex stochastic optimization.

SGD for Variational Inference: Tackling Unbounded Variance via Preconditioning and Dynamic Batching
cs.LG 2026-05 unverdicted novelty 6.0

Proves ELBO solution existence for elliptic location-scale families and convergence guarantees for preconditioned dynamic-batched projected SGD under Blum-Gladyshev conditions in BBVI.