Fast Convergence of Stochastic Gradient Descent under a Strong Growth Condition

LIENS); Mark Schmidt (INRIA Paris - Rocquencourt; Nicolas Le Roux (INRIA Paris - Rocquencourt

arxiv: 1308.6370 · v1 · pith:C4V6J7N6new · submitted 2013-08-29 · 🧮 math.OC

Fast Convergence of Stochastic Gradient Descent under a Strong Growth Condition

Mark Schmidt (INRIA Paris - Rocquencourt , LIENS) , Nicolas Le Roux (INRIA Paris - Rocquencourt This is my paper

classification 🧮 math.OC

keywords gradientconvergencefunctionunderaveragelinearnormrate

0 comments

read the original abstract

We consider optimizing a function smooth convex function $f$ that is the average of a set of differentiable functions $f_i$, under the assumption considered by Solodov [1998] and Tseng [1998] that the norm of each gradient $f_i'$ is bounded by a linear function of the norm of the average gradient $f'$. We show that under these assumptions the basic stochastic gradient method with a sufficiently-small constant step-size has an $O(1/k)$ convergence rate, and has a linear convergence rate if $g$ is strongly-convex.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Ringmaster LMO: Asynchronous Linear Minimization Oracle Momentum Method
cs.LG 2026-05 unverdicted novelty 7.0

Ringmaster LMO extends delay-thresholding from ASGD to LMO-based momentum updates, providing convergence guarantees under (L0, L1)-smoothness and time-complexity bounds that recover optimal rates in the Euclidean case.
Scalable Distributed Stochastic Optimization via Bidirectional Compression: Beyond Pessimistic Limits
math.OC 2026-05 unverdicted novelty 7.0

Inkheart SGD and M4 use bidirectional compression to achieve time complexities in distributed SGD that improve with worker count n and surpass prior lower bounds under a necessary structural assumption.
Convergence guarantees for stochastic algorithms solving non-unique problems in metric spaces
math.OC 2026-05 unverdicted novelty 7.0

A quantitative theorem supplies uniform rates of convergence for stochastic quasi-Fejér monotone sequences in metric spaces by extending a deterministic regularity notion to the stochastic setting and applying it to p...
Stochastic Trust-Region Methods for Over-parameterized Models
math.OC 2026-04 unverdicted novelty 7.0

Stochastic trust-region methods achieve O(ε^{-2} log(1/ε)) complexity for unconstrained problems and O(ε^{-4} log(1/ε)) for equality-constrained problems under the strong growth condition, with experiments showing sta...
Shuffling the Data, Stretching the Step-size: Sharper Bias in constant step-size SGD
math.OC 2026-04 unverdicted novelty 7.0

Combining random reshuffling and Richardson-Romberg extrapolation yields cubic bias refinement and better MSE for constant-step SGD on structured non-monotone variational inequalities.
\mathsf{VISTA}: Decentralized Machine Learning in Adversary Dominated Environments
cs.LG 2026-05 unverdicted novelty 6.0

VISTA adaptively tunes consistency thresholds in decentralized SGD so that the system converges asymptotically like standard SGD even when adversaries dominate the worker pool.
Rennala MVR: Improved Time Complexity for Parallel Stochastic Optimization via Momentum-Based Variance Reduction
math.OC 2026-05 unverdicted novelty 5.0

Rennala MVR improves time complexity over Rennala SGD for smooth nonconvex stochastic optimization in heterogeneous parallel systems under a mean-squared smoothness assumption.
Stochastic versus Deterministic in Stochastic Gradient Descent
math.OC 2025-09 unverdicted novelty 5.0

Treating stochastic and deterministic gradients separately in mini-batch SGD yields faster convergence and smaller error radius than uniform treatment, with further gains under strong convexity.