pith. sign in

arxiv: 1308.6370 · v1 · pith:C4V6J7N6new · submitted 2013-08-29 · 🧮 math.OC

Fast Convergence of Stochastic Gradient Descent under a Strong Growth Condition

classification 🧮 math.OC
keywords gradientconvergencefunctionunderaveragelinearnormrate
0
0 comments X
read the original abstract

We consider optimizing a function smooth convex function $f$ that is the average of a set of differentiable functions $f_i$, under the assumption considered by Solodov [1998] and Tseng [1998] that the norm of each gradient $f_i'$ is bounded by a linear function of the norm of the average gradient $f'$. We show that under these assumptions the basic stochastic gradient method with a sufficiently-small constant step-size has an $O(1/k)$ convergence rate, and has a linear convergence rate if $g$ is strongly-convex.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Ringmaster LMO: Asynchronous Linear Minimization Oracle Momentum Method

    cs.LG 2026-05 unverdicted novelty 7.0

    Ringmaster LMO extends delay-thresholding from ASGD to LMO-based momentum updates, providing convergence guarantees under (L0, L1)-smoothness and time-complexity bounds that recover optimal rates in the Euclidean case.

  2. Scalable Distributed Stochastic Optimization via Bidirectional Compression: Beyond Pessimistic Limits

    math.OC 2026-05 unverdicted novelty 7.0

    Inkheart SGD and M4 use bidirectional compression to achieve time complexities in distributed SGD that improve with worker count n and surpass prior lower bounds under a necessary structural assumption.

  3. Convergence guarantees for stochastic algorithms solving non-unique problems in metric spaces

    math.OC 2026-05 unverdicted novelty 7.0

    A quantitative theorem supplies uniform rates of convergence for stochastic quasi-Fejér monotone sequences in metric spaces by extending a deterministic regularity notion to the stochastic setting and applying it to p...

  4. Stochastic Trust-Region Methods for Over-parameterized Models

    math.OC 2026-04 unverdicted novelty 7.0

    Stochastic trust-region methods achieve O(ε^{-2} log(1/ε)) complexity for unconstrained problems and O(ε^{-4} log(1/ε)) for equality-constrained problems under the strong growth condition, with experiments showing sta...

  5. Shuffling the Data, Stretching the Step-size: Sharper Bias in constant step-size SGD

    math.OC 2026-04 unverdicted novelty 7.0

    Combining random reshuffling and Richardson-Romberg extrapolation yields cubic bias refinement and better MSE for constant-step SGD on structured non-monotone variational inequalities.

  6. \mathsf{VISTA}: Decentralized Machine Learning in Adversary Dominated Environments

    cs.LG 2026-05 unverdicted novelty 6.0

    VISTA adaptively tunes consistency thresholds in decentralized SGD so that the system converges asymptotically like standard SGD even when adversaries dominate the worker pool.

  7. Rennala MVR: Improved Time Complexity for Parallel Stochastic Optimization via Momentum-Based Variance Reduction

    math.OC 2026-05 unverdicted novelty 5.0

    Rennala MVR improves time complexity over Rennala SGD for smooth nonconvex stochastic optimization in heterogeneous parallel systems under a mean-squared smoothness assumption.

  8. Stochastic versus Deterministic in Stochastic Gradient Descent

    math.OC 2025-09 unverdicted novelty 5.0

    Treating stochastic and deterministic gradients separately in mini-batch SGD yields faster convergence and smaller error radius than uniform treatment, with further gains under strong convexity.