pith. sign in

arxiv: 1308.6370 · v1 · pith:C4V6J7N6new · submitted 2013-08-29 · 🧮 math.OC

Fast Convergence of Stochastic Gradient Descent under a Strong Growth Condition

classification 🧮 math.OC
keywords gradientconvergencefunctionunderaveragelinearnormrate
0
0 comments X
read the original abstract

We consider optimizing a function smooth convex function $f$ that is the average of a set of differentiable functions $f_i$, under the assumption considered by Solodov [1998] and Tseng [1998] that the norm of each gradient $f_i'$ is bounded by a linear function of the norm of the average gradient $f'$. We show that under these assumptions the basic stochastic gradient method with a sufficiently-small constant step-size has an $O(1/k)$ convergence rate, and has a linear convergence rate if $g$ is strongly-convex.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 11 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. LOSCAR-SGD: Local SGD with Communication-Computation Overlap and Delay-Corrected Sparse Model Averaging

    cs.LG 2026-05 unverdicted novelty 7.0

    LOSCAR-SGD combines local updates, sparse model averaging, and communication-computation overlap with a delay-corrected merge rule, providing convergence rates for smooth non-convex objectives under worker heterogeneity.

  2. Ringmaster LMO: Asynchronous Linear Minimization Oracle Momentum Method

    cs.LG 2026-05 unverdicted novelty 7.0

    Ringmaster LMO extends delay-thresholding from ASGD to LMO-based momentum updates, providing convergence guarantees under (L0, L1)-smoothness and time-complexity bounds that recover optimal rates in the Euclidean case.

  3. Scalable Distributed Stochastic Optimization via Bidirectional Compression: Beyond Pessimistic Limits

    math.OC 2026-05 unverdicted novelty 7.0

    Inkheart SGD and M4 use bidirectional compression to achieve time complexities in distributed SGD that improve with worker count n and surpass prior lower bounds under a necessary structural assumption.

  4. Convergence guarantees for stochastic algorithms solving non-unique problems in metric spaces

    math.OC 2026-05 unverdicted novelty 7.0

    A quantitative theorem supplies uniform rates of convergence for stochastic quasi-Fejér monotone sequences in metric spaces by extending a deterministic regularity notion to the stochastic setting and applying it to p...

  5. Stochastic Trust-Region Methods for Over-parameterized Models

    math.OC 2026-04 unverdicted novelty 7.0

    Stochastic trust-region methods achieve O(ε^{-2} log(1/ε)) complexity for unconstrained problems and O(ε^{-4} log(1/ε)) for equality-constrained problems under the strong growth condition, with experiments showing sta...

  6. Shuffling the Data, Stretching the Step-size: Sharper Bias in constant step-size SGD

    math.OC 2026-04 unverdicted novelty 7.0

    Combining random reshuffling and Richardson-Romberg extrapolation yields cubic bias refinement and better MSE for constant-step SGD on structured non-monotone variational inequalities.

  7. Stochastic Gradient Variational Inference with Price's Gradient Estimator from Bures-Wasserstein to Parameter Space

    stat.ML 2026-02 unverdicted novelty 7.0

    Price's gradient estimator enables black-box VI to achieve the same state-of-the-art iteration complexity as Wasserstein VI, with experiments confirming it as the main performance driver.

  8. Stochastic algorithms with geometric step decay converge linearly on sharp functions

    math.OC 2019-07 unverdicted novelty 7.0

    Geometric step decay yields local linear convergence for stochastic algorithms on sharp nonconvex problems and gives matching or new guarantees for phase retrieval and blind deconvolution under Gaussian and heavy-tail...

  9. \mathsf{VISTA}: Decentralized Machine Learning in Adversary Dominated Environments

    cs.LG 2026-05 unverdicted novelty 6.0

    VISTA adaptively tunes consistency thresholds in decentralized SGD so that the system converges asymptotically like standard SGD even when adversaries dominate the worker pool.

  10. Rennala MVR: Improved Time Complexity for Parallel Stochastic Optimization via Momentum-Based Variance Reduction

    math.OC 2026-05 unverdicted novelty 5.0

    Rennala MVR improves time complexity over Rennala SGD for smooth nonconvex stochastic optimization in heterogeneous parallel systems under a mean-squared smoothness assumption.

  11. Stochastic versus Deterministic in Stochastic Gradient Descent

    math.OC 2025-09 unverdicted novelty 5.0

    Treating stochastic and deterministic gradients separately in mini-batch SGD yields faster convergence and smaller error radius than uniform treatment, with further gains under strong convexity.