Hybrid Stochastic Gradient Descent Algorithms for Stochastic Nonconvex Optimization

Hybrid stochastic gradient descent algorithms for stochastic nonconvex optimization , author= · 2019 · math.OC · arXiv 1905.05920

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

open full Pith review browse 5 citing papers arXiv PDF

abstract

We introduce a hybrid stochastic estimator to design stochastic gradient algorithms for solving stochastic optimization problems. Such a hybrid estimator is a convex combination of two existing biased and unbiased estimators and leads to some useful property on its variance. We limit our consideration to a hybrid SARAH-SGD for nonconvex expectation problems. However, our idea can be extended to handle a broader class of estimators in both convex and nonconvex settings. We propose a new single-loop stochastic gradient descent algorithm that can achieve $O(\max\{\sigma^3\varepsilon^{-1},\sigma\varepsilon^{-3}\})$-complexity bound to obtain an $\varepsilon$-stationary point under smoothness and $\sigma^2$-bounded variance assumptions. This complexity is better than $O(\sigma^2\varepsilon^{-4})$ often obtained in state-of-the-art SGDs when $\sigma < O(\varepsilon^{-3})$. We also consider different extensions of our method, including constant and adaptive step-size with single-loop, double-loop, and mini-batch variants. We compare our algorithms with existing methods on several datasets using two nonconvex models.

citation-role summary

background 1 method 1

citation-polarity summary

background 1 use method 1

representative citing papers

LOSCAR-SGD: Local SGD with Communication-Computation Overlap and Delay-Corrected Sparse Model Averaging

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

LOSCAR-SGD combines local updates, sparse model averaging, and communication-computation overlap with a delay-corrected merge rule, providing convergence rates for smooth non-convex objectives under worker heterogeneity.

Ringmaster LMO: Asynchronous Linear Minimization Oracle Momentum Method

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

Ringmaster LMO extends delay-thresholding from ASGD to LMO-based momentum updates, providing convergence guarantees under (L0, L1)-smoothness and time-complexity bounds that recover optimal rates in the Euclidean case.

Unified High-Probability Analysis of Stochastic Variance-Reduced Estimation

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

A unified recursion framework for stochastic variance-reduced estimation yields high-probability bounds and the first Õ(ε^{-3}) oracle complexity for stochastic optimization with expectation constraints.

Scalable Distributed Stochastic Optimization via Bidirectional Compression: Beyond Pessimistic Limits

math.OC · 2026-05-08 · unverdicted · novelty 7.0

Inkheart SGD and M4 use bidirectional compression to achieve time complexities in distributed SGD that improve with worker count n and surpass prior lower bounds under a necessary structural assumption.

Rennala MVR: Improved Time Complexity for Parallel Stochastic Optimization via Momentum-Based Variance Reduction

math.OC · 2026-05-09 · unverdicted · novelty 5.0

Rennala MVR improves time complexity over Rennala SGD for smooth nonconvex stochastic optimization in heterogeneous parallel systems under a mean-squared smoothness assumption.

citing papers explorer

Showing 5 of 5 citing papers.

LOSCAR-SGD: Local SGD with Communication-Computation Overlap and Delay-Corrected Sparse Model Averaging cs.LG · 2026-05-20 · unverdicted · none · ref 121 · internal anchor
LOSCAR-SGD combines local updates, sparse model averaging, and communication-computation overlap with a delay-corrected merge rule, providing convergence rates for smooth non-convex objectives under worker heterogeneity.
Ringmaster LMO: Asynchronous Linear Minimization Oracle Momentum Method cs.LG · 2026-05-18 · unverdicted · none · ref 119 · internal anchor
Ringmaster LMO extends delay-thresholding from ASGD to LMO-based momentum updates, providing convergence guarantees under (L0, L1)-smoothness and time-complexity bounds that recover optimal rates in the Euclidean case.
Unified High-Probability Analysis of Stochastic Variance-Reduced Estimation cs.LG · 2026-05-14 · unverdicted · none · ref 133 · internal anchor
A unified recursion framework for stochastic variance-reduced estimation yields high-probability bounds and the first Õ(ε^{-3}) oracle complexity for stochastic optimization with expectation constraints.
Scalable Distributed Stochastic Optimization via Bidirectional Compression: Beyond Pessimistic Limits math.OC · 2026-05-08 · unverdicted · none · ref 41
Inkheart SGD and M4 use bidirectional compression to achieve time complexities in distributed SGD that improve with worker count n and surpass prior lower bounds under a necessary structural assumption.
Rennala MVR: Improved Time Complexity for Parallel Stochastic Optimization via Momentum-Based Variance Reduction math.OC · 2026-05-09 · unverdicted · none · ref 117
Rennala MVR improves time complexity over Rennala SGD for smooth nonconvex stochastic optimization in heterogeneous parallel systems under a mean-squared smoothness assumption.

Hybrid Stochastic Gradient Descent Algorithms for Stochastic Nonconvex Optimization

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer