Don’t use large mini-batches, use local SGD

Don't use large mini-batches, use local SGD · 2018 · arXiv 1808.07217

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 1 · unclear 1

representative citing papers

Demystifying Pipeline Parallelism: First Theory for PipeDream

cs.LG · 2026-06-02 · unverdicted · novelty 7.0

Introduces Randomized PipeDream abstraction yielding first nonconvex convergence bound for PipeDream and proves delay scales as S squared for S stages.

Unveiling High-Probability Generalization in Decentralized SGD

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

High-probability generalization bounds for D-SGD are derived at the optimal rate O(1/sqrt(mn) log(1/δ)) via pointwise uniform stability across convex and non-convex settings.

Local MixVR: Breaking the Communication-Sample Dependence in Distributed Learning

cs.LG · 2026-05-31 · unverdicted · novelty 6.0

Local MixVR achieves communication complexity scaling only with number of workers M, independent of total samples N, and outperforms Minibatch Accelerated SGD when M is smaller than order N to the 1/4.

Stability and Generalization for Decentralized Markov SGD

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

Decentralized SGD and SGDA under Markovian sampling admit non-asymptotic generalization bounds that incorporate network topology, Markov mixing rates, and primal-dual dynamics.

Adaptive Federated Optimization

cs.LG · 2020-02-29 · unverdicted · novelty 6.0

Proposes federated adaptive optimizers (FedAdagrad, FedAdam, FedYogi) with convergence analysis for non-convex objectives under data heterogeneity and reports empirical gains over FedAvg.

Collaborative Machine Learning at the Wireless Edge with Blind Transmitters

cs.IT · 2019-07-08 · unverdicted · novelty 6.0

Analog over-the-air DSGD scheme in which a multi-antenna PS compensates for blind transmitters so that fading and noise vanish as antenna count grows.

Rethinking the Personalized Relaxed Initialization in the Federated Learning: Consistency and Generalization

cs.LG · 2026-04-14 · unverdicted · novelty 4.0

FedInit uses reverse personalized initialization in FL to reduce client drift effects, showing via excess risk that inconsistency impacts generalization error more than optimization error.

citing papers explorer

Demystifying Pipeline Parallelism: First Theory for PipeDream · cs.LG · 2026-06-02 · unverdicted · none · ref 8
Introduces Randomized PipeDream abstraction yielding first nonconvex convergence bound for PipeDream and proves delay scales as S squared for S stages.
Unveiling High-Probability Generalization in Decentralized SGD · cs.LG · 2026-05-11 · unverdicted · none · ref 12
High-probability generalization bounds for D-SGD are derived at the optimal rate O(1/sqrt(mn) log(1/δ)) via pointwise uniform stability across convex and non-convex settings.
Local MixVR: Breaking the Communication-Sample Dependence in Distributed Learning · cs.LG · 2026-05-31 · unverdicted · none · ref 9
Local MixVR achieves communication complexity scaling only with number of workers M, independent of total samples N, and outperforms Minibatch Accelerated SGD when M is smaller than order N to the 1/4.
Stability and Generalization for Decentralized Markov SGD · cs.LG · 2026-05-03 · unverdicted · none · ref 33
Decentralized SGD and SGDA under Markovian sampling admit non-asymptotic generalization bounds that incorporate network topology, Markov mixing rates, and primal-dual dynamics.
Adaptive Federated Optimization · cs.LG · 2020-02-29 · unverdicted · none · ref 26
Proposes federated adaptive optimizers (FedAdagrad, FedAdam, FedYogi) with convergence analysis for non-convex objectives under data heterogeneity and reports empirical gains over FedAvg.
Collaborative Machine Learning at the Wireless Edge with Blind Transmitters · cs.IT · 2019-07-08 · unverdicted · none · ref 13
Analog over-the-air DSGD scheme in which a multi-antenna PS compensates for blind transmitters so that fading and noise vanish as antenna count grows.
Rethinking the Personalized Relaxed Initialization in the Federated Learning: Consistency and Generalization · cs.LG · 2026-04-14 · unverdicted · none · ref 8
FedInit uses reverse personalized initialization in FL to reduce client drift effects, showing via excess risk that inconsistency impacts generalization error more than optimization error.
