Don’t use large mini-batches, use local SGD
6 Pith papers cite this work.
-
Unveiling High-Probability Generalization in Decentralized SGD
High-probability generalization bounds for decentralized SGD (D-SGD) are derived at the optimal rate O(log(1/δ)/√(mn)) via pointwise uniform stability, in both convex and non-convex settings.
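For readers unfamiliar with D-SGD, here is a minimal sketch of the standard decentralized SGD round: each node takes a local stochastic gradient step, then gossip-averages with its neighbours through a doubly stochastic mixing matrix W. The ring topology, the quadratic local losses, the step size and the function names `ring_mixing_matrix` and `dsgd` are illustrative assumptions, not taken from the cited paper, and the sketch says nothing about its stability analysis.

```python
import random
from typing import List


def ring_mixing_matrix(m: int) -> List[List[float]]:
    """Doubly stochastic mixing matrix for a ring: each node averages itself
    and its two neighbours with weight 1/3 (assumes m >= 3)."""
    w = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in (i - 1, i, i + 1):
            w[i][j % m] += 1.0 / 3.0
    return w


def dsgd(targets: List[float], steps: int = 500, lr: float = 0.1,
         noise: float = 0.0, seed: int = 0) -> List[float]:
    """Decentralized SGD on hypothetical local losses f_i(x) = 0.5 * (x - a_i)^2.

    Each round, every node i computes a local (optionally noisy) gradient step,
    then mixes with its ring neighbours:
        x_i <- sum_j W_ij * (x_j - lr * g_j)
    """
    rng = random.Random(seed)
    m = len(targets)
    w = ring_mixing_matrix(m)
    x = [0.0] * m  # every node starts from the same model
    for _ in range(steps):
        # Local stochastic gradient step; data (targets[j]) never leaves node j.
        half = [x[j] - lr * ((x[j] - targets[j]) + rng.gauss(0.0, noise))
                for j in range(m)]
        # Gossip averaging: node i only combines values from its neighbours.
        x = [sum(w[i][j] * half[j] for j in range(m)) for i in range(m)]
    return x


if __name__ == "__main__":
    models = dsgd([1.0, 2.0, 3.0, 4.0, 5.0])
    print(sum(models) / len(models))  # network average approaches the mean target
```

With `noise=0.0` and a doubly stochastic W, the network-wide average of the models follows plain gradient descent on the average loss, so it approaches the mean of the targets; individual nodes keep a consensus gap that shrinks with `lr`.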
-
Local MixVR: Breaking the Communication-Sample Dependence in Distributed Learning
Local MixVR achieves a communication complexity that scales only with the number of workers M, independent of the total number of samples N, and outperforms Minibatch Accelerated SGD when M is below order N^(1/4).
-
Stability and Generalization for Decentralized Markov SGD
Decentralized SGD and SGDA (stochastic gradient descent ascent) under Markovian sampling admit non-asymptotic generalization bounds that account for network topology, Markov-chain mixing rates, and primal-dual dynamics.
-
Adaptive Federated Optimization
Proposes federated adaptive optimizers (FedAdagrad, FedAdam, FedYogi) with convergence analysis for non-convex objectives under data heterogeneity and reports empirical gains over FedAvg.
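The three optimizers share one server-side template: average the client updates into a pseudo-gradient Δ, update a momentum term m, update a second-moment term v in a variant-specific way, and take an adaptive step x ← x + η·m/(√v + τ). The sketch below is a minimal illustration assuming the commonly cited form of these update rules; the function name `server_update`, the default hyperparameters, and the omission of client-side local SGD are assumptions, so the paper remains the authority on the exact algorithm.

```python
import math
from typing import List, Tuple

Vector = List[float]


def _sign(z: float) -> float:
    return float((z > 0) - (z < 0))


def server_update(x: Vector, m: Vector, v: Vector, client_deltas: List[Vector],
                  variant: str = "adam", lr: float = 0.1, beta1: float = 0.9,
                  beta2: float = 0.99, tau: float = 1e-3) -> Tuple[Vector, Vector, Vector]:
    """One server round of a FedAdagrad/FedAdam/FedYogi-style update (sketch).

    client_deltas holds each client's (local model - x) after its local steps.
    Returns the new global model and the updated moment estimates (m, v).
    """
    n, d = len(client_deltas), len(x)
    # Pseudo-gradient: average of the client updates.
    delta = [sum(c[k] for c in client_deltas) / n for k in range(d)]
    # First moment (momentum), shared by all three variants.
    m = [beta1 * m[k] + (1 - beta1) * delta[k] for k in range(d)]
    sq = [dk * dk for dk in delta]
    # Second moment: the only place where the variants differ.
    if variant == "adagrad":
        v = [v[k] + sq[k] for k in range(d)]
    elif variant == "adam":
        v = [beta2 * v[k] + (1 - beta2) * sq[k] for k in range(d)]
    elif variant == "yogi":
        v = [v[k] - (1 - beta2) * sq[k] * _sign(v[k] - sq[k]) for k in range(d)]
    else:
        raise ValueError(f"unknown variant: {variant}")
    # Adaptive server step; tau controls the degree of adaptivity.
    x = [x[k] + lr * m[k] / (math.sqrt(v[k]) + tau) for k in range(d)]
    return x, m, v


if __name__ == "__main__":
    # Two hypothetical clients both moved the single parameter upward.
    x, m, v = server_update([0.0], [0.0], [1e-6], [[1.0], [3.0]], variant="yogi")
    print(x, m, v)
```

Initialising v to a small positive value (here 1e-6, i.e. τ²) keeps the square root well defined for all three variants.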
-
Collaborative Machine Learning at the Wireless Edge with Blind Transmitters
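Proposes an analog over-the-air DSGD scheme in which a multi-antenna parameter server (PS) compensates for blind transmitters (devices without channel state information), so that the effects of fading and noise vanish as the number of antennas grows.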
-
Rethinking the Personalized Relaxed Initialization in the Federated Learning: Consistency and Generalization
FedInit uses a reverse personalized initialization in federated learning (FL) to mitigate client drift; its excess-risk analysis shows that client inconsistency affects the generalization error more than the optimization error.