TAMUNA: Doubly accelerated federated learning with local training, compression, and partial participation
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.LG (2)
Verdicts: UNVERDICTED (2)
Citing papers
- Achieving Linear Speedup with ProxSkip in Distributed Stochastic Optimization
  A unified analysis shows that decentralized ProxSkip achieves linear speedup in the number of nodes under stochastic gradients for non-convex problems.
- Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity
  Rescaled ASGD recovers convergence to the true global objective by rescaling worker stepsizes in proportion to computation times. Under non-convex smoothness and bounded heterogeneity, it matches the known time lower bound in the leading term.