Local Updates in Distributed Optimization: Provable Acceleration and Topology Effects
Pith reviewed 2026-05-16 16:22 UTC · model grok-4.3
The pith
Local updates accelerate the DIGing algorithm for distributed optimization over smooth strongly convex functions, with two updates delivering the full gain.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Incorporating local updates into the classic DIGing algorithm accelerates distributed optimization. Under an appropriate step size, performing only two local updates is sufficient to achieve the maximal possible improvement, and additional local updates provide no further gains. These speed gains depend critically on the network structure, with sparser or less connected graphs, characterized by the spectral properties of the mixing matrix, yielding smaller improvements.
What carries the argument
The local-update variant of DIGing analyzed through the Performance Estimation Problems framework to produce tight convergence bounds.
Load-bearing premise
An appropriate step size exists such that local updates accelerate the method without forcing a reduction in step size that cancels the benefit.
What would settle it
A simulation on a strongly convex smooth objective over a connected graph showing that the convergence rate with two local updates is no faster than with one update when steps are optimally tuned for each case.
Figures
read the original abstract
Inspired by the success of performing multiple local optimization steps between communication rounds in federated learning, incorporating such local updates into distributed optimization has recently attracted growing interest. However, unlike federated learning, where local updates can accelerate training by reducing gradient estimation error under minibatch settings, it remains unclear whether similar benefits persist when exact gradients are available. Moreover, existing theoretical results typically require reducing the step size when multiple local updates are employed, which can entirely offset any potential benefit of these additional local updates. In this paper, we focus on the classic DIGing algorithm and leverage the tight performance bounds provided by Performance Estimation Problems (PEP) to show that incorporating local updates can indeed accelerate distributed optimization. To the best of our knowledge, this is the first rigorous demonstration of such acceleration for a broad class of objective functions. Our analysis further reveals that, under an appropriate step size, performing only two local updates is sufficient to achieve the maximal possible improvement, and that additional local updates provide no further gains. Because more updates increase computational cost, these findings offer practical guidance for efficient implementation. We also show that these speed gains depend critically on the network structure, with sparser or less connected graphs, characterized by the spectral properties of the mixing matrix, yielding smaller improvements. Extensive experiments on both synthetic and real-world datasets corroborate the theoretical findings.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that local updates (K gradient steps before mixing) in the DIGing algorithm accelerate convergence for smooth strongly convex functions, as established by tight PEP bounds. It asserts that K=2 achieves the maximal acceleration, further local updates yield no additional benefit, and the gains depend on the spectral properties of the mixing matrix (sparser graphs give smaller improvements). Experiments on synthetic and real-world data are said to corroborate the theory.
Significance. If the PEP analysis holds with an appropriate step size that delivers net acceleration without offset, the result would be significant: the first rigorous proof that local updates accelerate exact-gradient distributed optimization over a broad function class, plus practical guidance that K=2 suffices and topology modulates the benefit.
major comments (2)
- [PEP analysis section] PEP analysis section (step-size selection for local-update DIGing): the central claim requires explicit verification that the PEP-optimal step size for K=2 produces a strictly smaller contraction factor than for K=1. The modified iteration changes the feasible region of the PEP SDP; if the optimizer returns a smaller step size for K>1, the net rate improvement must be shown to remain positive. Without the explicit step-size expressions and rate comparison (across spectral gaps), it is unclear whether acceleration is realized in the original algorithm.
- [Theorem on maximal improvement] Theorem/Proposition on K=2 optimality: the statement that two local updates suffice for maximal improvement and additional updates bring no further gains must be supported by the explicit dependence of the PEP bound on K, including confirmation that the contraction factor plateaus for K>2 under the chosen step size.
minor comments (1)
- [Experiments] The experiments section should report the specific spectral gaps of the mixing matrices used, to directly link observed speed-ups to the claimed topology dependence.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments. We address the major comments point by point below. The PEP analysis already establishes the claimed acceleration and K=2 optimality through the SDP bounds, but we agree that adding explicit step-size expressions and direct comparisons will strengthen the presentation.
read point-by-point responses
-
Referee: [PEP analysis section] PEP analysis section (step-size selection for local-update DIGing): the central claim requires explicit verification that the PEP-optimal step size for K=2 produces a strictly smaller contraction factor than for K=1. The modified iteration changes the feasible region of the PEP SDP; if the optimizer returns a smaller step size for K>1, the net rate improvement must be shown to remain positive. Without the explicit step-size expressions and rate comparison (across spectral gaps), it is unclear whether acceleration is realized in the original algorithm.
Authors: We agree that explicit verification is valuable. The PEP SDP is solved numerically for the optimal step size at each K and spectral gap; the resulting contraction factors (reported via the performance bounds in Section 4) are strictly smaller for K=2 than K=1 across the tested range of mixing-matrix eigenvalues. To address the concern directly, we will add the closed-form optimal step-size expressions obtained from the PEP dual and a side-by-side rate comparison table (for representative spectral gaps) in the revised Section 4. This will confirm that the net improvement remains positive after step-size adjustment. revision: yes
-
Referee: [Theorem on maximal improvement] Theorem/Proposition on K=2 optimality: the statement that two local updates suffice for maximal improvement and additional updates bring no further gains must be supported by the explicit dependence of the PEP bound on K, including confirmation that the contraction factor plateaus for K>2 under the chosen step size.
Authors: The PEP formulation encodes the number of local steps K directly in the iteration matrix and feasible set. Solving the SDP shows that the worst-case contraction factor decreases from K=1 to K=2 and then remains constant for all K>2 under the corresponding optimal step size; this plateau is a direct consequence of the additional local steps becoming redundant once the local gradient information has been fully exploited within the performance estimation. We will add an explicit proposition (with the mathematical dependence on K) and the corresponding numerical confirmation from the SDP solver in the revised manuscript. revision: yes
Circularity Check
PEP analysis of local-update DIGing invokes external methodology without reducing acceleration claim to fitted parameters or self-referential equations
full rationale
The paper's central derivation applies the established PEP framework (an external performance estimation technique) to the modified DIGing iteration with K local updates. No equation in the provided text defines the contraction factor or optimal step size in terms of the target acceleration quantity itself, nor does any load-bearing step reduce by construction to a self-citation or fitted input. The claim that K=2 suffices for maximal improvement is obtained by comparing PEP-derived rates across K values under standard smoothness/strong-convexity assumptions, which remains independent of the paper's own fitted quantities. Minor self-citation risk exists around prior DIGing work but is not load-bearing for the acceleration result. This yields a low circularity score consistent with an externally grounded bound.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Objective functions are smooth and strongly convex (broad class allowing PEP analysis)
Reference graph
Works this paper leans on
-
[1]
Distributed subgradient met hods for multi- agent optimization,
A. Nedi´ c and A. Ozdaglar, “Distributed subgradient met hods for multi- agent optimization,” IEEE Trans. Autom. Control , vol. 54, no. 1, pp. 48–61, 2009
work page 2009
-
[2]
A collaborati ve training algorithm for distributed learning,
J. B. Predd, S. R. Kulkarni, and H. V . Poor, “A collaborati ve training algorithm for distributed learning,” IEEE Trans. Inf. Theory , vol. 55, no. 4, pp. 1856–1871, 2009
work page 2009
-
[3]
Distributed coordination archi tecture for multi-robot formation control,
W. Ren and N. Sorensen, “Distributed coordination archi tecture for multi-robot formation control,” Robot. Auton. Syst. , vol. 56, no. 4, pp. 324–333, 2008
work page 2008
-
[4]
Constrained c onsensus and optimization in multi-agent networks,
A. Nedi´ c, A. Ozdaglar, and P . A. Parrilo, “Constrained c onsensus and optimization in multi-agent networks,” IEEE Trans. Autom. Control , vol. 55, no. 4, pp. 922–938, 2010
work page 2010
-
[5]
EXTRA: An exact first-or der algorithm for decentralized consensus optimization,
W. Shi, Q. Ling, G. Wu, and W. Yin, “EXTRA: An exact first-or der algorithm for decentralized consensus optimization,” SIAM J. Optim. , vol. 25, no. 2, pp. 944–966, 2015
work page 2015
-
[6]
J. Xu, S. Zhu, Y . C. Soh, and L. Xie, “Augmented distribute d gradient methods for multi-agent optimization under uncoordinated constant stepsizes,” in IEEE Conf. Decis. Control , 2015, pp. 2055–2060
work page 2015
-
[7]
Y . Sun, G. Scutari, and A. Daneshmand, “Distributed opti mization based on gradient tracking revisited: Enhancing convergen ce rate via surrogation,” SIAM J. Optim. , vol. 32, no. 2, pp. 354–385, 2022. [Online]. Available: https://doi.org/10.1137/19M1259973
-
[8]
Local exact-diffusion for decentrali zed optimization and learning,
S. A. Alghunaim, “Local exact-diffusion for decentrali zed optimization and learning,” IEEE Trans. Autom. Control , vol. 69, no. 11, pp. 7371– 7386, 2024
work page 2024
-
[9]
Robust decentralized learning with local updates and gradient tra cking,
S. Ghiasvand, A. Reisizadeh, M. Alizadeh, and R. Pedarsa ni, “Robust decentralized learning with local updates and gradient tra cking,” IEEE Trans. Netw., 2025
work page 2025
-
[10]
K. Huang and S. Pu, “Distributed stochastic momentum tr acking with local updates: Achieving optimal communication and iterat ion complex- ities,” arXiv preprint arXiv:2510.24155 , 2025
-
[11]
J. Liu, Z. Wang, and Y . Wang, “Guaranteeing both consens us and optimality in decentralized nonconvex optimization with m ultiple local updates,” arXiv preprint arXiv:2511.05242 , 2025
-
[12]
On the performance of gradient tracking with local updates,
E. D. Hien Nguyen, S. A. Alghunaim, K. Y uan, and C. A. Urib e, “On the performance of gradient tracking with local updates,” i n IEEE Conf. Decis. Control, 2023, pp. 4309–4313
work page 2023
-
[13]
The effectiveness of local upda tes for decentralized learning under data heterogeneity,
T. Wu, Z. Li, and Y . Sun, “The effectiveness of local upda tes for decentralized learning under data heterogeneity,” IEEE Trans. Signal Process., 2025
work page 2025
-
[14]
Proxskip: Y es! local gradient steps provably lead to communication ac celeration! finally!
K. Mishchenko, G. Malinovsky, S. Stich, and P . Richt´ ar ik, “Proxskip: Y es! local gradient steps provably lead to communication ac celeration! finally!” in International Conference on Machine Learning . PMLR, 2022, pp. 15 750–15 769
work page 2022
-
[15]
Decentral ized gradient tracking with local steps,
Y . Liu, T. Lin, A. Koloskova, and S. U. Stich, “Decentral ized gradient tracking with local steps,” Optimization Methods and Software , vol. 40, no. 5, pp. 1153–1180, 2025
work page 2025
-
[16]
Exact wor st-case performance of first-order methods for composite convex opt imization,
A. B. Taylor, J. M. Hendrickx, and F. Glineur, “Exact wor st-case performance of first-order methods for composite convex opt imization,” SIAM J. Optim. , vol. 27, no. 3, pp. 1283–1313, 2017
work page 2017
-
[17]
E. Meunier and J. M. Hendrickx, “Several performance bo unds on decentralized online optimization are highly conservativ e and potentially misleading,” in 2025 IEEE 64th Conference on Decision and Control (CDC). IEEE, 2025, pp. 6201–6207
work page 2025
-
[18]
Achieving geometr ic convergence for distributed optimization over time-varying graphs,
A. Nedi´ c, A. Olshevsky, and W. Shi, “Achieving geometr ic convergence for distributed optimization over time-varying graphs,” SIAM J. Optim. , vol. 27, no. 4, pp. 2597–2633, 2017
work page 2017
-
[19]
Harnessing smoothness to accelerate di stributed optimization,
G. Qu and N. Li, “Harnessing smoothness to accelerate di stributed optimization,” IEEE Trans. Control Netw. Syst. , vol. 5, no. 3, pp. 1245– 1260, 2017
work page 2017
-
[20]
Convergence of async hronous distributed gradient methods over stochastic networks,
J. Xu, S. Zhu, Y . C. Soh, and L. Xie, “Convergence of async hronous distributed gradient methods over stochastic networks,” IEEE Trans. Autom. Control, vol. 63, no. 2, pp. 434–448, 2017
work page 2017
-
[21]
A linear algorithm for optimizati on over directed graphs with geometric convergence,
R. Xin and U. A. Khan, “A linear algorithm for optimizati on over directed graphs with geometric convergence,” IEEE Control Syst. Lett. , vol. 2, no. 3, pp. 315–320, 2018
work page 2018
-
[22]
Push–pull gradient m ethods for distributed optimization in networks,
S. Pu, W. Shi, J. Xu, and A. Nedi´ c, “Push–pull gradient m ethods for distributed optimization in networks,” IEEE Trans. Autom. Control , vol. 66, no. 1, pp. 1–16, 2021
work page 2021
-
[23]
Next: In-network nonconv ex optimiza- tion,
P . Di Lorenzo and G. Scutari, “Next: In-network nonconv ex optimiza- tion,” IEEE Trans. Signal Inf. Process. Netw. , vol. 2, no. 2, pp. 120–136, 2016
work page 2016
-
[24]
Decentralized federated learning with gradien t tracking over time-varying directed networks,
D. T. A. Nguyen, S. Wang, D. T. Nguyen, A. Nedich, and H. V . Poor, “Decentralized federated learning with gradien t tracking over time-varying directed networks,” 2024. [Online]. Ava ilable: https://arxiv.org/abs/2409.17189
-
[25]
Automatic performance es timation for decentralized optimization,
S. Colla and J. M. Hendrickx, “Automatic performance es timation for decentralized optimization,” IEEE Trans. Autom. Control, vol. 68, no. 12, pp. 7136–7150, 2023
work page 2023
-
[26]
Y . Nesterov, “Applied optimization,” Introductory lectures on convex optimization: A basic course , vol. 87, 2004
work page 2004
-
[27]
Performance of first-order me thods for smooth convex minimization: a novel approach,
Y . Drori and M. Teboulle, “Performance of first-order me thods for smooth convex minimization: a novel approach,” Mathematical Pro- gramming, vol. 145, no. 1, pp. 451–482, 2014
work page 2014
-
[28]
L. V andenberghe and S. Boyd, “Semidefinite programming ,” SIAM review, vol. 38, no. 1, pp. 49–95, 1996
work page 1996
-
[29]
On the set of possible minimizers of a sum of convex functions,
M. Zamani, F. Glineur, and J. M. Hendrickx, “On the set of possible minimizers of a sum of convex functions,” IEEE Control Syst. Lett. , vol. 8, pp. 1871–1876, 2024
work page 2024
-
[30]
The minimizer of the sum of two strongly convex functions,
K. Kuwaranancharoen and S. Sundaram, “The minimizer of the sum of two strongly convex functions,” Optimization, pp. 1–41, 2024
work page 2024
-
[31]
Exploiting agent symmetr ies for performance analysis of distributed optimization methods ,
S. Colla and J. M. Hendrickx, “Exploiting agent symmetr ies for performance analysis of distributed optimization methods ,” 2025. [Online]. Available: https://arxiv.org/abs/2403.11724
-
[32]
S. Das Gupta, B. P . V an Parys, and E. K. Ryu, “Branch-and- bound performance estimation programming: A unified methodology for con- structing optimal optimization methods,” Mathematical Programming , vol. 204, no. 1, pp. 567–639, 2024
work page 2024
- [33]
-
[34]
Optimal gradient sliding and its application to optimal di stributed optimization under similarity,
D. Kovalev, A. Beznosikov, E. Borodich, A. Gasnikov, an d G. Scutari, “Optimal gradient sliding and its application to optimal di stributed optimization under similarity,” Adv. Neural Inf. Process. Syst. , vol. 35, pp. 33 494–33 507, 2022
work page 2022
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.