pith. sign in

arxiv: 2601.03442 · v4 · submitted 2026-01-06 · 📡 eess.SY · cs.LG· cs.SY

Local Updates in Distributed Optimization: Provable Acceleration and Topology Effects

Pith reviewed 2026-05-16 16:22 UTC · model grok-4.3

classification 📡 eess.SY cs.LGcs.SY
keywords distributed optimizationlocal updatesDIGingperformance estimationconvergence accelerationnetwork topologymixing matrixstrong convexity
0
0 comments X

The pith

Local updates accelerate the DIGing algorithm for distributed optimization over smooth strongly convex functions, with two updates delivering the full gain.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that adding local updates between communication rounds speeds up convergence in the DIGing distributed optimization method. This acceleration is established rigorously through tight bounds from performance estimation problems and holds for a broad class of objective functions without requiring a smaller step size that would erase the benefit. A reader would care because it supplies the first proof that local steps reduce communication needs while improving speed when gradients are exact. Only two local updates achieve the maximum improvement, further updates add cost with no extra gain, and the speedup shrinks on sparser networks.

Core claim

Incorporating local updates into the classic DIGing algorithm accelerates distributed optimization. Under an appropriate step size, performing only two local updates is sufficient to achieve the maximal possible improvement, and additional local updates provide no further gains. These speed gains depend critically on the network structure, with sparser or less connected graphs, characterized by the spectral properties of the mixing matrix, yielding smaller improvements.

What carries the argument

The local-update variant of DIGing analyzed through the Performance Estimation Problems framework to produce tight convergence bounds.

Load-bearing premise

An appropriate step size exists such that local updates accelerate the method without forcing a reduction in step size that cancels the benefit.

What would settle it

A simulation on a strongly convex smooth objective over a connected graph showing that the convergence rate with two local updates is no faster than with one update when steps are optimally tuned for each case.

Figures

Figures reproduced from arXiv: 2601.03442 by Yongqiang Wang, Zuang Wang.

Figure 1
Figure 1. Figure 1: Evolution of the optimization error under different [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: PEP based exact worst-case convergence error of Algorithm 1 under different numbers of local updates, each using its respective optimal step size α ∗ , across various graph topologies. α ∗ for each τ was obtained via a grid search over α ∈ [0.01, 0.8] with 0.01 resolution for the function class Fµ,L with µ = 0.1 and L = 1. The number of agents is 4. It is worth noting that our PEP formulation differs from … view at source ↗
Figure 4
Figure 4. Figure 4: Algorithm 1’s exact worst-case convergence error at [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗
Figure 3
Figure 3. Figure 3: DIGing-based training of the linear regression mode [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: DIGing-based training of CNN on the MNIST dataset [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗
read the original abstract

Inspired by the success of performing multiple local optimization steps between communication rounds in federated learning, incorporating such local updates into distributed optimization has recently attracted growing interest. However, unlike federated learning, where local updates can accelerate training by reducing gradient estimation error under minibatch settings, it remains unclear whether similar benefits persist when exact gradients are available. Moreover, existing theoretical results typically require reducing the step size when multiple local updates are employed, which can entirely offset any potential benefit of these additional local updates. In this paper, we focus on the classic DIGing algorithm and leverage the tight performance bounds provided by Performance Estimation Problems (PEP) to show that incorporating local updates can indeed accelerate distributed optimization. To the best of our knowledge, this is the first rigorous demonstration of such acceleration for a broad class of objective functions. Our analysis further reveals that, under an appropriate step size, performing only two local updates is sufficient to achieve the maximal possible improvement, and that additional local updates provide no further gains. Because more updates increase computational cost, these findings offer practical guidance for efficient implementation. We also show that these speed gains depend critically on the network structure, with sparser or less connected graphs, characterized by the spectral properties of the mixing matrix, yielding smaller improvements. Extensive experiments on both synthetic and real-world datasets corroborate the theoretical findings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that local updates (K gradient steps before mixing) in the DIGing algorithm accelerate convergence for smooth strongly convex functions, as established by tight PEP bounds. It asserts that K=2 achieves the maximal acceleration, further local updates yield no additional benefit, and the gains depend on the spectral properties of the mixing matrix (sparser graphs give smaller improvements). Experiments on synthetic and real-world data are said to corroborate the theory.

Significance. If the PEP analysis holds with an appropriate step size that delivers net acceleration without offset, the result would be significant: the first rigorous proof that local updates accelerate exact-gradient distributed optimization over a broad function class, plus practical guidance that K=2 suffices and topology modulates the benefit.

major comments (2)
  1. [PEP analysis section] PEP analysis section (step-size selection for local-update DIGing): the central claim requires explicit verification that the PEP-optimal step size for K=2 produces a strictly smaller contraction factor than for K=1. The modified iteration changes the feasible region of the PEP SDP; if the optimizer returns a smaller step size for K>1, the net rate improvement must be shown to remain positive. Without the explicit step-size expressions and rate comparison (across spectral gaps), it is unclear whether acceleration is realized in the original algorithm.
  2. [Theorem on maximal improvement] Theorem/Proposition on K=2 optimality: the statement that two local updates suffice for maximal improvement and additional updates bring no further gains must be supported by the explicit dependence of the PEP bound on K, including confirmation that the contraction factor plateaus for K>2 under the chosen step size.
minor comments (1)
  1. [Experiments] The experiments section should report the specific spectral gaps of the mixing matrices used, to directly link observed speed-ups to the claimed topology dependence.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address the major comments point by point below. The PEP analysis already establishes the claimed acceleration and K=2 optimality through the SDP bounds, but we agree that adding explicit step-size expressions and direct comparisons will strengthen the presentation.

read point-by-point responses
  1. Referee: [PEP analysis section] PEP analysis section (step-size selection for local-update DIGing): the central claim requires explicit verification that the PEP-optimal step size for K=2 produces a strictly smaller contraction factor than for K=1. The modified iteration changes the feasible region of the PEP SDP; if the optimizer returns a smaller step size for K>1, the net rate improvement must be shown to remain positive. Without the explicit step-size expressions and rate comparison (across spectral gaps), it is unclear whether acceleration is realized in the original algorithm.

    Authors: We agree that explicit verification is valuable. The PEP SDP is solved numerically for the optimal step size at each K and spectral gap; the resulting contraction factors (reported via the performance bounds in Section 4) are strictly smaller for K=2 than K=1 across the tested range of mixing-matrix eigenvalues. To address the concern directly, we will add the closed-form optimal step-size expressions obtained from the PEP dual and a side-by-side rate comparison table (for representative spectral gaps) in the revised Section 4. This will confirm that the net improvement remains positive after step-size adjustment. revision: yes

  2. Referee: [Theorem on maximal improvement] Theorem/Proposition on K=2 optimality: the statement that two local updates suffice for maximal improvement and additional updates bring no further gains must be supported by the explicit dependence of the PEP bound on K, including confirmation that the contraction factor plateaus for K>2 under the chosen step size.

    Authors: The PEP formulation encodes the number of local steps K directly in the iteration matrix and feasible set. Solving the SDP shows that the worst-case contraction factor decreases from K=1 to K=2 and then remains constant for all K>2 under the corresponding optimal step size; this plateau is a direct consequence of the additional local steps becoming redundant once the local gradient information has been fully exploited within the performance estimation. We will add an explicit proposition (with the mathematical dependence on K) and the corresponding numerical confirmation from the SDP solver in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

PEP analysis of local-update DIGing invokes external methodology without reducing acceleration claim to fitted parameters or self-referential equations

full rationale

The paper's central derivation applies the established PEP framework (an external performance estimation technique) to the modified DIGing iteration with K local updates. No equation in the provided text defines the contraction factor or optimal step size in terms of the target acceleration quantity itself, nor does any load-bearing step reduce by construction to a self-citation or fitted input. The claim that K=2 suffices for maximal improvement is obtained by comparing PEP-derived rates across K values under standard smoothness/strong-convexity assumptions, which remains independent of the paper's own fitted quantities. Minor self-citation risk exists around prior DIGing work but is not load-bearing for the acceleration result. This yields a low circularity score consistent with an externally grounded bound.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on applying the PEP framework to a local-update variant of DIGing under standard optimization assumptions; no free parameters or invented entities are indicated in the abstract.

axioms (1)
  • domain assumption Objective functions are smooth and strongly convex (broad class allowing PEP analysis)
    Required for standard convergence-rate analysis in distributed optimization.

pith-pipeline@v0.9.0 · 5536 in / 1164 out tokens · 62599 ms · 2026-05-16T16:22:01.733464+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages

  1. [1]

    Distributed subgradient met hods for multi- agent optimization,

    A. Nedi´ c and A. Ozdaglar, “Distributed subgradient met hods for multi- agent optimization,” IEEE Trans. Autom. Control , vol. 54, no. 1, pp. 48–61, 2009

  2. [2]

    A collaborati ve training algorithm for distributed learning,

    J. B. Predd, S. R. Kulkarni, and H. V . Poor, “A collaborati ve training algorithm for distributed learning,” IEEE Trans. Inf. Theory , vol. 55, no. 4, pp. 1856–1871, 2009

  3. [3]

    Distributed coordination archi tecture for multi-robot formation control,

    W. Ren and N. Sorensen, “Distributed coordination archi tecture for multi-robot formation control,” Robot. Auton. Syst. , vol. 56, no. 4, pp. 324–333, 2008

  4. [4]

    Constrained c onsensus and optimization in multi-agent networks,

    A. Nedi´ c, A. Ozdaglar, and P . A. Parrilo, “Constrained c onsensus and optimization in multi-agent networks,” IEEE Trans. Autom. Control , vol. 55, no. 4, pp. 922–938, 2010

  5. [5]

    EXTRA: An exact first-or der algorithm for decentralized consensus optimization,

    W. Shi, Q. Ling, G. Wu, and W. Yin, “EXTRA: An exact first-or der algorithm for decentralized consensus optimization,” SIAM J. Optim. , vol. 25, no. 2, pp. 944–966, 2015

  6. [6]

    Augmented distribute d gradient methods for multi-agent optimization under uncoordinated constant stepsizes,

    J. Xu, S. Zhu, Y . C. Soh, and L. Xie, “Augmented distribute d gradient methods for multi-agent optimization under uncoordinated constant stepsizes,” in IEEE Conf. Decis. Control , 2015, pp. 2055–2060

  7. [7]

    Distributed opti mization based on gradient tracking revisited: Enhancing convergen ce rate via surrogation,

    Y . Sun, G. Scutari, and A. Daneshmand, “Distributed opti mization based on gradient tracking revisited: Enhancing convergen ce rate via surrogation,” SIAM J. Optim. , vol. 32, no. 2, pp. 354–385, 2022. [Online]. Available: https://doi.org/10.1137/19M1259973

  8. [8]

    Local exact-diffusion for decentrali zed optimization and learning,

    S. A. Alghunaim, “Local exact-diffusion for decentrali zed optimization and learning,” IEEE Trans. Autom. Control , vol. 69, no. 11, pp. 7371– 7386, 2024

  9. [9]

    Robust decentralized learning with local updates and gradient tra cking,

    S. Ghiasvand, A. Reisizadeh, M. Alizadeh, and R. Pedarsa ni, “Robust decentralized learning with local updates and gradient tra cking,” IEEE Trans. Netw., 2025

  10. [10]

    Distributed stochastic momentum tr acking with local updates: Achieving optimal communication and iterat ion complex- ities,

    K. Huang and S. Pu, “Distributed stochastic momentum tr acking with local updates: Achieving optimal communication and iterat ion complex- ities,” arXiv preprint arXiv:2510.24155 , 2025

  11. [11]

    Guaranteeing both consens us and optimality in decentralized nonconvex optimization with m ultiple local updates,

    J. Liu, Z. Wang, and Y . Wang, “Guaranteeing both consens us and optimality in decentralized nonconvex optimization with m ultiple local updates,” arXiv preprint arXiv:2511.05242 , 2025

  12. [12]

    On the performance of gradient tracking with local updates,

    E. D. Hien Nguyen, S. A. Alghunaim, K. Y uan, and C. A. Urib e, “On the performance of gradient tracking with local updates,” i n IEEE Conf. Decis. Control, 2023, pp. 4309–4313

  13. [13]

    The effectiveness of local upda tes for decentralized learning under data heterogeneity,

    T. Wu, Z. Li, and Y . Sun, “The effectiveness of local upda tes for decentralized learning under data heterogeneity,” IEEE Trans. Signal Process., 2025

  14. [14]

    Proxskip: Y es! local gradient steps provably lead to communication ac celeration! finally!

    K. Mishchenko, G. Malinovsky, S. Stich, and P . Richt´ ar ik, “Proxskip: Y es! local gradient steps provably lead to communication ac celeration! finally!” in International Conference on Machine Learning . PMLR, 2022, pp. 15 750–15 769

  15. [15]

    Decentral ized gradient tracking with local steps,

    Y . Liu, T. Lin, A. Koloskova, and S. U. Stich, “Decentral ized gradient tracking with local steps,” Optimization Methods and Software , vol. 40, no. 5, pp. 1153–1180, 2025

  16. [16]

    Exact wor st-case performance of first-order methods for composite convex opt imization,

    A. B. Taylor, J. M. Hendrickx, and F. Glineur, “Exact wor st-case performance of first-order methods for composite convex opt imization,” SIAM J. Optim. , vol. 27, no. 3, pp. 1283–1313, 2017

  17. [17]

    Several performance bo unds on decentralized online optimization are highly conservativ e and potentially misleading,

    E. Meunier and J. M. Hendrickx, “Several performance bo unds on decentralized online optimization are highly conservativ e and potentially misleading,” in 2025 IEEE 64th Conference on Decision and Control (CDC). IEEE, 2025, pp. 6201–6207

  18. [18]

    Achieving geometr ic convergence for distributed optimization over time-varying graphs,

    A. Nedi´ c, A. Olshevsky, and W. Shi, “Achieving geometr ic convergence for distributed optimization over time-varying graphs,” SIAM J. Optim. , vol. 27, no. 4, pp. 2597–2633, 2017

  19. [19]

    Harnessing smoothness to accelerate di stributed optimization,

    G. Qu and N. Li, “Harnessing smoothness to accelerate di stributed optimization,” IEEE Trans. Control Netw. Syst. , vol. 5, no. 3, pp. 1245– 1260, 2017

  20. [20]

    Convergence of async hronous distributed gradient methods over stochastic networks,

    J. Xu, S. Zhu, Y . C. Soh, and L. Xie, “Convergence of async hronous distributed gradient methods over stochastic networks,” IEEE Trans. Autom. Control, vol. 63, no. 2, pp. 434–448, 2017

  21. [21]

    A linear algorithm for optimizati on over directed graphs with geometric convergence,

    R. Xin and U. A. Khan, “A linear algorithm for optimizati on over directed graphs with geometric convergence,” IEEE Control Syst. Lett. , vol. 2, no. 3, pp. 315–320, 2018

  22. [22]

    Push–pull gradient m ethods for distributed optimization in networks,

    S. Pu, W. Shi, J. Xu, and A. Nedi´ c, “Push–pull gradient m ethods for distributed optimization in networks,” IEEE Trans. Autom. Control , vol. 66, no. 1, pp. 1–16, 2021

  23. [23]

    Next: In-network nonconv ex optimiza- tion,

    P . Di Lorenzo and G. Scutari, “Next: In-network nonconv ex optimiza- tion,” IEEE Trans. Signal Inf. Process. Netw. , vol. 2, no. 2, pp. 120–136, 2016

  24. [24]

    Decentralized federated learning with gradien t tracking over time-varying directed networks,

    D. T. A. Nguyen, S. Wang, D. T. Nguyen, A. Nedich, and H. V . Poor, “Decentralized federated learning with gradien t tracking over time-varying directed networks,” 2024. [Online]. Ava ilable: https://arxiv.org/abs/2409.17189

  25. [25]

    Automatic performance es timation for decentralized optimization,

    S. Colla and J. M. Hendrickx, “Automatic performance es timation for decentralized optimization,” IEEE Trans. Autom. Control, vol. 68, no. 12, pp. 7136–7150, 2023

  26. [26]

    Applied optimization,

    Y . Nesterov, “Applied optimization,” Introductory lectures on convex optimization: A basic course , vol. 87, 2004

  27. [27]

    Performance of first-order me thods for smooth convex minimization: a novel approach,

    Y . Drori and M. Teboulle, “Performance of first-order me thods for smooth convex minimization: a novel approach,” Mathematical Pro- gramming, vol. 145, no. 1, pp. 451–482, 2014

  28. [28]

    Semidefinite programming ,

    L. V andenberghe and S. Boyd, “Semidefinite programming ,” SIAM review, vol. 38, no. 1, pp. 49–95, 1996

  29. [29]

    On the set of possible minimizers of a sum of convex functions,

    M. Zamani, F. Glineur, and J. M. Hendrickx, “On the set of possible minimizers of a sum of convex functions,” IEEE Control Syst. Lett. , vol. 8, pp. 1871–1876, 2024

  30. [30]

    The minimizer of the sum of two strongly convex functions,

    K. Kuwaranancharoen and S. Sundaram, “The minimizer of the sum of two strongly convex functions,” Optimization, pp. 1–41, 2024

  31. [31]

    Exploiting agent symmetr ies for performance analysis of distributed optimization methods ,

    S. Colla and J. M. Hendrickx, “Exploiting agent symmetr ies for performance analysis of distributed optimization methods ,” 2025. [Online]. Available: https://arxiv.org/abs/2403.11724

  32. [32]

    Branch-and- bound performance estimation programming: A unified methodology for con- structing optimal optimization methods,

    S. Das Gupta, B. P . V an Parys, and E. K. Ryu, “Branch-and- bound performance estimation programming: A unified methodology for con- structing optimal optimization methods,” Mathematical Programming , vol. 204, no. 1, pp. 567–639, 2024

  33. [33]

    Kamri, J

    Y . Kamri, J. M. Hendrickx, and F. Glineur, “Numerical de sign of optimized first-order algorithms,” 2025. [Online]. Avai lable: https://arxiv.org/abs/2507.20773

  34. [34]

    Optimal gradient sliding and its application to optimal di stributed optimization under similarity,

    D. Kovalev, A. Beznosikov, E. Borodich, A. Gasnikov, an d G. Scutari, “Optimal gradient sliding and its application to optimal di stributed optimization under similarity,” Adv. Neural Inf. Process. Syst. , vol. 35, pp. 33 494–33 507, 2022