pith. sign in

arxiv: 2511.19182 · v3 · submitted 2025-11-24 · 🧮 math.OC

A Unified Algorithm for Nonconvex Decentralized Nonlinear Optimization

Pith reviewed 2026-05-17 05:16 UTC · model grok-4.3

classification 🧮 math.OC
keywords decentralized optimizationnonconvex optimizationgradient trackingquasi-Newton methodsconvergence analysisKurdyka-Lojasiewicz condition
0
0 comments X

The pith

A unified algorithmic framework encompasses many gradient tracking and quasi-Newton methods for decentralized nonconvex optimization with general convergence analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops one algorithmic structure that brings together a variety of existing methods for minimizing a sum of nonconvex functions over a network. This structure supports a shared proof of convergence that applies across the included methods under both general nonconvex conditions and the Kurdyka-Łojasiewicz property. A reader would care because it replaces separate case-by-case analyses with a single set of arguments and points to new quasi-Newton variants that numerical tests indicate run efficiently.

Core claim

The authors propose a unified decentralized nonconvex algorithmic framework that includes many existing state-of-the-art gradient tracking and quasi-Newton algorithms. A general framework for the convergence analysis of the unified algorithm is presented under both nonconvex and the Kurdyka-Łojasiewicz condition settings. In particular, some new quasi-Newton algorithms under this framework are proposed and shown through numerical results to be efficient compared with other state-of-the-art algorithms.

What carries the argument

The unified decentralized nonconvex algorithmic framework that generalizes both gradient tracking and quasi-Newton updates to support a common convergence analysis.

If this is right

  • Existing gradient tracking and quasi-Newton algorithms become special cases of the single framework.
  • Convergence guarantees apply uniformly under nonconvex assumptions and the Kurdyka-Łojasiewicz condition.
  • New quasi-Newton algorithms constructed inside the framework exhibit competitive numerical performance.
  • The results cover minimization of sums of continuously differentiable functions over fixed connected undirected networks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same structure could be tested on time-varying networks by adjusting the tracking mechanism to handle changing connections.
  • Hybrid updates that blend gradient tracking steps with quasi-Newton corrections might improve practical speed on large problems.
  • If the deterministic analysis holds, stochastic gradient versions could be derived by replacing exact gradients with noisy estimates.

Load-bearing premise

The communication network is fixed, connected, and undirected, and all local objective functions are continuously differentiable.

What would settle it

Showing that any algorithm covered by the unified framework fails to converge when the network is fixed, connected, and undirected and the local functions are continuously differentiable would falsify the general convergence claims.

Figures

Figures reproduced from arXiv: 2511.19182 by Hao Wu, Liping Wang.

Figure 1
Figure 1. Figure 1: Optimality error of UDNAs for minimizing the non￾convex logistic regression problem (3.1) on different datasets w.r.t. communication volume. All algorithms except GUT need two rounds of communication per iteration. GUT needs only one round communication per iteration but uses a decreasing step￾size, which yields slow convergence as shown in [PITH_FULL_IMAGE:figures/full_fig_p026_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Comparisons with gradient-based algorithms for mini￾mizing the nonconvex logistic regression problem (3.1) on different datasets w.r.t. communication volume. against communication volume is shown in [PITH_FULL_IMAGE:figures/full_fig_p027_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Comparisons with quasi-Newton algorithms for mini￾mizing the nonconvex logistic regression problem (3.1) on different datasets w.r.t. number of iteration. 3. Sulaiman A Alghunaim, Ernest K Ryu, Kun Yuan, and Ali H Sayed, Decentralized proximal gradient algorithms with linear convergence rates, IEEE Transactions on Automatic Control 66 (2020), no. 6, 2787–2794. 4. Sulaiman A Alghunaim and Kun Yuan, A unifie… view at source ↗
Figure 4
Figure 4. Figure 4: Comparisons with quasi-Newton algorithms for mini￾mizing the nonconvex logistic regression problem (3.1) on different datasets w.r.t. communication volume. 11. Frank Curtis, A self-correcting variable-metric algorithm for stochastic optimization, Inter￾national Conference on Machine Learning, PMLR, 2016, pp. 632–641. 12. Yuhong Dai and Caixia Kou, A nonlinear conjugate gradient algorithm with an optimal pr… view at source ↗
Figure 5
Figure 5. Figure 5: Comparisons with quasi-Newton algorithms for mini￾mizing the nonconvex logistic regression problem (3.1) on colon￾cancer datasets. 19. William W Hager and Hongchao Zhang, A new conjugate gradient method with guaranteed descent and an efficient line search, SIAM Journal on optimization 16 (2005), no. 1, 170–192. 20. Kun Huang, Shi Pu, and Angelia Nedi´c, An accelerated distributed stochastic gradient method… view at source ↗
Figure 6
Figure 6. Figure 6: Comparisons with quasi-Newton algorithms for min￾imizing the nonconvex logistic regression problem (3.1) on duke breast-cancer datasets. 26. Eunjeong Jeong, Matteo Zecchin, and Marios Kountouris, Asynchronous decentralized learn￾ing over unreliable wireless networks, 2022 International Conference on Communications (ICC), 2022, pp. 607–612. 27. Donghui Li and Masao Fukushima, A modified bfgs method and its … view at source ↗
read the original abstract

In this paper, we study the decentralized optimization problem of minimizing a finite sum of continuously differentiable and possibly nonconvex functions over a fixed-connected undirected network. We propose a unified decentralized nonconvex algorithmic framework that includes many existing state-of-the-art gradient tracking and quasi-Newton algorithms. A general framework for the convergence analysis of our unified algorithm is presented under both nonconvex and the Kurdyka-{\L}ojasiewicz condition settings. In particular, some new quasi-Newton algorithms under this framework are proposed. Our numerical results show that these newly developed algorithms are very efficient compared with other state-of-the-art algorithms for solving decentralized nonconvex nonlinear optimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a unified decentralized algorithmic framework for minimizing a sum of continuously differentiable nonconvex functions over a fixed connected undirected network. This framework is claimed to subsume many existing gradient-tracking and quasi-Newton methods as special cases via suitable parameter or matrix choices. A general convergence analysis is developed for the unified iteration under standard nonconvex assumptions and under the Kurdyka-Łojasiewicz condition; new quasi-Newton variants are introduced and shown numerically to outperform several state-of-the-art baselines.

Significance. If the unification is exact and the convergence framework is free of hidden restrictions, the work would provide a useful common lens for analyzing and extending decentralized nonconvex methods, potentially easing the derivation of new algorithms and their guarantees. The combination of gradient-tracking with quasi-Newton directions and the dual nonconvex/KL analysis are positive features; the numerical efficiency claims, if reproducible, add practical value.

major comments (2)
  1. §3, general iteration (3.1)–(3.3): The unification claim requires that standard gradient-tracking recursions (e.g., the exact update of the tracking variable in GT) and specific quasi-Newton updates (e.g., BFGS or DFP variants) are recovered without modification by fixing the Hessian approximation matrix or step-size parameters. No explicit recovery table or derivation is supplied for at least two representative SOTA methods; without this, the inclusion is only schematic rather than exact.
  2. §4, Theorem 4.1 and the step-size/Hessian conditions: The general convergence statement relies on a set of assumptions on the local Hessian approximations and the mixing matrix that may not hold identically for every cited prior algorithm. It is unclear whether the proof reduces directly to the known rates for those special cases or introduces additional restrictions on the curvature estimates.
minor comments (2)
  1. Notation in §2: The definition of the gradient-tracking variable y_i^k should be accompanied by a one-line remark clarifying its relation to the standard GT literature to aid readability.
  2. Numerical section: The tables would benefit from reporting wall-clock time in addition to iteration counts, and from including error bars or multiple random seeds for the reported objective values.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and constructive comments on our manuscript. We address each major comment below and will revise the paper to strengthen the presentation of the unification and convergence results.

read point-by-point responses
  1. Referee: §3, general iteration (3.1)–(3.3): The unification claim requires that standard gradient-tracking recursions (e.g., the exact update of the tracking variable in GT) and specific quasi-Newton updates (e.g., BFGS or DFP variants) are recovered without modification by fixing the Hessian approximation matrix or step-size parameters. No explicit recovery table or derivation is supplied for at least two representative SOTA methods; without this, the inclusion is only schematic rather than exact.

    Authors: We agree that an explicit recovery table would make the unification claim more precise and verifiable. In the revised manuscript, we will insert a dedicated table in Section 3 that lists the exact parameter and matrix choices (including the Hessian approximation and step-size selections) needed to recover standard gradient-tracking recursions such as GT and representative quasi-Newton methods such as BFGS and DFP variants directly from the general iteration (3.1)–(3.3) without any modifications. This addition will demonstrate that the inclusion is exact rather than schematic. revision: yes

  2. Referee: §4, Theorem 4.1 and the step-size/Hessian conditions: The general convergence statement relies on a set of assumptions on the local Hessian approximations and the mixing matrix that may not hold identically for every cited prior algorithm. It is unclear whether the proof reduces directly to the known rates for those special cases or introduces additional restrictions on the curvature estimates.

    Authors: The assumptions on the local Hessian approximations and mixing matrix in Theorem 4.1 are formulated to be satisfied by the standard choices appearing in the cited prior algorithms. When the Hessian approximation reduces to the identity matrix, the framework recovers gradient tracking; for BFGS/DFP-type updates the boundedness and curvature conditions align with those used in the literature. The general proof technique specializes directly to the existing analyses without imposing extra restrictions. We will add a clarifying remark and a short corollary in the revised Section 4 that explicitly shows this reduction and confirms that the known convergence rates are recovered. revision: yes

Circularity Check

0 steps flagged

No circularity: general framework yields independent convergence for special cases

full rationale

The paper defines a general decentralized update combining gradient tracking with a quasi-Newton direction and derives convergence for this general form under standard nonconvex and KL assumptions. Existing algorithms are recovered by explicit parameter choices in the update rule rather than by fitting or self-definition. No load-bearing step reduces to a prior self-citation, fitted input renamed as prediction, or ansatz smuggled via citation; the analysis stands on its own stated assumptions about the fixed connected network and differentiable objectives.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard domain assumptions from decentralized optimization literature plus the continuously differentiable property of the objective functions.

axioms (2)
  • domain assumption The communication network is fixed, connected, and undirected.
    Invoked to guarantee information propagation across agents.
  • domain assumption Each local objective function is continuously differentiable.
    Required for gradient-based updates and convergence statements.

pith-pipeline@v0.9.0 · 5392 in / 1158 out tokens · 45700 ms · 2026-05-17T05:16:34.000624+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages

  1. [1]

    36, 2023, pp

    Sai Aparna Aketi, Abolfazl Hashemi, and Kaushik Roy, Global update tracking: A decentral- ized learning algorithm for heterogeneous data , Advances in Neural Information Processing Systems, vol. 36, 2023, pp. 48939–48961

  2. [2]

    11, 7371–7386

    Sulaiman A Alghunaim, Local exact-diffusion for decentralized optimization and learning , IEEE Transactions on Automatic Control 69 (2024), no. 11, 7371–7386. A UNIFIED DECENTRALIZED NONCONVEX ALGORITHM UNDER KURDYKA- LOJASIEWICZ PROPERTY 29 20 40 60 80 100 120 Number of iteration 10-15 10-10 10-5 100 Optimality errorDQN D-LM-BFGS DR-LM-DFP UDNA(1) UDNA(2...

  3. [3]

    6, 2787–2794

    Sulaiman A Alghunaim, Ernest K Ryu, Kun Yuan, and Ali H Sayed, Decentralized proximal gradient algorithms with linear convergence rates , IEEE Transactions on Automatic Control 66 (2020), no. 6, 2787–2794

  4. [4]

    Sulaiman A Alghunaim and Kun Yuan, A unified and refined convergence analysis for non- convex decentralized learning, IEEE Transactions on Signal Processing 70 (2022), 3264–3279

  5. [5]

    1, 223–240

    Neculai Andrei, A note on memory-less sr1 and memory-less bfgs methods for large-scale unconstrained optimization, Numerical Algorithms 90 (2022), no. 1, 223–240

  6. [6]

    Hedy Attouch and J´ erˆ ome Bolte,On the convergence of the proximal algorithm for nonsmooth functions involving analytic features , Mathematical Programming 116 (2009), 5–16

  7. [7]

    2, 438–457

    H´ edy Attouch, J´ erˆ ome Bolte, Patrick Redont, and Antoine Soubeyran,Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the kurdyka- lojasiewicz inequality, Mathematics of operations research 35 (2010), no. 2, 438–457

  8. [8]

    Albert S Berahas, Raghu Bollapragada, and Shagun Gupta, Balancing communication and computation in gradient tracking algorithms for decentralized optimization , Journal of Opti- mization Theory and Applications (2024), 1–34

  9. [9]

    11, 4776–4790

    Huiming Chen, Ho-Chun Wu, Shing-Chow Chan, and Wong-Hing Lam, A stochastic quasi- newton method for large-scale nonconvex optimization with applications , IEEE transactions on Neural Networks and Learning Systems 31 (2019), no. 11, 4776–4790

  10. [10]

    Xiaokai Chen, Tianyu Cao, and Gesualdo Scutari, Enhancing convergence of decentralized gradient tracking under the kl property , arXiv preprint arXiv:2412.09556 (2024). 30 HAO WU, LIPING WANG, AND HONGCHAO ZHANG 5 10 15 Communication volume 105 10-15 10-10 10-5 100 Optimality error DQN D-LM-BFGS DR-LM-DFP UDNA(1) UDNA(2) (a) mushrooms 0.5 1 1.5 2 Communic...

  11. [11]

    Frank Curtis, A self-correcting variable-metric algorithm for stochastic optimization , Inter- national Conference on Machine Learning, PMLR, 2016, pp. 632–641

  12. [12]

    1, 296–320

    Yuhong Dai and Caixia Kou, A nonlinear conjugate gradient algorithm with an optimal prop- erty and an improved wolfe line search , SIAM Journal on Optimization 23 (2013), no. 1, 296–320

  13. [13]

    4, 3029–3068

    Amir Daneshmand, Gesualdo Scutari, and Vyacheslav Kungurtsev, Second-order guarantees of distributed gradient algorithms, SIAM Journal on Optimization30 (2020), no. 4, 3029–3068

  14. [14]

    2, 120–136

    Paolo Di Lorenzo and Gesualdo Scutari, Next: In-network nonconvex optimization , IEEE Transactions on Signal and Information Processing over Networks 2 (2016), no. 2, 120–136

  15. [15]

    Haizhou Du, Chaoqian Cheng, and Chengdong Ni, A unified momentum-based paradigm of decentralized sgd for non-convex models and heterogeneous data , Artificial Intelligence 332 (2024), 104130

  16. [16]

    10, 2613–2628

    Mark Eisen, Aryan Mokhtari, and Alejandro Ribeiro, Decentralized quasi-newton methods, IEEE Transactions on Signal Processing 65 (2017), no. 10, 2613–2628

  17. [17]

    4, 3115–3127

    Giuseppe Fusco and Mario Russo, A decentralized approach for voltage control by multiple distributed energy resources, IEEE Transactions on Smart Grid 12 (2021), no. 4, 3115–3127

  18. [18]

    In- formation Sciences 65 (2022), no

    Juan Gao, Xinwei Liu, Yuhong Dai, Yakui Huang, and Peng Yang, Achieving geometric convergence for distributed optimization with barzilai-borwein step sizes , Science China. In- formation Sciences 65 (2022), no. 4, 149204. A UNIFIED DECENTRALIZED NONCONVEX ALGORITHM UNDER KURDYKA- LOJASIEWICZ PROPERTY 31 50 100 150 200 250 Number of iteration 10-15 10-10 1...

  19. [19]

    1, 170–192

    William W Hager and Hongchao Zhang, A new conjugate gradient method with guaranteed descent and an efficient line search, SIAM Journal on optimization 16 (2005), no. 1, 170–192

  20. [20]

    Kun Huang, Shi Pu, and Angelia Nedi´ c,An accelerated distributed stochastic gradient method with momentum, Mathematical Programming (2025), 1–44

  21. [21]

    19, 3064

    Umair Hussan, Huaizhi Wang, Muhammad Ahsan Ayub, Hamna Rasheed, Muhammad As- ghar Majeed, Jianchun Peng, and Hui Jiang, Decentralized stochastic recursive gradient method for fully decentralized opf in multi-area power systems, Mathematics 12 (2024), no. 19, 3064

  22. [22]

    1, 31– 46

    Duˇ san Jakoveti´ c,A unification and generalization of exact distributed first-order methods , IEEE Transactions on Signal and Information Processing over Networks 5 (2018), no. 1, 31– 46

  23. [23]

    11, 1923–1938

    Duˇ san Jakoveti´ c, Dragana Bajovi´ c, Jo˜ ao Xavier, and Jos´ e MF Moura,Primal–dual methods for large-scale and distributed convex optimization and data analytics , Proceedings of the IEEE 108 (2020), no. 11, 1923–1938

  24. [24]

    Duˇ san Jakoveti´ c, Nataˇ sa Kreji´ c, and Nataˇ sa Krklec Jerinki´ c,A hessian inversion-free exact second order method for distributed consensus optimization, IEEE Transactions on Signal and Information Processing over Networks 8 (2022), 755–770

  25. [25]

    3, 703–728

    Duˇ san Jakoveti´ c, Nataˇ sa Kreji´ c, and Nataˇ sa Krklec Jerinki´ c,Exact spectral-like gradient method for distributed optimization, Computational Optimization and Applications74 (2019), no. 3, 703–728. 32 HAO WU, LIPING WANG, AND HONGCHAO ZHANG 50 100 150 200 250 300 350 Number of iteration 10-15 10-10 10-5 100 Optimality error GT MT DQN D-LM-BFGS UD...

  26. [26]

    Eunjeong Jeong, Matteo Zecchin, and Marios Kountouris, Asynchronous decentralized learn- ing over unreliable wireless networks , 2022 International Conference on Communications (ICC), 2022, pp. 607–612

  27. [27]

    1-2, 15–35

    Donghui Li and Masao Fukushima, A modified bfgs method and its global convergence in nonconvex minimization , Journal of Computational and Applied Mathematics 129 (2001), no. 1-2, 15–35

  28. [28]

    5, 1199–1232

    Guoyin Li and Ting Kei Pong, Calculus of the exponent of kurdyka– lojasiewicz inequality and its applications to linear convergence of first-order methods , Foundations of computational mathematics 18 (2018), no. 5, 1199–1232

  29. [29]

    1689–1694

    Yichuan Li, Yonghai Gong, Nikolaos M Freris, Petros Voulgaris, and Duˇ san Stipanovi´ c,Bfgs- admm for large-scale distributed optimization , 2021 60th IEEE Conference on Decision and Control (CDC), IEEE, 2021, pp. 1689–1694

  30. [30]

    Huikang Liu, Jiaojiao Zhang, Anthony Man-Cho So, and Qing Ling, A communication- efficient decentralized newton’s method with provably faster convergence , IEEE Transactions on Signal and Information Processing over Networks 9 (2023), 427–441

  31. [31]

    2, 305–323

    Tao-Wen Liu, A regularized limited memory bfgs method for nonconvex unconstrained mini- mization, Numerical Algorithms 65 (2014), no. 2, 305–323

  32. [32]

    Stanislaw Lojasiewicz, Une propri´ et´ e topologique des sous-ensembles analytiques r´ eels, Les ´ equations aux d´ eriv´ ees partielles117 (1963), no. 87-89. A UNIFIED DECENTRALIZED NONCONVEX ALGORITHM UNDER KURDYKA- LOJASIEWICZ PROPERTY 33

  33. [33]

    Gabriel Mancino-Ball, Yangyang Xu, and Jie Chen, A decentralized primal-dual framework for non-convex smooth consensus optimization , IEEE Transactions on Signal Processing 71 (2023), 525–538

  34. [34]

    1, 146–161

    Aryan Mokhtari, Qing Ling, and Alejandro Ribeiro, Network newton distributed optimization methods, IEEE Transactions on Signal Processing 65 (2016), no. 1, 146–161

  35. [35]

    4, 507–522

    Aryan Mokhtari, Wei Shi, Qing Ling, and Alejandro Ribeiro, A decentralized second-order method with exact linear convergence rate for consensus optimization , IEEE Transactions on Signal and Information Processing over Networks 2 (2016), no. 4, 507–522

  36. [36]

    4, 2597–2633

    Angelia Nedi´ c, Alex Olshevsky, and Wei Shi,Achieving geometric convergence for distributed optimization over time-varying graphs , SIAM Journal on Optimization 27 (2017), no. 4, 2597–2633

  37. [37]

    3950–3955

    Angelia Nedi´ c, Alex Olshevsky, Wei Shi, and C´ esar A Uribe, Geometrically convergent distributed optimization with uncoordinated step-sizes , 2017 American Control Conference (ACC), IEEE, 2017, pp. 3950–3955

  38. [38]

    1, 48–61

    Angelia Nedi´ c and Asuman Ozdaglar, Distributed subgradient methods for multi-agent opti- mization, IEEE Transactions on Automatic Control 54 (2009), no. 1, 48–61

  39. [39]

    2, 1089–1109

    Yitian Qian, Ting Tao, Shaohua Pan, and Houduo Qi, Convergence of zh-type nonmonotone descent method for kurdyka– lojasiewicz optimization problems , SIAM Journal on Optimiza- tion 35 (2025), no. 2, 1089–1109

  40. [40]

    3, 1245–1260

    Guannan Qu and Na Li, Harnessing smoothness to accelerate distributed optimization , IEEE Transactions on Control of Network Systems 5 (2017), no. 3, 1245–1260

  41. [41]

    6, 156–162

    Yuben Qu, Haipeng Dai, Yan Zhuang, Jiafa Chen, Chao Dong, Fan Wu, and Song Guo, De- centralized federated learning for uav networks: Architecture, challenges, and opportunities , IEEE Network 35 (2022), no. 6, 156–162

  42. [42]

    3, 2014, pp

    Ali H Sayed, Diffusion adaptation over networks , Academic Press Library in Signal Process- ing, vol. 3, 2014, pp. 323–453

  43. [43]

    Gesualdo Scutari and Ying Sun, Distributed nonconvex constrained optimization over time- varying digraphs, Mathematical Programming 176 (2019), 497–544

  44. [44]

    2, 944– 966

    Wei Shi, Qing Ling, Gang Wu, and Wotao Yin, Extra: An exact first-order algorithm for decentralized consensus optimization, SIAM Journal on Optimization 25 (2015), no. 2, 944– 966

  45. [45]

    7, 1750–1761

    Wei Shi, Qing Ling, Kun Yuan, Gang Wu, and Wotao Yin, On the linear convergence of the admm in decentralized consensus optimization , IEEE Transactions on Signal Processing 62 (2014), no. 7, 1750–1761

  46. [46]

    Ola Shorinwa and Mac Schwager, Distributed quasi-newton method for multi-agent optimiza- tion, IEEE Transactions on Signal Processing 72 (2024), 3535–3546

  47. [47]

    , Distributed quasi-newton method for multi-agent optimization , IEEE Transactions on Signal Processing 72 (2024), 3535–3546

  48. [48]

    Zhuoqing Song, Lei Shi, Shi Pu, and Ming Yan, Optimal gradient tracking for decentralized optimization, Mathematical Programming 207 (2024), no. 1, 1–53

  49. [49]

    4, 1597–1608

    Akhil Sundararajan, Bryan Van Scoy, and Laurent Lessard, Analysis and design of first-order distributed optimization algorithms over time-varying graphs , IEEE Transactions on Control of Network Systems 7 (2020), no. 4, 1597–1608

  50. [50]

    Yuki Takezawa, Han Bao, Kenta Niwa, Ryoma Sato, and Makoto Yamada, Momentum track- ing: Momentum acceleration for decentralized deep learning on heterogeneous data , Transac- tions on Machine Learning Research (2023)

  51. [51]

    Lei Wang, Nachuan Xiao, and Xin Liu, A double tracking method for optimization with decentralized generalized orthogonality constraints, arXiv preprint arXiv:2409.04998 (2024)

  52. [52]

    Liping Wang, Hao Wu, and Hongchao Zhang, A decentralized primal-dual method with quasi- newton tracking, IEEE Transactions on Signal Processing 73 (2025), 1323–1336

  53. [53]

    3, 4893–4907

    Mou Wu, Haibin Liao, Zhengtao Ding, and Yonggang Xiao, Music: Accelerated convergence for distributed optimization with inexact and exact methods , IEEE Transactions on Neural Networks and Learning Systems 36 (2025), no. 3, 4893–4907

  54. [54]

    1, 33–46

    Lin Xiao, Stephen Boyd, and Seung-Jean Kim, Distributed average consensus with least-mean- square deviation, Journal of Parallel and Distributed Computing 67 (2007), no. 1, 33–46

  55. [55]

    6, 2627–2633

    Ran Xin and Usman A Khan, Distributed heavy-ball: A generalization and acceleration of first-order methods with gradient tracking , IEEE Transactions on Automatic Control 65 (2019), no. 6, 2627–2633. 34 HAO WU, LIPING WANG, AND HONGCHAO ZHANG

  56. [56]

    Jinming Xu, Ye Tian, Ying Sun, and Gesualdo Scutari, Distributed algorithms for compos- ite optimization: Unified framework and convergence analysis , IEEE Transactions on Signal Processing 69 (2021), 3555–3570

  57. [57]

    2055–2060

    Jinming Xu, Shanying Zhu, Yeng Chai Soh, and Lihua Xie, Augmented distributed gradi- ent methods for multi-agent optimization under uncoordinated constant stepsizes , 2015 IEEE Conference on Decision and Control (CDC), 2015, pp. 2055–2060

  58. [58]

    3, 1835–1854

    Kun Yuan, Qing Ling, and Wotao Yin, On the convergence of decentralized gradient descent, SIAM Journal on Optimization 26 (2016), no. 3, 1835–1854

  59. [59]

    3, 708–723

    Kun Yuan, Bicheng Ying, Xiaochuan Zhao, and Ali H Sayed, Exact diffusion for distributed optimization and learning—part i: Algorithm development , IEEE Transactions on Signal Processing 67 (2018), no. 3, 708–723

  60. [60]

    11, 2834–2848

    Jinshan Zeng and Wotao Yin, On nonconvex decentralized gradient descent , IEEE Transac- tions on Signal Processing 66 (2018), no. 11, 2834–2848

  61. [61]

    Jiaojiao Zhang, Qing Ling, and Anthony Man-Cho So, A newton tracking algorithm with exact linear convergence for decentralized consensus optimization, IEEE Transactions on Signal and Information Processing over Networks 7 (2021), 346–358

  62. [62]

    Jiaojiao Zhang, Huikang Liu, Anthony Man-Cho So, and Qing Ling, Variance-reduced stochas- tic quasi-newton methods for decentralized learning , IEEE Transactions on Signal Processing 71 (2023), 311–326

  63. [63]

    Jiaqi Zhang, Keyou You, and Kai Cai, Distributed dual gradient tracking for resource alloca- tion in unbalanced networks , IEEE Transactions on Signal Processing 68 (2020), 2186–2198

  64. [64]

    Appendix A

    Xianyang Zhang, Chen Hu, Bing He, and Zhiguo Han, Distributed reptile algorithm for meta- learning over multi-agent systems , IEEE Transactions on Signal Processing 70 (2022), 5443– 5456. Appendix A. Analytical tools Lemma A.1. (Young’s inequality) For any two vectors v1, v2 ∈ Rp, 2vT 1 v2 ≤η∥v1∥2 + 1 η ∥v2∥2, ∥v1 + v2∥2 ≤(1 + η)∥v1∥2 + 1 + 1 η ∥v2∥2. Lem...