A Unified Algorithm for Nonconvex Decentralized Nonlinear Optimization

Hao Wu; Liping Wang

arxiv: 2511.19182 · v3 · submitted 2025-11-24 · 🧮 math.OC


Pith reviewed 2026-05-17 05:16 UTC · model grok-4.3

classification 🧮 math.OC

keywords decentralized optimization · nonconvex optimization · gradient tracking · quasi-Newton methods · convergence analysis · Kurdyka-Łojasiewicz condition


The pith

A unified algorithmic framework encompasses many gradient tracking and quasi-Newton methods for decentralized nonconvex optimization with general convergence analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops one algorithmic structure that brings together a variety of existing methods for minimizing a sum of nonconvex functions over a network. This structure supports a shared proof of convergence that applies across the included methods under both general nonconvex conditions and the Kurdyka-Łojasiewicz property. A reader would care because it replaces separate case-by-case analyses with a single set of arguments and points to new quasi-Newton variants that numerical tests indicate run efficiently.

Core claim

The authors propose a unified decentralized nonconvex algorithmic framework that includes many existing state-of-the-art gradient tracking and quasi-Newton algorithms. A general convergence analysis of the unified algorithm is presented both for the general nonconvex setting and under the Kurdyka-Łojasiewicz condition. In particular, some new quasi-Newton algorithms are proposed within this framework, and the reported numerical results indicate that they are efficient compared with other state-of-the-art algorithms.

What carries the argument

The unified decentralized nonconvex algorithmic framework that generalizes both gradient tracking and quasi-Newton updates to support a common convergence analysis.
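For orientation, the classical gradient tracking recursion that the framework is said to include can be sketched as follows. This is the textbook recursion (x ← Wx − αy, y ← Wy + ∇f(x_new) − ∇f(x)), not the paper's unified iteration, which this page does not reproduce. The helper names `ring_mixing_matrix` and `gradient_tracking`, the ring network, the step size, and the toy quadratic objective are all assumptions chosen for illustration.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Symmetric doubly stochastic mixing matrix for a ring of n >= 3 agents."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1.0 / 3.0
        W[i, (i - 1) % n] = 1.0 / 3.0
        W[i, (i + 1) % n] = 1.0 / 3.0
    return W

def gradient_tracking(grad, x0, W, alpha, iters):
    """Run x <- W x - alpha * y, then y <- W y + grad(x_new) - grad(x).

    grad maps an (n, p) array of local iterates to the (n, p) array of
    local gradients, row i being the gradient of f_i at row i.
    """
    x = x0.copy()
    g = grad(x)
    y = g.copy()  # the tracker starts at the local gradients
    for _ in range(iters):
        x_new = W @ x - alpha * y
        g_new = grad(x_new)
        y = W @ y + g_new - g  # y tracks the network-average gradient
        x, g = x_new, g_new
    return x, y

# Toy problem (illustrative only, and convex): f_i(x) = 0.5 * ||x - b_i||^2,
# whose sum is minimized at the mean of the b_i.
rng = np.random.default_rng(0)
n, p = 6, 3
b = rng.normal(size=(n, p))
x_final, y_final = gradient_tracking(lambda x: x - b, np.zeros((n, p)),
                                     ring_mixing_matrix(n), alpha=0.1, iters=500)
```

Each row of `x_final` is one agent's estimate. On this toy problem all rows should agree with `b.mean(axis=0)`, and the tracker rows should be close to zero, since the tracker follows the average gradient.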

If this is right

Existing gradient tracking and quasi-Newton algorithms become special cases of the single framework.
Convergence guarantees apply uniformly under nonconvex assumptions and the Kurdyka-Łojasiewicz condition.
New quasi-Newton algorithms constructed inside the framework exhibit competitive numerical performance.
The results cover minimization of sums of continuously differentiable functions over fixed connected undirected networks.
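For readers unfamiliar with the second setting, the Kurdyka-Łojasiewicz inequality for a continuously differentiable function can be stated as below. This is the standard textbook form. Whether the paper applies it to the objective itself or to a merit function is not stated on this page, and the symbols φ, η, θ, c, and U are notation assumed for this sketch.

```latex
% f is C^1; \bar{x} is a critical point; U is a neighborhood of \bar{x}.
% \varphi : [0,\eta) \to [0,\infty) is concave, continuous, \varphi(0) = 0,
% C^1 on (0,\eta) with \varphi' > 0 (a "desingularizing" function).
\[
  \varphi'\bigl(f(x) - f(\bar{x})\bigr)\,\|\nabla f(x)\| \;\ge\; 1
  \qquad \text{for all } x \in U \text{ with } f(\bar{x}) < f(x) < f(\bar{x}) + \eta .
\]
% Common special case \varphi(s) = c\,s^{1-\theta} with c > 0, \theta \in [0,1):
\[
  \|\nabla f(x)\| \;\ge\; \frac{1}{c\,(1-\theta)}\,\bigl(f(x) - f(\bar{x})\bigr)^{\theta} .
\]
```

In analyses of this kind the exponent θ typically determines the local convergence rate that can be proved, which is why the condition is treated separately from the general nonconvex case.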

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same structure could be tested on time-varying networks by adjusting the tracking mechanism to handle changing connections.
Hybrid updates that blend gradient tracking steps with quasi-Newton corrections might improve practical speed on large problems.
If the deterministic analysis holds, stochastic gradient versions could be derived by replacing exact gradients with noisy estimates.

Load-bearing premise

The communication network is fixed, connected, and undirected, and all local objective functions are continuously differentiable.
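Written out, the setting in this premise is the standard consensus problem. The block below is a sketch in assumed notation: n agents, dimension p, graph G = (V, E), and local copies x_i are not taken from the paper, and the abstract does not say whether the sum is normalized by 1/n.

```latex
% Assumed notation: n agents, decision variable in R^p, graph G = (V, E).
\[
  \min_{x \in \mathbb{R}^p} \; f(x) := \sum_{i=1}^{n} f_i(x),
  \qquad f_i \in C^1(\mathbb{R}^p) \text{ possibly nonconvex.}
\]
% Consensus form: agent i keeps a local copy x_i and only talks to neighbors.
\[
  \min_{x_1,\dots,x_n \in \mathbb{R}^p} \; \sum_{i=1}^{n} f_i(x_i)
  \quad \text{s.t.} \quad x_i = x_j \;\; \text{for all } (i,j) \in \mathcal{E}.
\]
```

The two forms are equivalent exactly when the graph is connected, because the edge constraints then force all local copies to coincide. That is where the connectivity part of the premise enters.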

What would settle it

Showing that any algorithm covered by the unified framework fails to converge when the network is fixed, connected, and undirected and the local functions are continuously differentiable would falsify the general convergence claims.

Figures

Figures reproduced from arXiv: 2511.19182 by Hao Wu, Liping Wang.

**Figure 1.** Optimality error of UDNAs for minimizing the nonconvex logistic regression problem (3.1) on different datasets w.r.t. communication volume. All algorithms except GUT need two rounds of communication per iteration. GUT needs only one round of communication per iteration but uses a decreasing stepsize, which yields slow convergence.

**Figure 2.** Comparisons with gradient-based algorithms for minimizing the nonconvex logistic regression problem (3.1) on different datasets w.r.t. communication volume.

**Figure 3.** Comparisons with quasi-Newton algorithms for minimizing the nonconvex logistic regression problem (3.1) on different datasets w.r.t. number of iterations.

**Figure 4.** Comparisons with quasi-Newton algorithms for minimizing the nonconvex logistic regression problem (3.1) on different datasets w.r.t. communication volume.

**Figure 5.** Comparisons with quasi-Newton algorithms for minimizing the nonconvex logistic regression problem (3.1) on the colon-cancer dataset.

**Figure 6.** Comparisons with quasi-Newton algorithms for minimizing the nonconvex logistic regression problem (3.1) on the duke breast-cancer dataset.

read the original abstract

In this paper, we study the decentralized optimization problem of minimizing a finite sum of continuously differentiable and possibly nonconvex functions over a fixed-connected undirected network. We propose a unified decentralized nonconvex algorithmic framework that includes many existing state-of-the-art gradient tracking and quasi-Newton algorithms. A general framework for the convergence analysis of our unified algorithm is presented under both nonconvex and the Kurdyka-Łojasiewicz condition settings. In particular, some new quasi-Newton algorithms under this framework are proposed. Our numerical results show that these newly developed algorithms are very efficient compared with other state-of-the-art algorithms for solving decentralized nonconvex nonlinear optimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper unifies gradient tracking and quasi-Newton methods for decentralized nonconvex optimization under one convergence template and adds new variants, but the exact reproduction of prior algorithms needs explicit verification in the proofs.


The main takeaway is that this work puts several existing gradient tracking and quasi-Newton algorithms for decentralized nonconvex problems into a single framework, supplies a general convergence analysis for both plain nonconvex cases and the Kurdyka-Łojasiewicz setting, and introduces some fresh quasi-Newton instances that the numerics suggest run efficiently against current options. The unification and the shared analysis are the clearest contributions; having one set of arguments cover multiple methods saves repeated proof work if the reductions hold. The experiments give concrete evidence that the new variants are competitive on the tested problems, which is useful data for anyone implementing these ideas. The network assumptions are the usual fixed connected undirected graph with continuously differentiable local functions, and the paper stays within those bounds without claiming broader coverage.

The soft spot is whether the general update rule truly recovers the exact recursions of the cited state-of-the-art methods by simple parameter choices, or whether the tracking variable and Hessian approximation impose extra structure that some prior algorithms do not satisfy. The abstract states the inclusion, but the full reductions and any step-size or update restrictions should be checked line by line in the proofs. Soundness looks reasonable at the framework level, yet the low-level details on how the quasi-Newton updates are adapted to the decentralized setting are what will determine if the claims land cleanly.

This is for people working on distributed nonconvex optimization in machine learning or networked control who want a reference template rather than a single new algorithm. A reader who needs to compare or extend these families would get practical value from the general statements and the numerical comparisons. It deserves a serious referee to examine the proof details and the precise unification steps. I would send it to peer review with the expectation that the authors can clarify the reductions and add any missing implementation specifics for the new variants.

Referee Report

2 major / 2 minor

Summary. The paper proposes a unified decentralized algorithmic framework for minimizing a sum of continuously differentiable nonconvex functions over a fixed connected undirected network. This framework is claimed to subsume many existing gradient-tracking and quasi-Newton methods as special cases via suitable parameter or matrix choices. A general convergence analysis is developed for the unified iteration under standard nonconvex assumptions and under the Kurdyka-Łojasiewicz condition; new quasi-Newton variants are introduced and shown numerically to outperform several state-of-the-art baselines.

Significance. If the unification is exact and the convergence framework is free of hidden restrictions, the work would provide a useful common lens for analyzing and extending decentralized nonconvex methods, potentially easing the derivation of new algorithms and their guarantees. The combination of gradient-tracking with quasi-Newton directions and the dual nonconvex/KL analysis are positive features; the numerical efficiency claims, if reproducible, add practical value.

major comments (2)

§3, general iteration (3.1)–(3.3): The unification claim requires that standard gradient-tracking recursions (e.g., the exact update of the tracking variable in GT) and specific quasi-Newton updates (e.g., BFGS or DFP variants) are recovered without modification by fixing the Hessian approximation matrix or step-size parameters. No explicit recovery table or derivation is supplied for at least two representative SOTA methods; without this, the inclusion is only schematic rather than exact.
§4, Theorem 4.1 and the step-size/Hessian conditions: The general convergence statement relies on a set of assumptions on the local Hessian approximations and the mixing matrix that may not hold identically for every cited prior algorithm. It is unclear whether the proof reduces directly to the known rates for those special cases or introduces additional restrictions on the curvature estimates.
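To make concrete what "recovering a BFGS variant" and "conditions on the curvature estimates" refer to, here is a minimal sketch of the classical BFGS update of an inverse-Hessian approximation with a skip rule for non-positive curvature. It is the textbook centralized formula, not the decentralized update from the paper. The function name `bfgs_inverse_update`, the tolerance `eps`, and the skip safeguard are assumptions made for illustration.

```python
import numpy as np

def bfgs_inverse_update(H, s, y, eps=1e-10):
    """One BFGS update of an inverse-Hessian approximation H.

    s is the step x_new - x_old and y is the gradient difference. The update
    is skipped when the curvature s^T y is not safely positive, a common
    safeguard for nonconvex objectives (assumed here; the paper may use a
    different one).
    """
    sy = float(s @ y)
    if sy <= eps * np.linalg.norm(s) * np.linalg.norm(y):
        return H  # keep the previous approximation
    rho = 1.0 / sy
    V = np.eye(len(s)) - rho * np.outer(s, y)
    # H_new = (I - rho s y^T) H (I - rho y s^T) + rho s s^T
    return V @ H @ V.T + rho * np.outer(s, s)

H0 = np.eye(3)
s = np.array([1.0, 0.5, -0.2])
y = np.array([2.0, 0.3, -0.1])        # s @ y = 2.17 > 0: update is applied
H1 = bfgs_inverse_update(H0, s, y)

H_skip = bfgs_inverse_update(H0, s, -y)  # negative curvature: update is skipped
```

The updated matrix satisfies the secant equation `H1 @ y == s` and stays symmetric positive definite whenever the curvature test passes. Bounds of this kind on the approximations are the sort of condition the referee asks to see verified for each cited method.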

minor comments (2)

Notation in §2: The definition of the gradient-tracking variable y_i^k should be accompanied by a one-line remark clarifying its relation to the standard GT literature to aid readability.
Numerical section: The tables would benefit from reporting wall-clock time in addition to iteration counts, and from including error bars or multiple random seeds for the reported objective values.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and constructive comments on our manuscript. We address each major comment below and will revise the paper to strengthen the presentation of the unification and convergence results.


Referee: §3, general iteration (3.1)–(3.3): The unification claim requires that standard gradient-tracking recursions (e.g., the exact update of the tracking variable in GT) and specific quasi-Newton updates (e.g., BFGS or DFP variants) are recovered without modification by fixing the Hessian approximation matrix or step-size parameters. No explicit recovery table or derivation is supplied for at least two representative SOTA methods; without this, the inclusion is only schematic rather than exact.

Authors: We agree that an explicit recovery table would make the unification claim more precise and verifiable. In the revised manuscript, we will insert a dedicated table in Section 3 that lists the exact parameter and matrix choices (including the Hessian approximation and step-size selections) needed to recover standard gradient-tracking recursions such as GT and representative quasi-Newton methods such as BFGS and DFP variants directly from the general iteration (3.1)–(3.3) without any modifications. This addition will demonstrate that the inclusion is exact rather than schematic. revision: yes
Referee: §4, Theorem 4.1 and the step-size/Hessian conditions: The general convergence statement relies on a set of assumptions on the local Hessian approximations and the mixing matrix that may not hold identically for every cited prior algorithm. It is unclear whether the proof reduces directly to the known rates for those special cases or introduces additional restrictions on the curvature estimates.

Authors: The assumptions on the local Hessian approximations and mixing matrix in Theorem 4.1 are formulated to be satisfied by the standard choices appearing in the cited prior algorithms. When the Hessian approximation reduces to the identity matrix, the framework recovers gradient tracking; for BFGS/DFP-type updates the boundedness and curvature conditions align with those used in the literature. The general proof technique specializes directly to the existing analyses without imposing extra restrictions. We will add a clarifying remark and a short corollary in the revised Section 4 that explicitly shows this reduction and confirms that the known convergence rates are recovered. revision: yes

Circularity Check

0 steps flagged

No circularity: general framework yields independent convergence for special cases


The paper defines a general decentralized update combining gradient tracking with a quasi-Newton direction and derives convergence for this general form under standard nonconvex and KL assumptions. Existing algorithms are recovered by explicit parameter choices in the update rule rather than by fitting or self-definition. No load-bearing step reduces to a prior self-citation, fitted input renamed as prediction, or ansatz smuggled via citation; the analysis stands on its own stated assumptions about the fixed connected network and differentiable objectives.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard domain assumptions from decentralized optimization literature plus the continuously differentiable property of the objective functions.

axioms (2)

domain assumption The communication network is fixed, connected, and undirected.
Invoked to guarantee information propagation across agents.
domain assumption Each local objective function is continuously differentiable.
Required for gradient-based updates and convergence statements.
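The first axiom can be checked numerically for a given graph. The sketch below builds a mixing matrix with Metropolis-Hastings weights, one common choice for undirected networks, and reads connectivity off its spectrum. The paper's actual mixing matrix is not specified on this page, so `metropolis_weights`, `consensus_rate`, and the two example graphs are illustrative assumptions.

```python
import numpy as np

def metropolis_weights(adj):
    """Metropolis-Hastings mixing matrix for a symmetric 0/1 adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adj[i, j] > 0:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()  # remaining mass stays on the diagonal
    return W

def consensus_rate(W):
    """Second-largest eigenvalue magnitude of a symmetric mixing matrix."""
    magnitudes = np.sort(np.abs(np.linalg.eigvalsh(W)))
    return magnitudes[-2]

# Path graph on 4 nodes, 0 - 1 - 2 - 3: connected and undirected.
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]])
W_path = metropolis_weights(path)

# Same nodes with the middle edge removed: two disconnected pairs.
split = np.array([[0, 1, 0, 0],
                  [1, 0, 0, 0],
                  [0, 0, 0, 1],
                  [0, 0, 1, 0]])
W_split = metropolis_weights(split)
```

For the connected path, `consensus_rate(W_path)` is strictly below 1, so repeated averaging drives all agents to a common value. For the split graph it equals 1 and information never crosses between the two components, which is the failure mode the connectivity assumption rules out.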




work page 2016

[12] [12]

1, 296–320

Yuhong Dai and Caixia Kou, A nonlinear conjugate gradient algorithm with an optimal prop- erty and an improved wolfe line search , SIAM Journal on Optimization 23 (2013), no. 1, 296–320

work page 2013

[13] [13]

4, 3029–3068

Amir Daneshmand, Gesualdo Scutari, and Vyacheslav Kungurtsev, Second-order guarantees of distributed gradient algorithms, SIAM Journal on Optimization30 (2020), no. 4, 3029–3068

work page 2020

[14] [14]

2, 120–136

Paolo Di Lorenzo and Gesualdo Scutari, Next: In-network nonconvex optimization , IEEE Transactions on Signal and Information Processing over Networks 2 (2016), no. 2, 120–136

work page 2016

[15] [15]

Haizhou Du, Chaoqian Cheng, and Chengdong Ni, A unified momentum-based paradigm of decentralized sgd for non-convex models and heterogeneous data , Artificial Intelligence 332 (2024), 104130

work page 2024

[16] [16]

10, 2613–2628

Mark Eisen, Aryan Mokhtari, and Alejandro Ribeiro, Decentralized quasi-newton methods, IEEE Transactions on Signal Processing 65 (2017), no. 10, 2613–2628

work page 2017

[17] [17]

4, 3115–3127

Giuseppe Fusco and Mario Russo, A decentralized approach for voltage control by multiple distributed energy resources, IEEE Transactions on Smart Grid 12 (2021), no. 4, 3115–3127

work page 2021

[18] [18]

In- formation Sciences 65 (2022), no

Juan Gao, Xinwei Liu, Yuhong Dai, Yakui Huang, and Peng Yang, Achieving geometric convergence for distributed optimization with barzilai-borwein step sizes , Science China. In- formation Sciences 65 (2022), no. 4, 149204. A UNIFIED DECENTRALIZED NONCONVEX ALGORITHM UNDER KURDYKA- LOJASIEWICZ PROPERTY 31 50 100 150 200 250 Number of iteration 10-15 10-10 1...

work page 2022

[19] [19]

1, 170–192

William W Hager and Hongchao Zhang, A new conjugate gradient method with guaranteed descent and an efficient line search, SIAM Journal on optimization 16 (2005), no. 1, 170–192

work page 2005

[20] [20]

Kun Huang, Shi Pu, and Angelia Nedi´ c,An accelerated distributed stochastic gradient method with momentum, Mathematical Programming (2025), 1–44

work page 2025

[21] [21]

19, 3064

Umair Hussan, Huaizhi Wang, Muhammad Ahsan Ayub, Hamna Rasheed, Muhammad As- ghar Majeed, Jianchun Peng, and Hui Jiang, Decentralized stochastic recursive gradient method for fully decentralized opf in multi-area power systems, Mathematics 12 (2024), no. 19, 3064

work page 2024

[22] [22]

1, 31– 46

Duˇ san Jakoveti´ c,A unification and generalization of exact distributed first-order methods , IEEE Transactions on Signal and Information Processing over Networks 5 (2018), no. 1, 31– 46

work page 2018

[23] [23]

11, 1923–1938

Duˇ san Jakoveti´ c, Dragana Bajovi´ c, Jo˜ ao Xavier, and Jos´ e MF Moura,Primal–dual methods for large-scale and distributed convex optimization and data analytics , Proceedings of the IEEE 108 (2020), no. 11, 1923–1938

work page 2020

[24] [24]

Duˇ san Jakoveti´ c, Nataˇ sa Kreji´ c, and Nataˇ sa Krklec Jerinki´ c,A hessian inversion-free exact second order method for distributed consensus optimization, IEEE Transactions on Signal and Information Processing over Networks 8 (2022), 755–770

work page 2022

[25] [25]

3, 703–728

Duˇ san Jakoveti´ c, Nataˇ sa Kreji´ c, and Nataˇ sa Krklec Jerinki´ c,Exact spectral-like gradient method for distributed optimization, Computational Optimization and Applications74 (2019), no. 3, 703–728. 32 HAO WU, LIPING WANG, AND HONGCHAO ZHANG 50 100 150 200 250 300 350 Number of iteration 10-15 10-10 10-5 100 Optimality error GT MT DQN D-LM-BFGS UD...

work page 2019

[26] [26]

Eunjeong Jeong, Matteo Zecchin, and Marios Kountouris, Asynchronous decentralized learn- ing over unreliable wireless networks , 2022 International Conference on Communications (ICC), 2022, pp. 607–612

work page 2022

[27] [27]

1-2, 15–35

Donghui Li and Masao Fukushima, A modified bfgs method and its global convergence in nonconvex minimization , Journal of Computational and Applied Mathematics 129 (2001), no. 1-2, 15–35

work page 2001

[28] [28]

5, 1199–1232

Guoyin Li and Ting Kei Pong, Calculus of the exponent of kurdyka– lojasiewicz inequality and its applications to linear convergence of first-order methods , Foundations of computational mathematics 18 (2018), no. 5, 1199–1232

work page 2018

[29] [29]

1689–1694

Yichuan Li, Yonghai Gong, Nikolaos M Freris, Petros Voulgaris, and Duˇ san Stipanovi´ c,Bfgs- admm for large-scale distributed optimization , 2021 60th IEEE Conference on Decision and Control (CDC), IEEE, 2021, pp. 1689–1694

work page 2021

[30] [30]

Huikang Liu, Jiaojiao Zhang, Anthony Man-Cho So, and Qing Ling, A communication- efficient decentralized newton’s method with provably faster convergence , IEEE Transactions on Signal and Information Processing over Networks 9 (2023), 427–441

work page 2023

[31] [31]

2, 305–323

Tao-Wen Liu, A regularized limited memory bfgs method for nonconvex unconstrained mini- mization, Numerical Algorithms 65 (2014), no. 2, 305–323

work page 2014

[32] [32]

Stanislaw Lojasiewicz, Une propri´ et´ e topologique des sous-ensembles analytiques r´ eels, Les ´ equations aux d´ eriv´ ees partielles117 (1963), no. 87-89. A UNIFIED DECENTRALIZED NONCONVEX ALGORITHM UNDER KURDYKA- LOJASIEWICZ PROPERTY 33

work page 1963

[33] [33]

Gabriel Mancino-Ball, Yangyang Xu, and Jie Chen, A decentralized primal-dual framework for non-convex smooth consensus optimization , IEEE Transactions on Signal Processing 71 (2023), 525–538

work page 2023

[34] [34]

1, 146–161

Aryan Mokhtari, Qing Ling, and Alejandro Ribeiro, Network newton distributed optimization methods, IEEE Transactions on Signal Processing 65 (2016), no. 1, 146–161

work page 2016

[35] [35]

4, 507–522

Aryan Mokhtari, Wei Shi, Qing Ling, and Alejandro Ribeiro, A decentralized second-order method with exact linear convergence rate for consensus optimization , IEEE Transactions on Signal and Information Processing over Networks 2 (2016), no. 4, 507–522

work page 2016

[36] [36]

4, 2597–2633

Angelia Nedi´ c, Alex Olshevsky, and Wei Shi,Achieving geometric convergence for distributed optimization over time-varying graphs , SIAM Journal on Optimization 27 (2017), no. 4, 2597–2633

work page 2017

[37] [37]

3950–3955

Angelia Nedi´ c, Alex Olshevsky, Wei Shi, and C´ esar A Uribe, Geometrically convergent distributed optimization with uncoordinated step-sizes , 2017 American Control Conference (ACC), IEEE, 2017, pp. 3950–3955

work page 2017

[38] [38]

1, 48–61

Angelia Nedi´ c and Asuman Ozdaglar, Distributed subgradient methods for multi-agent opti- mization, IEEE Transactions on Automatic Control 54 (2009), no. 1, 48–61

work page 2009

[39] [39]

2, 1089–1109

Yitian Qian, Ting Tao, Shaohua Pan, and Houduo Qi, Convergence of zh-type nonmonotone descent method for kurdyka– lojasiewicz optimization problems , SIAM Journal on Optimiza- tion 35 (2025), no. 2, 1089–1109

work page 2025

[40] [40]

3, 1245–1260

Guannan Qu and Na Li, Harnessing smoothness to accelerate distributed optimization , IEEE Transactions on Control of Network Systems 5 (2017), no. 3, 1245–1260

work page 2017

[41] [41]

6, 156–162

Yuben Qu, Haipeng Dai, Yan Zhuang, Jiafa Chen, Chao Dong, Fan Wu, and Song Guo, De- centralized federated learning for uav networks: Architecture, challenges, and opportunities , IEEE Network 35 (2022), no. 6, 156–162

work page 2022

[42] [42]

3, 2014, pp

Ali H Sayed, Diffusion adaptation over networks , Academic Press Library in Signal Process- ing, vol. 3, 2014, pp. 323–453

work page 2014

[43] [43]

Gesualdo Scutari and Ying Sun, Distributed nonconvex constrained optimization over time- varying digraphs, Mathematical Programming 176 (2019), 497–544

work page 2019

[44] [44]

2, 944– 966

Wei Shi, Qing Ling, Gang Wu, and Wotao Yin, Extra: An exact first-order algorithm for decentralized consensus optimization, SIAM Journal on Optimization 25 (2015), no. 2, 944– 966

work page 2015

[45] [45]

7, 1750–1761

Wei Shi, Qing Ling, Kun Yuan, Gang Wu, and Wotao Yin, On the linear convergence of the admm in decentralized consensus optimization , IEEE Transactions on Signal Processing 62 (2014), no. 7, 1750–1761

work page 2014

[46] [46]

Ola Shorinwa and Mac Schwager, Distributed quasi-newton method for multi-agent optimiza- tion, IEEE Transactions on Signal Processing 72 (2024), 3535–3546

work page 2024

[47] [47]

, Distributed quasi-newton method for multi-agent optimization , IEEE Transactions on Signal Processing 72 (2024), 3535–3546

work page 2024

[48] [48]

Zhuoqing Song, Lei Shi, Shi Pu, and Ming Yan, Optimal gradient tracking for decentralized optimization, Mathematical Programming 207 (2024), no. 1, 1–53

work page 2024

[49] [49]

4, 1597–1608

Akhil Sundararajan, Bryan Van Scoy, and Laurent Lessard, Analysis and design of first-order distributed optimization algorithms over time-varying graphs , IEEE Transactions on Control of Network Systems 7 (2020), no. 4, 1597–1608

work page 2020

[50] [50]

Yuki Takezawa, Han Bao, Kenta Niwa, Ryoma Sato, and Makoto Yamada, Momentum track- ing: Momentum acceleration for decentralized deep learning on heterogeneous data , Transac- tions on Machine Learning Research (2023)

work page 2023

[51] [51]

Lei Wang, Nachuan Xiao, and Xin Liu, A double tracking method for optimization with decentralized generalized orthogonality constraints, arXiv preprint arXiv:2409.04998 (2024)

work page arXiv 2024

[52] [52]

Liping Wang, Hao Wu, and Hongchao Zhang, A decentralized primal-dual method with quasi- newton tracking, IEEE Transactions on Signal Processing 73 (2025), 1323–1336

work page 2025

[53] [53]

3, 4893–4907

Mou Wu, Haibin Liao, Zhengtao Ding, and Yonggang Xiao, Music: Accelerated convergence for distributed optimization with inexact and exact methods , IEEE Transactions on Neural Networks and Learning Systems 36 (2025), no. 3, 4893–4907

work page 2025

[54] [54]

1, 33–46

Lin Xiao, Stephen Boyd, and Seung-Jean Kim, Distributed average consensus with least-mean- square deviation, Journal of Parallel and Distributed Computing 67 (2007), no. 1, 33–46

work page 2007

[55] [55]

6, 2627–2633

Ran Xin and Usman A Khan, Distributed heavy-ball: A generalization and acceleration of first-order methods with gradient tracking , IEEE Transactions on Automatic Control 65 (2019), no. 6, 2627–2633. 34 HAO WU, LIPING WANG, AND HONGCHAO ZHANG

work page 2019

[56] [56]

Jinming Xu, Ye Tian, Ying Sun, and Gesualdo Scutari, Distributed algorithms for compos- ite optimization: Unified framework and convergence analysis , IEEE Transactions on Signal Processing 69 (2021), 3555–3570

work page 2021

[57] [57]

2055–2060

Jinming Xu, Shanying Zhu, Yeng Chai Soh, and Lihua Xie, Augmented distributed gradi- ent methods for multi-agent optimization under uncoordinated constant stepsizes , 2015 IEEE Conference on Decision and Control (CDC), 2015, pp. 2055–2060

work page 2015

[58] [58]

3, 1835–1854

Kun Yuan, Qing Ling, and Wotao Yin, On the convergence of decentralized gradient descent, SIAM Journal on Optimization 26 (2016), no. 3, 1835–1854

work page 2016

[59] [59]

3, 708–723

Kun Yuan, Bicheng Ying, Xiaochuan Zhao, and Ali H Sayed, Exact diffusion for distributed optimization and learning—part i: Algorithm development , IEEE Transactions on Signal Processing 67 (2018), no. 3, 708–723

work page 2018

[60] [60]

11, 2834–2848

Jinshan Zeng and Wotao Yin, On nonconvex decentralized gradient descent , IEEE Transac- tions on Signal Processing 66 (2018), no. 11, 2834–2848

work page 2018

[61] [61]

Jiaojiao Zhang, Qing Ling, and Anthony Man-Cho So, A newton tracking algorithm with exact linear convergence for decentralized consensus optimization, IEEE Transactions on Signal and Information Processing over Networks 7 (2021), 346–358

work page 2021

[62] [62]

Jiaojiao Zhang, Huikang Liu, Anthony Man-Cho So, and Qing Ling, Variance-reduced stochas- tic quasi-newton methods for decentralized learning , IEEE Transactions on Signal Processing 71 (2023), 311–326

work page 2023

[63] [63]

Jiaqi Zhang, Keyou You, and Kai Cai, Distributed dual gradient tracking for resource alloca- tion in unbalanced networks , IEEE Transactions on Signal Processing 68 (2020), 2186–2198

work page 2020

[64] [64]

Appendix A

Xianyang Zhang, Chen Hu, Bing He, and Zhiguo Han, Distributed reptile algorithm for meta- learning over multi-agent systems , IEEE Transactions on Signal Processing 70 (2022), 5443– 5456. Appendix A. Analytical tools Lemma A.1. (Young’s inequality) For any two vectors v1, v2 ∈ Rp, 2vT 1 v2 ≤η∥v1∥2 + 1 η ∥v2∥2, ∥v1 + v2∥2 ≤(1 + η)∥v1∥2 + 1 + 1 η ∥v2∥2. Lem...

work page 2022