pith. sign in

arxiv: 2604.23754 · v2 · submitted 2026-04-26 · 🧮 math.OC

A Retraction-Free EXTRA Method for Decentralized Optimization on the Stiefel Manifold

Pith reviewed 2026-05-08 05:48 UTC · model grok-4.3

classification 🧮 math.OC
keywords decentralized optimizationStiefel manifoldretraction-free methodEXTRA algorithmO(1/K) convergenceorthogonality constraintsprimal-dual optimizationdistributed learning
0
0 comments X

The pith

RF-EXTRA achieves an exact O(1/K) convergence rate to stationary points for decentralized optimization on the Stiefel manifold with constant step sizes and no retractions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a decentralized method called RF-EXTRA for solving optimization problems subject to orthogonality constraints on the Stiefel manifold. It combines an approximate gradient mapping to handle the manifold constraints with an EXTRA-based decentralized recursion to enable distributed computation without retractions. The analysis focuses on the contractivity of the joint error between local variables and their network averages, which allows the use of small constant step sizes. This leads to an O(1/K) convergence guarantee, which is useful for large-scale distributed tasks like principal component analysis where retractions would be computationally expensive. Sympathetic readers would care because it simplifies communication and avoids manifold-specific operations while maintaining convergence.

Core claim

RF-EXTRA is a distributed retraction-free primal-dual method that, by establishing a contractive recursion for the joint error (X_k - average X_k, s_k - average s_k), ensures that the joint error can be controlled using small yet constant step sizes, leading to an exact O(1/K) convergence rate to a stationary point on static undirected networks.

What carries the argument

The joint error vector consisting of deviations in local variables and local directions from their averages, whose contractive recursion is established under the approximate gradient mapping and EXTRA recursion.

Load-bearing premise

The joint-error recursion remains contractive when the approximate gradient mapping for the orthogonality constraints is paired with the EXTRA decentralized update on static undirected networks.

What would settle it

Observing that the joint error fails to contract or the convergence rate exceeds O(1/K) for some constant step size on a static undirected network with the given mapping would falsify the claim.

Figures

Figures reproduced from arXiv: 2604.23754 by Jiang Hu, Shu Li.

Figure 1
Figure 1. Figure 1: Synthetic decentralized PCA: robustness of RF-EXTRA with respect to graph topology and view at source ↗
Figure 2
Figure 2. Figure 2: Synthetic decentralized PCA on ER(0.6) versus communication quantities. Each method uses its best step size selected from {1, 2, 4, 6, 8} × {10−5 , 10−4 , 10−3 , 10−2}. Under this matched search space, RF-EXTRA, DESTINY, DPRGT, and REXTRA all select βˆ = 0.08, while DPRGD selects βˆ = 0.006. is competitive with the strongest baselines across the communication budget, and its performance is close to that of… view at source ↗
Figure 3
Figure 3. Figure 3: Decentralized PCA on the MNIST dataset versus communication quantities. RF-EXTRA, DES view at source ↗
Figure 4
Figure 4. Figure 4: Decentralized LRMC on the ring graph versus communication quantities. Only the stationarity view at source ↗
Figure 5
Figure 5. Figure 5: Decentralized LRMC on the ring graph versus communication quantities for representative RF view at source ↗
read the original abstract

Decentralized optimization provides a fundamental framework for large-scale learning and signal processing with distributed data. We study decentralized optimization with orthogonality constraints on the Stiefel manifold and propose RF-EXTRA, a distributed retraction-free primal-dual method on static undirected networks. The method combines an approximate gradient mapping for orthogonality-constrained optimization with an EXTRA-based decentralized recursion, thereby avoiding retractions while preserving a simple communication pattern. On the theoretical side, the analysis considers \revise{the joint error} $(\mathbf{X}_k-\overline{\mathbf X}_k,\mathbf{s}_k-\overline{\mathbf s}_k)$ in the local variables and local directions, and establishes a contractive recursion for the joint error. This contractivity ensures that the joint error can be controlled using small yet constant step sizes, thus leading to an exact $\mathcal{O}(1/K)$ convergence rate of RF-EXTRA to a stationary point. Experiments on PCA and low-rank matrix completion show that RF-EXTRA compares favorably with the reported decentralized baselines and exhibits strong communication efficiency on the tested tasks on the Stiefel manifold.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper proposes RF-EXTRA, a retraction-free primal-dual decentralized optimization algorithm for problems on the Stiefel manifold. It combines an approximate gradient mapping to handle orthogonality constraints without retractions and an EXTRA-based recursion for communication on static undirected networks. The central claim is that the joint error (X_k - average X_k, s_k - average s_k) satisfies a contractive recursion, which permits constant step sizes and yields an exact O(1/K) convergence rate to a stationary point. Experiments on PCA and low-rank matrix completion show competitive performance and communication efficiency relative to baselines.

Significance. If the joint-error contractivity holds, the result is significant because it removes the need for retraction operations in distributed manifold optimization, which are often expensive or unstable. The exact O(1/K) rate with constant steps and simple communication pattern is a practical advantage for large-scale distributed tasks such as PCA. The approach of analyzing the combined primal-dual deviation vector is a clean way to obtain the rate, and the empirical results on standard tasks add value.

major comments (2)
  1. [Analysis section on joint-error recursion] The derivation of the contractive recursion for the joint error (X_k - average X_k, s_k - average s_k) is load-bearing for the O(1/K) claim. The bounding of cross terms arising from the decentralized EXTRA updates and the first-order approximation to the orthogonality constraint must be presented with explicit constants so that the spectral-radius condition (strictly less than one) and the allowable constant step-size range can be verified directly.
  2. [Section introducing the approximate gradient mapping] The contractivity relies on the approximate gradient mapping respecting the orthogonality constraint. The precise definition of this mapping, together with its Lipschitz constant or approximation-error bound, must be stated explicitly because these quantities enter the step-size restriction that guarantees the spectral radius is less than one.
minor comments (3)
  1. [Abstract] The abstract contains the LaTeX command 'revise{the joint error}'; replace this with clean text and ensure the term 'joint error' is defined consistently in the introduction and analysis.
  2. [Experiments] The experimental section should report the network size, topology, and exact communication metric (e.g., total scalar transmissions per iteration) to make the claimed communication efficiency quantitative.
  3. [Notation and preliminaries] Notation for the averages (overline{X}_k and overline{s}_k) should be introduced at the first use rather than assumed.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and the precise comments on the analysis. We address the two major comments below and will revise the manuscript accordingly to improve clarity and verifiability.

read point-by-point responses
  1. Referee: [Analysis section on joint-error recursion] The derivation of the contractive recursion for the joint error (X_k - average X_k, s_k - average s_k) is load-bearing for the O(1/K) claim. The bounding of cross terms arising from the decentralized EXTRA updates and the first-order approximation to the orthogonality constraint must be presented with explicit constants so that the spectral-radius condition (strictly less than one) and the allowable constant step-size range can be verified directly.

    Authors: We agree that the bounding steps for the cross terms must be expanded with explicit constants to allow direct verification of the spectral radius and step-size range. The manuscript derives the joint-error recursion in the analysis section, but the intermediate bounds are presented compactly. In the revision we will insert the full expansion of each cross-term bound, compute the resulting spectral-radius expression explicitly, and state the resulting restriction on the constant step size. revision: yes

  2. Referee: [Section introducing the approximate gradient mapping] The contractivity relies on the approximate gradient mapping respecting the orthogonality constraint. The precise definition of this mapping, together with its Lipschitz constant or approximation-error bound, must be stated explicitly because these quantities enter the step-size restriction that guarantees the spectral radius is less than one.

    Authors: We accept the point that the definition and quantitative properties of the approximate gradient mapping need to be stated more explicitly. The mapping is introduced as a first-order approximation that preserves the orthogonality constraint to first order; its Lipschitz constant and approximation-error bound appear in the subsequent analysis but are not highlighted at the definition stage. In the revised manuscript we will restate the precise definition at the beginning of the relevant section, list the Lipschitz and error constants, and show explicitly how they propagate into the step-size condition that ensures the spectral radius is strictly less than one. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained via explicit joint-error bounds

full rationale

The paper's central result is the contractive recursion on the joint error vector (X_k - avg X_k, s_k - avg s_k) obtained by bounding the cross terms that arise from the EXTRA mixing matrices on static undirected graphs together with the first-order approximation to the orthogonality constraint. This produces a linear system whose spectral radius is strictly less than one for sufficiently small constant step sizes, directly yielding the O(1/K) rate to stationarity. No equation or claim reduces to a fitted parameter renamed as a prediction, a self-citation whose content is itself unverified, or a definitional equivalence; the analysis is presented as an independent derivation resting on standard Lipschitz and network assumptions rather than re-deriving prior constants by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only the abstract is available, so the ledger is necessarily incomplete. The method appears to rest on standard assumptions for decentralized consensus and manifold optimization plus one paper-specific construction (the approximate gradient mapping). No free parameters or invented entities are named.

axioms (2)
  • domain assumption The network is static and undirected.
    Stated in the abstract as the setting for the decentralized recursion.
  • ad hoc to paper An approximate gradient mapping exists that respects the orthogonality constraint without retraction.
    Central to the retraction-free claim; invoked to combine with EXTRA.

pith-pipeline@v0.9.0 · 5491 in / 1515 out tokens · 31149 ms · 2026-05-08T05:48:27.132056+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages

  1. [1]

    Distributed asynchronous deterministic and stochastic gradient optimization algorithms.IEEE transactions on automatic control, 31(9):803–812, 1986

    John Tsitsiklis, Dimitri Bertsekas, and Michael Athans. Distributed asynchronous deterministic and stochastic gradient optimization algorithms.IEEE transactions on automatic control, 31(9):803–812, 1986

  2. [2]

    Distributed subgradient methods for multi-agent optimization

    Angelia Nedic and Asuman Ozdaglar. Distributed subgradient methods for multi-agent optimization. IEEE Transactions on automatic control, 54(1):48–61, 2009

  3. [3]

    On the convergence of decentralized gradient descent.SIAM Journal on Optimization, 26(3):1835–1854, 2016

    Kun Yuan, Qing Ling, and Wotao Yin. On the convergence of decentralized gradient descent.SIAM Journal on Optimization, 26(3):1835–1854, 2016

  4. [4]

    Alghunaim and Kun Yuan

    Sulaiman A. Alghunaim and Kun Yuan. A unified and refined convergence analysis for non-convex decentralized learning.IEEE Transactions on Signal Processing, 2022

  5. [5]

    Kun Yuan, Bicheng Ying, Xiaochuan Zhao, and Ali H. Sayed. Exact diffusion for distributed opti- mization and learning — part i: Algorithm development.IEEE Transactions on Signal Processing, 2018

  6. [6]

    EXTRA: An exact first-order algorithm for decentralized consensus optimization.SIAM Journal on Optimization, 25(2):944–966, 2015

    Wei Shi, Qing Ling, Gang Wu, and Wotao Yin. EXTRA: An exact first-order algorithm for decentralized consensus optimization.SIAM Journal on Optimization, 25(2):944–966, 2015

  7. [7]

    Distributed constrained optimal consensus of multi- agent systems.Automatica, 68:209–215, 2016

    Zhirong Qiu, Shuzhi Sam Ge Liu, and Lihua Xie. Distributed constrained optimal consensus of multi- agent systems.Automatica, 68:209–215, 2016. doi: 10.1016/j.automatica.2016.01.055

  8. [8]

    Dual averaging for distributed optimization: Convergence analysis and network scaling.IEEE Transactions on Automatic Control, 2011

    John Duchi, Alekh Agarwal, and Martin Wainwright. Dual averaging for distributed optimization: Convergence analysis and network scaling.IEEE Transactions on Automatic Control, 2011

  9. [9]

    Orthogonal weight nor- malization: Solution to optimization over multiple dependent Stiefel manifolds in deep neural networks

    Lei Huang, Xianglong Liu, Bo Lang, Adams Yu, Yongliang Wang, and Bo Li. Orthogonal weight nor- malization: Solution to optimization over multiple dependent Stiefel manifolds in deep neural networks. InProceedings of the AAAI Conference on Artificial Intelligence, 2018

  10. [10]

    Riemannian approach to batch normalization

    Minhyung Cho and Jaehyung Lee. Riemannian approach to batch normalization. InAdvances in Neural Information Processing Systems, 2017

  11. [11]

    Riemannian preconditioned lora for fine-tuning foundation mod- els.arXiv preprint arXiv:2402.02347,

    Fangzhao Zhang and Mert Pilanci. Riemannian preconditioned LoRA for fine-tuning foundation models. arXiv preprint arXiv:2402.02347, 2024

  12. [12]

    Retraction-free optimization over the Stiefel manifold with application to the LoRA fine-tuning

    Yuan Zhang, Jiang Hu, Jiaxi Cui, Lin Lin, Zaiwen Wen, and Quanzheng Li. Retraction-free optimization over the Stiefel manifold with application to the LoRA fine-tuning. 2024

  13. [13]

    Susan Blackford, Antoine Petitet, Roldan Pozo, Karin Remington, R

    L. Susan Blackford, Antoine Petitet, Roldan Pozo, Karin Remington, R. Clint Whaley, James Demmel, Jack Dongarra, Iain Duff, Sven Hammarling, Greg Henry, and Michael Heroux. An updated set of basic linear algebra subprograms (BLAS).ACM Transactions on Mathematical Software, 28(2):135–151, 2002

  14. [14]

    Anastasia Koloskova, Tao Lin, and Sebastian U. Stich. An improved analysis of gradient tracking for decentralized machine learning. InAdvances in Neural Information Processing Systems, 2021

  15. [15]

    D2: Decentralized training over decen- tralized data

    Hanlin Tang, Xiangru Lian, Ming Yan, Ce Zhang, and Ji Liu. D2: Decentralized training over decen- tralized data. InProceedings of the International Conference on Machine Learning, 2018. 21

  16. [16]

    Achieving geometric convergence for distributed optimiza- tion over time-varying graphs.SIAM Journal on Optimization, 27(4):2597–2633, 2017

    Angelia Nedić, Alex Olshevsky, and Wei Shi. Achieving geometric convergence for distributed optimiza- tion over time-varying graphs.SIAM Journal on Optimization, 27(4):2597–2633, 2017

  17. [17]

    Harnessing smoothness to accelerate distributed optimization.IEEE Trans- actions on Control of Network Systems, 5(3):1245–1260, 2017

    Guannan Qu and Na Li. Harnessing smoothness to accelerate distributed optimization.IEEE Trans- actions on Control of Network Systems, 5(3):1245–1260, 2017

  18. [18]

    S-diging: A stochastic gradient tracking algorithm for distributed optimization.IEEE Transactions on Emerging Topics in Computational Intelligence, 2020

    Huaqing Li, Lifeng Zheng, Zheng Wang, Yu Yan, Liping Feng, and Jing Guo. S-diging: A stochastic gradient tracking algorithm for distributed optimization.IEEE Transactions on Emerging Topics in Computational Intelligence, 2020

  19. [19]

    Distributed stochastic gradient tracking methods.Mathematical Program- ming, 2021

    Shi Pu and Angelia Nedić. Distributed stochastic gradient tracking methods.Mathematical Program- ming, 2021

  20. [20]

    Yue Liu, Tao Lin, Anastasia Koloskova, and Sebastian U. Stich. Decentralized gradient tracking with local steps.Optimization Methods and Software, 2025

  21. [21]

    A decentralized proximal-gradient method with network independent step-sizes and separated convergence rates.IEEE Transactions on Signal Processing, 67(17):4494–4506, 2019

    Zhi Li, Wei Shi, and Ming Yan. A decentralized proximal-gradient method with network independent step-sizes and separated convergence rates.IEEE Transactions on Signal Processing, 67(17):4494–4506, 2019

  22. [22]

    Prox-PDA:Theproximalprimal-dualalgorithm for fast distributed nonconvex optimization and learning over networks

    MingyiHong, DavoodHajinezhad, andMing-MinZhao. Prox-PDA:Theproximalprimal-dualalgorithm for fast distributed nonconvex optimization and learning over networks. InInternational Conference on Machine Learning, pages 1529–1538. PMLR, 2017

  23. [23]

    Alghunaim, and Xinmeng Huang

    Kun Yuan, Sulaiman A. Alghunaim, and Xinmeng Huang. Removing data heterogeneity influence enhances network topology dependence of decentralized SGD.Journal of Machine Learning Research, 2023

  24. [24]

    Next: In-network nonconvex optimization.IEEE Transactions on Signal and Information Processing over Networks, 2016

    Paolo Di Lorenzo and Gesualdo Scutari. Next: In-network nonconvex optimization.IEEE Transactions on Signal and Information Processing over Networks, 2016

  25. [25]

    Stich, and Martin Jaggi

    Anastasia Koloskova, Sebastian U. Stich, and Martin Jaggi. Decentralized stochastic optimization and gossip algorithms with compressed communication. InProceedings of the International Conference on Machine Learning, 2019

  26. [26]

    Unbiased compression saves communication in distributed optimization: When and how much? InAdvances in Neural Information Processing Systems, 2023

    Yutong He, Xinmeng Huang, and Kun Yuan. Unbiased compression saves communication in distributed optimization: When and how much? InAdvances in Neural Information Processing Systems, 2023

  27. [27]

    Greedy low-rank gradient com- pression for distributed learning with convergence guarantees.IEEE Transactions on Signal Processing, 2026

    Chuyan Chen, Yutong He, Pengrui Li, Weichen Jia, and Kun Yuan. Greedy low-rank gradient com- pression for distributed learning with convergence guarantees.IEEE Transactions on Signal Processing, 2026

  28. [28]

    On biased compression for distributed learning.Journal of Machine Learning Research, 2023

    Aleksandr Beznosikov, Samuel Horváth, Peter Richtárik, and Mher Safaryan. On biased compression for distributed learning.Journal of Machine Learning Research, 2023

  29. [29]

    Error compensated distributed SGD can be accelerated

    Xun Qian, Peter Richtárik, and Tong Zhang. Error compensated distributed SGD can be accelerated. InAdvances in Neural Information Processing Systems, 2021

  30. [30]

    Understanding the influence of digraphs on decentralized optimization: Effective metrics, lower bound, and optimal algorithm.SIAM Journal on Optimization, 2025

    Liyuan Liang, Xinmeng Huang, Ran Xin, and Kun Yuan. Understanding the influence of digraphs on decentralized optimization: Effective metrics, lower bound, and optimal algorithm.SIAM Journal on Optimization, 2025

  31. [31]

    Alghunaim

    Sulaiman A. Alghunaim. Local exact-diffusion for decentralized optimization and learning.IEEE Transactions on Automatic Control, 2024

  32. [32]

    Exponential graph is provably efficient for decentralized deep training

    Bicheng Ying, Kun Yuan, Yiming Chen, Hanbin Hu, Pan Pan, and Wotao Yin. Exponential graph is provably efficient for decentralized deep training. InAdvances in Neural Information Processing Systems, 2021. 22

  33. [33]

    Decentralized Riemannian gradient descent on the Stiefel manifold

    Shixiang Chen, Alfredo Garcia, Mingyi Hong, and Shahin Shahrampour. Decentralized Riemannian gradient descent on the Stiefel manifold. InInternational Conference on Machine Learning, pages 1594–1605. PMLR, 2021

  34. [34]

    Decentralized projected Riemannian gradient method for smooth opti- mization on compact submanifolds embedded in the euclidean space.Numerische Mathematik, 2025

    Kangkang Deng and Jiang Hu. Decentralized projected Riemannian gradient method for smooth opti- mization on compact submanifolds embedded in the euclidean space.Numerische Mathematik, 2025

  35. [35]

    A decentralized proximal gradient tracking algorithm for composite optimization on Riemannian manifolds.Journal of Machine Learning Research, 2025

    Lei Wang, Le Bao, and Xin Liu. A decentralized proximal gradient tracking algorithm for composite optimization on Riemannian manifolds.Journal of Machine Learning Research, 2025

  36. [36]

    Riemannian EXTRA: Communication-efficient decentralized optimization over compact submanifolds with data heterogeneity

    Jiayuan Wu, Zhanwang Deng, Jiang Hu, Weijie Su, and Zaiwen Wen. Riemannian EXTRA: Communication-efficient decentralized optimization over compact submanifolds with data heterogeneity. arXiv preprint arXiv:2505.15537, 2025

  37. [37]

    Decentralized optimization on compact submanifolds by quantized Riemannian gradient tracking.IEEE Transactions on Signal Processing, 2025

    JunChen, LinaLiu, TianyiZhu, YongLiu, GuangDai, YunliangJiang, andIvorWTsang. Decentralized optimization on compact submanifolds by quantized Riemannian gradient tracking.IEEE Transactions on Signal Processing, 2025

  38. [38]

    Improving the communication in decentralized manifold optimization through single-step consensus and compression.arXiv preprint arXiv:2407.08904, 2024

    Jiang Hu and Kangkang Deng. Improving the communication in decentralized manifold optimization through single-step consensus and compression.arXiv preprint arXiv:2407.08904, 2024

  39. [39]

    Decentralized projected Riemannian stochastic recursive momentum method for nonconvex optimization

    Kangkang Deng and Jiang Hu. Decentralized projected Riemannian stochastic recursive momentum method for nonconvex optimization. InProceedings of the AAAI Conference on Artificial Intelligence, 2025

  40. [40]

    Tsang, and Yong Liu

    Jun Chen, Haishan Ye, Mengmeng Wang, Tianxin Huang, Guang Dai, Ivor W. Tsang, and Yong Liu. Decentralized Riemannian conjugate gradient method on the Stiefel manifold.arXiv preprint arXiv:2308.10547, 2023

  41. [41]

    Decentralized Riemannian natural gradient methods with Kronecker product approximations.Journal of the Operations Research Society of China, 2025

    Jiang Hu, Kangkang Deng, and Quanzheng Li. Decentralized Riemannian natural gradient methods with Kronecker product approximations.Journal of the Operations Research Society of China, 2025

  42. [42]

    On the local linear rate of consensus on the Stiefel manifold.IEEE Transactions on Automatic Control, 2023

    Shixiang Chen, Alfredo Garcia, Mingyi Hong, and Shahin Shahrampour. On the local linear rate of consensus on the Stiefel manifold.IEEE Transactions on Automatic Control, 2023

  43. [43]

    Riemannian consensus for manifolds with bounded curva- ture.IEEE Transactions on Automatic Control, 2012

    Roberto Tron, Bijan Afsari, and René Vidal. Riemannian consensus for manifolds with bounded curva- ture.IEEE Transactions on Automatic Control, 2012

  44. [44]

    Consensus optimization on manifolds.SIAM Journal on Control and Optimization, 2009

    Alain Sarlette and Rodolphe Sepulchre. Consensus optimization on manifolds.SIAM Journal on Control and Optimization, 2009

  45. [45]

    Achieving local consensus over compact submanifolds

    Jiang Hu, Jiaojiao Zhang, and Kangkang Deng. Achieving local consensus over compact submanifolds. IEEE Transactions on Automatic Control, 70(9):5750–5763, 2025. doi: 10.1109/TAC.2025.3545711

  46. [46]

    Retraction-free decentralized non-convex optimization with orthogonal constraints.arXiv preprint arXiv:2405.11590, 2024

    Youbang Sun, Shixiang Chen, Alfredo Garcia, and Shahin Shahrampour. Retraction-free decentralized non-convex optimization with orthogonal constraints.arXiv preprint arXiv:2405.11590, 2024. doi: 10.48550/arXiv.2405.11590

  47. [47]

    Decentralized optimization over the Stiefel manifold by an approximate aug- mented Lagrangian function.IEEE Transactions on Signal Processing, 2022

    Lei Wang and Xin Liu. Decentralized optimization over the Stiefel manifold by an approximate aug- mented Lagrangian function.IEEE Transactions on Signal Processing, 2022

  48. [48]

    Fast and accurate optimization on the orthogonal manifold without retraction

    Pierre Ablin and Gabriel Peyré. Fast and accurate optimization on the orthogonal manifold without retraction. InProceedings of The 25th International Conference on Artificial Intelligence and Statistics, pages 5636–5657. PMLR, 2022

  49. [49]

    Parallelizable algorithms for optimization problems with orthog- onality constraints.SIAM Journal on Scientific Computing, 41(3):A1949–A1983, 2019

    Bin Gao, Xin Liu, and Ya-xiang Yuan. Parallelizable algorithms for optimization problems with orthog- onality constraints.SIAM Journal on Scientific Computing, 41(3):A1949–A1983, 2019

  50. [50]

    Dissolving constraints for Riemannian optimization

    Nachuan Xiao, Xin Liu, and Kim-Chuan Toh. Dissolving constraints for Riemannian optimization. Mathematics of Operations Research, 2024. 23

  51. [51]

    Horn and Charles R

    Roger A. Horn and Charles R. Johnson.Matrix Analysis. Cambridge University Press, 2012

  52. [52]

    Convergence analysis of EXTRA in non-convex distributed optimization.IEEE Control Systems Letters, 2025

    Lei Qin and Ye Pu. Convergence analysis of EXTRA in non-convex distributed optimization.IEEE Control Systems Letters, 2025

  53. [53]

    Reasflow: Assisting reasoning-centric scientific discovery in applied mathematics via a knowledge-based multi-agent system, 2026

    ReasFlow Team. Reasflow: Assisting reasoning-centric scientific discovery in applied mathematics via a knowledge-based multi-agent system, 2026. URLhttps://blog.reaslab.io/blog/reasflow-intro/. 24