Learning to accelerate distributed ADMM using graph neural networks

Daniel Hern\'andez Escobar; Henri Doerks; Jens Sj\"olund; Paul H\"ausner

arxiv: 2509.05288 · v2 · submitted 2025-09-05 · 💻 cs.LG · math.OC

Learning to accelerate distributed ADMM using graph neural networks

Henri Doerks , Paul H\"ausner , Daniel Hern\'andez Escobar , Jens Sj\"olund This is my paper

Pith reviewed 2026-05-18 18:43 UTC · model grok-4.3

classification 💻 cs.LG math.OC

keywords distributed optimizationADMMgraph neural networkshyperparameter learningconvergence accelerationmessage passingunrolled optimizationdecentralized algorithms

0 comments

The pith

Distributed ADMM iterations fit inside graph neural network message passing so a network can learn adaptive step sizes and weights for faster convergence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the alternating direction method of multipliers can be rewritten as message passing on a graph, allowing a GNN to predict iteration-specific step sizes and communication weights directly from the current iterates. By unrolling a fixed number of ADMM steps and training the GNN end-to-end to reduce the distance to the solution on a given problem class, the method keeps the original convergence guarantees while improving practical speed. A sympathetic reader would care because distributed optimization appears in large-scale machine learning and control, where slow or hyperparameter-sensitive convergence is a recurring cost. The learned variant is shown to deliver both quicker progress within the training budget and better final accuracy on new instances drawn from the same distribution.

Core claim

Distributed ADMM iterations can be expressed within the message-passing framework of graph neural networks. A GNN is then trained to output adaptive step sizes and communication weights from the current iterates; the combined system is unrolled for a fixed number of steps and optimized end-to-end to minimize solution error after those steps, while the underlying ADMM updates preserve their theoretical convergence properties.

What carries the argument

The rewriting of ADMM's local minimization and dual update steps as graph message-passing layers that let the GNN emit per-iteration hyperparameters.

If this is right

Convergence speed improves both inside and outside the exact iteration count used during training.
Solution quality after a fixed computational budget is higher than with hand-chosen ADMM parameters.
The method retains ADMM convergence guarantees because the learned values are inserted into the original update rules.
Generalization holds for new instances sampled from the same problem distribution used in training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same GNN embedding idea could be applied to other first-order distributed methods that admit a graph interpretation.
In deployed systems the learned predictor might replace manual grid search over step sizes and penalty parameters.
Robustness can be tested by feeding the trained network instances drawn from a modestly shifted distribution.
Hybrid schemes become possible in which the GNN supplies guidance only for the first many iterations and then hands control back to standard ADMM.

Load-bearing premise

A GNN trained on a specific problem class will output hyperparameters that keep the algorithm convergent and improve performance on new instances from the same distribution.

What would settle it

Collect a fresh set of problem instances drawn from the training distribution, run both standard ADMM and the learned variant for the same number of iterations, and check whether the learned version consistently reaches a target accuracy in fewer iterations or with lower final error.

Figures

Figures reproduced from arXiv: 2509.05288 by Daniel Hern\'andez Escobar, Henri Doerks, Jens Sj\"olund, Paul H\"ausner.

**Figure 1.** Figure 1: Computation structure of iteration k of the decentralized distributed ADMM algorithm computed as two message-passing steps in our proposed GNN (see Algorithm 2.2). The initial node features V k contain the previous ADMM iterates (x k i , yk i , λk i ) for each node i and the nonzero edge features E describe the weighted pairwise connection of nodes in the network which is fixed across iterations. After tw… view at source ↗

**Figure 2.** Figure 2: Semi-log plots of the relative objective for the two problem classes across all 20 iterations. The network is only unrolled for the first 10 iterations during training as in the previous results. One exception is the learned communication matrix for the least-squares problem which leads to a higher objective value for most of the iterations even though the distance to the minimizer gets reduced as we have … view at source ↗

read the original abstract

Distributed optimization is fundamental to large-scale machine learning and control applications. Among existing methods, the alternating direction method of multipliers (ADMM) has gained popularity due to its strong convergence guarantees and suitability for decentralized computation. However, ADMM can suffer from slow convergence and high sensitivity to hyperparameter choices. In this work, we show that distributed ADMM iterations can be naturally expressed within the message-passing framework of graph neural networks (GNNs). Building on this connection, we propose learning adaptive step sizes and communication weights through a GNN that predicts these yperparameters based on the current iterates. By unrolling ADMM for a fixed number of iterations, we train the network end-to-end to minimize the solution distance after these iterations for a given problem class, while preserving the algorithm's convergence properties. Numerical experiments demonstrate that our learned variant consistently improves convergence speed and solution quality compared to standard ADMM, both within the trained computational budget and beyond. The code is available at https://github.com/paulhausner/learning-distributed-admm.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper maps distributed ADMM to GNN message passing and learns adaptive step sizes plus weights end-to-end, with experiments showing faster convergence on the tested problems.

read the letter

The core move here is expressing the distributed ADMM updates as graph message passing so a GNN can output both the penalty parameters and the communication weights from the current iterates. That framing is new enough to be worth noting, and unrolling a fixed number of steps for end-to-end training follows the usual learned-optimizer pattern but applies it cleanly to the distributed case. The authors also release the code, which makes the claims easier to check.

Referee Report

2 major / 2 minor

Summary. The paper claims that distributed ADMM iterations can be expressed as message-passing operations on graphs, allowing a GNN to learn adaptive step sizes and communication weights from current iterates. The network is trained end-to-end by unrolling a fixed number of ADMM steps and minimizing the distance to the optimum for a given problem class, while aiming to preserve convergence properties. Numerical experiments are reported to show consistent improvements in convergence speed and solution quality over standard ADMM, both inside and outside the training horizon.

Significance. If the learned parameters reliably accelerate ADMM without introducing instability, the work would offer a practical bridge between classical distributed optimization and learned accelerators for specific problem distributions. The explicit code release supports reproducibility and external verification of the empirical claims.

major comments (2)

[Abstract and §5] Abstract and §5 (Numerical Experiments): the central claim that improvements hold 'both within the trained computational budget and beyond' rests on fixed-horizon unrolling that minimizes distance only after K steps. No analysis, additional long-horizon experiments, or stability checks are provided to confirm that GNN-predicted state-dependent parameters maintain convergence (e.g., satisfying the standard ADMM conditions on ρ) or prevent oscillation/divergence on fresh instances drawn from the same distribution after the training horizon.
[§4] §4 (Method): while the message-passing reformulation of ADMM is clean, the paper does not derive or verify that the learned, input-dependent weights and step sizes continue to satisfy the convergence assumptions of classical ADMM once the GNN is evaluated outside the exact training distribution or for iteration counts >K.

minor comments (2)

[Abstract] Abstract: 'yperparameters' appears to be a typo for 'hyperparameters'.
[§5] The description of the exact problem distributions used for training versus testing, and any hyperparameter sensitivity analysis, would benefit from additional detail to support the generalization claims.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback. Below we provide point-by-point responses to the major comments. We will make revisions to address the concerns about empirical validation and clarification of theoretical aspects.

read point-by-point responses

Referee: [Abstract and §5] Abstract and §5 (Numerical Experiments): the central claim that improvements hold 'both within the trained computational budget and beyond' rests on fixed-horizon unrolling that minimizes distance only after K steps. No analysis, additional long-horizon experiments, or stability checks are provided to confirm that GNN-predicted state-dependent parameters maintain convergence (e.g., satisfying the standard ADMM conditions on ρ) or prevent oscillation/divergence on fresh instances drawn from the same distribution after the training horizon.

Authors: We agree that the training procedure uses fixed-horizon unrolling. In the experiments of §5, we do evaluate the learned ADMM for a larger number of iterations than K on unseen problem instances and report sustained improvements without observed divergence. However, we acknowledge the absence of dedicated long-horizon analysis or formal stability checks. We will revise the abstract and §5 to emphasize that the 'beyond' claim is based on empirical observation up to a moderate extension of the horizon, and add additional plots showing performance for up to 2K or 3K iterations with discussion of stability. A complete theoretical analysis of convergence for the learned parameters remains an open question and is noted as future work. revision: partial
Referee: [§4] §4 (Method): while the message-passing reformulation of ADMM is clean, the paper does not derive or verify that the learned, input-dependent weights and step sizes continue to satisfy the convergence assumptions of classical ADMM once the GNN is evaluated outside the exact training distribution or for iteration counts >K.

Authors: The reformulation shows that each ADMM iteration corresponds to a message-passing step where the GNN predicts the step sizes and weights based on local iterates. During training, we constrain the outputs to be positive to mimic the standard ADMM parameter ρ > 0. Nevertheless, we do not derive that these state-dependent predictions necessarily fulfill all classical convergence conditions for every possible input or for arbitrarily large iteration counts. We will update §4 to include a clearer statement that while the structure preserves the ADMM form, the convergence guarantees are inherited only heuristically and are supported by the empirical results rather than proven for the learned case. This limitation will be explicitly discussed. revision: yes

standing simulated objections not resolved

A formal derivation verifying that the GNN-predicted parameters satisfy ADMM convergence assumptions outside the training distribution and for iteration counts exceeding K.

Circularity Check

0 steps flagged

No circularity: training objective independent of performance claims

full rationale

The paper connects distributed ADMM iterations to GNN message passing as a structural observation, then defines a GNN that outputs state-dependent step sizes and weights. These are trained by unrolling a fixed horizon K and minimizing distance to the optimum after K steps. This loss is independent of the reported long-term convergence improvements and does not reduce the empirical results to the inputs by construction. No self-citation chain, uniqueness theorem, or ansatz is invoked to force the outcome; the central claims rest on numerical experiments for a given problem class, with released code enabling external checks. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the ability of the learned GNN to generalize across problem instances and on the assumption that unrolling a fixed number of iterations is sufficient to capture long-term convergence behavior. No new physical entities are introduced.

free parameters (1)

GNN weights
The parameters of the graph neural network are fitted during end-to-end training to minimize solution distance after a fixed number of ADMM iterations.

axioms (1)

domain assumption Distributed ADMM iterations can be exactly expressed as message-passing operations on a graph.
This equivalence is the foundation for applying GNNs; it is stated in the abstract and forms the modeling choice.

pith-pipeline@v0.9.0 · 5721 in / 1333 out tokens · 25594 ms · 2026-05-18T18:43:17.198370+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We show the one-to-one correspondence of distributed ADMM iterations and message-passing networks... learn adaptive step sizes and communication weights through a GNN
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

By unrolling ADMM for a fixed number of iterations, we train the network end-to-end to minimize the solution distance after these iterations

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 2 internal anchors

[1]

Tutorial on amortized optimization.Foundations and Trends in Machine Learning, 16(5):592–732, 2023

Brandon Amos. Tutorial on amortized optimization.Foundations and Trends in Machine Learning, 16(5):592–732, 2023. 2, 7

work page 2023
[2]

Optnet: Differentiable optimization as a layer in neural networks

Brandon Amos and J Zico Kolter. Optnet: Differentiable optimization as a layer in neural networks. InInternational conference on machine learning, pages 136–145. PMLR, 2017. 6

work page 2017
[3]

Learning to learn by gradient descent by gradient descent.Advances in neural information processing systems, 29, 2016

Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, and Nando De Freitas. Learning to learn by gradient descent by gradient descent.Advances in neural information processing systems, 29, 2016. 2

work page 2016
[4]

Principled acceleration of iterative numerical methods using machine learning

Sohei Arisaka and Qianxiao Li. Principled acceleration of iterative numerical methods using machine learning. InInternational Conference on Machine Learning, pages 1041–1059. PMLR,

work page
[5]

Accelerated forward- backward optimization using deep learning.SIAM Journal on Optimization, 34(2):1236–1263,

Sebastian Banert, Jevgenija Rudzusika, Ozan Öktem, and Jonas Adler. Accelerated forward- backward optimization using deep learning.SIAM Journal on Optimization, 34(2):1236–1263,

work page
[6]

Rina Foygel Barber and Emil Y . Sidky. Convergence for nonconvex admm, with applications to ct imaging.Journal of Machine Learning Research, 25(38):1–46, 2024. 4

work page 2024
[7]

Relational inductive biases, deep learning, and graph networks

Peter W Battaglia, Jessica B Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, et al. Relational inductive biases, deep learning, and graph networks.arXiv preprint arXiv:1806.01261, 2018. 1, 2, 3, 5

work page internal anchor Pith review Pith/arXiv arXiv 2018
[8]

Distributed opti- mization and statistical learning via the alternating direction method of multipliers.Foundations and Trends in Machine Learning, 3(1):1–122, 2011

Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed opti- mization and statistical learning via the alternating direction method of multipliers.Foundations and Trends in Machine Learning, 3(1):1–122, 2011. ISSN 1935-8237. 1, 2, 4

work page 2011
[9]

JAX: composable transformations of Python+NumPy programs, 2018

James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. JAX: composable transformations of Python+NumPy programs, 2018. URL http://github.com/jax-ml/jax. 6, 15

work page 2018
[10]

A simple yet effective baseline for non-attributed graph classification

Chen Cai and Yusu Wang. A simple yet effective baseline for non-attributed graph classification. arXiv preprint arXiv:1811.03508, 2018. 6, 15

work page arXiv 2018
[11]

Greedy low-rank gra- dient compression for distributed learning with convergence guarantees.arXiv preprint arXiv:2507.08784, 2025

Chuyan Chen, Yutong He, Pengrui Li, Weichen Jia, and Kun Yuan. Greedy low-rank gra- dient compression for distributed learning with convergence guarantees.arXiv preprint arXiv:2507.08784, 2025. 9

work page arXiv 2025
[12]

Learning to optimize: A primer and a benchmark.Journal of Machine Learning Research, 23(189):1–59, 2022

Tianlong Chen, Xiaohan Chen, Wuyang Chen, Howard Heaton, Jialin Liu, Zhangyang Wang, and Wotao Yin. Learning to optimize: A primer and a benchmark.Journal of Machine Learning Research, 23(189):1–59, 2022. 1, 2, 6

work page 2022
[13]

Expressive power of graph neural networks for (mixed-integer) quadratic programs.arXiv preprint arXiv:2406.05938,

Ziang Chen, Xiaohan Chen, Jialin Liu, Xinshang Wang, and Wotao Yin. Expressive power of graph neural networks for (mixed-integer) quadratic programs.arXiv preprint arXiv:2406.05938,

work page arXiv
[14]

3 10 Learning to accelerate distributed ADMM using graph neural networks

work page
[15]

Privacy-preserving distributed optimization and learning

Ziqin Chen and Yongqiang Wang. Privacy-preserving distributed optimization and learning. arXiv preprint arXiv:2403.00157, 2024. 3

work page arXiv 2024
[16]

Q-shed: Distributed optimization at the edge via hessian eigenvectors quantization

Nicolò Dal Fabbro, Michele Rossi, Luca Schenato, and Subhrakanti Dey. Q-shed: Distributed optimization at the edge via hessian eigenvectors quantization. InICC 2023-IEEE International Conference on Communications, pages 4403–4408. IEEE, 2023. 3

work page 2023
[17]

Bertsekas

Jonathan Eckstein and Dimitri P. Bertsekas. An alternating direction method for linear program- ming. Laboratory for Information and Decision Systems, Massachusetts Institute of Technology,

work page
[18]

On random graphs i.Publicationes Mathematicae Debrecen, 6:290–297,

P Erdös and A Rényi. On random graphs i.Publicationes Mathematicae Debrecen, 6:290–297,

work page
[19]

Learning preconditioners for inverse problems

Patrick Fahy, Mohammad Golbabaee, and Matthias J Ehrhardt. Greedy learning to optimize with convergence guarantees.arXiv preprint arXiv:2406.00260, 2024. 3

work page arXiv 2024
[20]

A dual algorithm for the solution of nonlinear variational problems via finite element approximation.Computers & mathematics with applications (1987), 2(1):17–40, 1976

Daniel Gabay and Bertrand Mercier. A dual algorithm for the solution of nonlinear variational problems via finite element approximation.Computers & mathematics with applications (1987), 2(1):17–40, 1976. ISSN 0898-1221. 1, 4

work page 1987
[21]

Gradient methods with online scaling part i

Wenzhi Gao, Ya-Chi Chu, Yinyu Ye, and Madeleine Udell. Gradient methods with online scaling part i. theoretical foundations.arXiv preprint arXiv:2505.23081, 2025. 3

work page arXiv 2025
[22]

Neural message passing for quantum chemistry

Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. InInternational conference on machine learning, pages 1263–1272. PMLR, 2017. 2, 4

work page 2017
[23]

Glowinski and A

R. Glowinski and A. Marroco. Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité d’une classe de problèmes de dirichlet non linéaires.Revue française d’automatique, informatique, recherche opérationnelle. Analyse numérique, 9(R2): 41–76, 1975. ISSN 0397-9342. 1, 4

work page 1975
[24]

Jraph: A library for graph neural networks in jax., 2020

Jonathan Godwin*, Thomas Keck*, Peter Battaglia, Victor Bapst, Thomas Kipf, Yujia Li, Kimberly Stachenfeld, Petar Veliˇckovi´c, and Alvaro Sanchez-Gonzalez. Jraph: A library for graph neural networks in jax., 2020. URLhttp://github.com/deepmind/jraph. 15

work page 2020
[25]

Neural incomplete factorization: learning preconditioners for the conjugate gradient method.Transactions on Machine Learning Research,

Paul Häusner, Ozan Öktem, and Jens Sjölund. Neural incomplete factorization: learning preconditioners for the conjugate gradient method.Transactions on Machine Learning Research,

work page
[26]

Learning incomplete factorization preconditioners for GMRES

Paul Häusner, Aleix Nieto Juscafresa, and Jens Sjölund. Learning incomplete factorization preconditioners for GMRES. InProceedings of the 6th Northern Lights Deep Learning Confer- ence (NLDL), volume 265 ofProceedings of Machine Learning Research, pages 85–99. PMLR,

work page
[27]

A mathematics- inspired learning-to-optimize framework for decentralized optimization.arXiv preprint arXiv:2410.01700, 2024

Yutong He, Qiulin Shang, Xinmeng Huang, Jialin Liu, and Kun Yuan. A mathematics- inspired learning-to-optimize framework for decentralized optimization.arXiv preprint arXiv:2410.01700, 2024. 3

work page arXiv 2024
[28]

Flax: A neural network library and ecosystem for JAX, 2024

Jonathan Heek, Anselm Levskaya, Avital Oliver, Marvin Ritter, Bertrand Rondepierre, Andreas Steiner, and Marc van Zee. Flax: A neural network library and ecosystem for JAX, 2024. URL http://github.com/google/flax. 15

work page 2024
[29]

Accelerating quadratic optimization with reinforcement learning

Jeffrey Ichnowski, Paras Jain, Bartolomeo Stellato, Goran Banjac, Michael Luo, Francesco Borrelli, Joseph E Gonzalez, Ion Stoica, and Ken Goldberg. Accelerating quadratic optimization with reinforcement learning. InAdvances in Neural Information Processing Systems, volume 34, pages 21043–21055, 2021. 2

work page 2021
[30]

Convergence rate of distributed ADMM over networks

Ali Makhdoumi and Asuman Ozdaglar. Convergence rate of distributed ADMM over networks. IEEE transactions on automatic control, 62(10):5082–5095, 2017. ISSN 0018-9286. 2, 4, 9, 16

work page 2017
[31]

Algorithm unrolling: Interpretable, efficient deep learning for signal and image processing.IEEE Signal Processing Magazine, 38(2):18–44,

Vishal Monga, Yuelong Li, and Yonina C Eldar. Algorithm unrolling: Interpretable, efficient deep learning for signal and image processing.IEEE Signal Processing Magazine, 38(2):18–44,

work page
[32]

Distributed learn-to-optimize: Limited communications opti- mization over networks via deep unfolded distributed ADMM.IEEE Transactions on Mobile Computing, 2024

Yoav Noah and Nir Shlezinger. Distributed learn-to-optimize: Limited communications opti- mization over networks via deep unfolded distributed ADMM.IEEE Transactions on Mobile Computing, 2024. 3 11 Learning to accelerate distributed ADMM using graph neural networks

work page 2024
[33]

Exploring the power of graph neural networks in solving linear optimization problems

Chendi Qian, Didier Chételat, and Christopher Morris. Exploring the power of graph neural networks in solving linear optimization problems. InInternational Conference on Artificial Intelligence and Statistics, pages 1432–1440. PMLR, 2024. 3

work page 2024
[34]

Learning algorithm hyperparameters for fast para- metric convex optimization.arXiv preprint arXiv:2411.15717, 2024

Rajiv Sambharya and Bartolomeo Stellato. Learning algorithm hyperparameters for fast para- metric convex optimization.arXiv preprint arXiv:2411.15717, 2024. 2

work page arXiv 2024
[35]

Learning to warm- start fixed-point optimization algorithms.Journal of Machine Learning Research, 25(166): 1–46, 2024

Rajiv Sambharya, Georgina Hall, Brandon Amos, and Bartolomeo Stellato. Learning to warm- start fixed-point optimization algorithms.Journal of Machine Learning Research, 25(166): 1–46, 2024. 2

work page 2024
[36]

Deep distributed optimization for large-scale quadratic program- ming.arXiv preprint arXiv:2412.12156, 2024

Augustinos D Saravanos, Hunter Kuperman, Alex Oshin, Arshiya Taj Abdul, Vincent Pacelli, and Evangelos A Theodorou. Deep distributed optimization for large-scale quadratic program- ming.arXiv preprint arXiv:2412.12156, 2024. 2, 9

work page arXiv 2024
[37]

The graph neural network model.IEEE transactions on neural networks, 20(1):61–80, 2008

Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model.IEEE transactions on neural networks, 20(1):61–80, 2008. 1

work page 2008
[38]

Graph-based neural acceleration for nonnegative matrix factorization, 2022

Jens Sjölund and Maria Bånkestad. Graph-based neural acceleration for nonnegative matrix factorization, 2022. 3

work page 2022
[39]

The admm algorithm for distributed quadratic problems: Parameter selection and constraint preconditioning.IEEE Transactions on Signal Processing, 64(2):290–305, 2015

Andre Teixeira, Euhanna Ghadimi, Iman Shames, Henrik Sandberg, and Mikael Johansson. The admm algorithm for distributed quadratic problems: Parameter selection and constraint preconditioning.IEEE Transactions on Signal Processing, 64(2):290–305, 2015. 3

work page 2015
[40]

Instance Normalization: The Missing Ingredient for Fast Stylization

Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. Instance normalization: The missing ingredient for fast stylization.arXiv preprint arXiv:1607.08022, 2016. 6

work page internal anchor Pith review Pith/arXiv arXiv 2016
[41]

Neural algorithmic reasoning.Patterns, 2(7), 2021

Petar Veli ˇckovi´c and Charles Blundell. Neural algorithmic reasoning.Patterns, 2(7), 2021. 2

work page 2021
[42]

Distributed least squares solver for network linear equations.Automatica, 113:108798, 2020

Tao Yang, Jemin George, Jiahu Qin, Xinlei Yi, and Junfeng Wu. Distributed least squares solver for network linear equations.Automatica, 113:108798, 2020. 7

work page 2020
[43]

Accelerating admm for efficient simulation and optimization.ACM Transactions on Graphics (TOG), 38(6):1–21, 2019

Juyong Zhang, Yue Peng, Wenqing Ouyang, and Bailin Deng. Accelerating admm for efficient simulation and optimization.ACM Transactions on Graphics (TOG), 38(6):1–21, 2019. 1

work page 2019
[44]

Asynchronous distributed admm for consensus optimization

Ruiliang Zhang and James Kwok. Asynchronous distributed admm for consensus optimization. In Eric P. Xing and Tony Jebara, editors,Proceedings of the 31st International Conference on Machine Learning, volume 32 ofProceedings of Machine Learning Research, pages 1701–1709, Bejing, China, 22–24 Jun 2014. PMLR. 7 12 Learning to accelerate distributed ADMM usin...

work page 2014

[1] [1]

Tutorial on amortized optimization.Foundations and Trends in Machine Learning, 16(5):592–732, 2023

Brandon Amos. Tutorial on amortized optimization.Foundations and Trends in Machine Learning, 16(5):592–732, 2023. 2, 7

work page 2023

[2] [2]

Optnet: Differentiable optimization as a layer in neural networks

Brandon Amos and J Zico Kolter. Optnet: Differentiable optimization as a layer in neural networks. InInternational conference on machine learning, pages 136–145. PMLR, 2017. 6

work page 2017

[3] [3]

Learning to learn by gradient descent by gradient descent.Advances in neural information processing systems, 29, 2016

Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, and Nando De Freitas. Learning to learn by gradient descent by gradient descent.Advances in neural information processing systems, 29, 2016. 2

work page 2016

[4] [4]

Principled acceleration of iterative numerical methods using machine learning

Sohei Arisaka and Qianxiao Li. Principled acceleration of iterative numerical methods using machine learning. InInternational Conference on Machine Learning, pages 1041–1059. PMLR,

work page

[5] [5]

Accelerated forward- backward optimization using deep learning.SIAM Journal on Optimization, 34(2):1236–1263,

Sebastian Banert, Jevgenija Rudzusika, Ozan Öktem, and Jonas Adler. Accelerated forward- backward optimization using deep learning.SIAM Journal on Optimization, 34(2):1236–1263,

work page

[6] [6]

Rina Foygel Barber and Emil Y . Sidky. Convergence for nonconvex admm, with applications to ct imaging.Journal of Machine Learning Research, 25(38):1–46, 2024. 4

work page 2024

[7] [7]

Relational inductive biases, deep learning, and graph networks

Peter W Battaglia, Jessica B Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, et al. Relational inductive biases, deep learning, and graph networks.arXiv preprint arXiv:1806.01261, 2018. 1, 2, 3, 5

work page internal anchor Pith review Pith/arXiv arXiv 2018

[8] [8]

Distributed opti- mization and statistical learning via the alternating direction method of multipliers.Foundations and Trends in Machine Learning, 3(1):1–122, 2011

Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed opti- mization and statistical learning via the alternating direction method of multipliers.Foundations and Trends in Machine Learning, 3(1):1–122, 2011. ISSN 1935-8237. 1, 2, 4

work page 2011

[9] [9]

JAX: composable transformations of Python+NumPy programs, 2018

James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. JAX: composable transformations of Python+NumPy programs, 2018. URL http://github.com/jax-ml/jax. 6, 15

work page 2018

[10] [10]

A simple yet effective baseline for non-attributed graph classification

Chen Cai and Yusu Wang. A simple yet effective baseline for non-attributed graph classification. arXiv preprint arXiv:1811.03508, 2018. 6, 15

work page arXiv 2018

[11] [11]

Greedy low-rank gra- dient compression for distributed learning with convergence guarantees.arXiv preprint arXiv:2507.08784, 2025

Chuyan Chen, Yutong He, Pengrui Li, Weichen Jia, and Kun Yuan. Greedy low-rank gra- dient compression for distributed learning with convergence guarantees.arXiv preprint arXiv:2507.08784, 2025. 9

work page arXiv 2025

[12] [12]

Learning to optimize: A primer and a benchmark.Journal of Machine Learning Research, 23(189):1–59, 2022

Tianlong Chen, Xiaohan Chen, Wuyang Chen, Howard Heaton, Jialin Liu, Zhangyang Wang, and Wotao Yin. Learning to optimize: A primer and a benchmark.Journal of Machine Learning Research, 23(189):1–59, 2022. 1, 2, 6

work page 2022

[13] [13]

Expressive power of graph neural networks for (mixed-integer) quadratic programs.arXiv preprint arXiv:2406.05938,

Ziang Chen, Xiaohan Chen, Jialin Liu, Xinshang Wang, and Wotao Yin. Expressive power of graph neural networks for (mixed-integer) quadratic programs.arXiv preprint arXiv:2406.05938,

work page arXiv

[14] [14]

3 10 Learning to accelerate distributed ADMM using graph neural networks

work page

[15] [15]

Privacy-preserving distributed optimization and learning

Ziqin Chen and Yongqiang Wang. Privacy-preserving distributed optimization and learning. arXiv preprint arXiv:2403.00157, 2024. 3

work page arXiv 2024

[16] [16]

Q-shed: Distributed optimization at the edge via hessian eigenvectors quantization

Nicolò Dal Fabbro, Michele Rossi, Luca Schenato, and Subhrakanti Dey. Q-shed: Distributed optimization at the edge via hessian eigenvectors quantization. InICC 2023-IEEE International Conference on Communications, pages 4403–4408. IEEE, 2023. 3

work page 2023

[17] [17]

Bertsekas

Jonathan Eckstein and Dimitri P. Bertsekas. An alternating direction method for linear program- ming. Laboratory for Information and Decision Systems, Massachusetts Institute of Technology,

work page

[18] [18]

On random graphs i.Publicationes Mathematicae Debrecen, 6:290–297,

P Erdös and A Rényi. On random graphs i.Publicationes Mathematicae Debrecen, 6:290–297,

work page

[19] [19]

Learning preconditioners for inverse problems

Patrick Fahy, Mohammad Golbabaee, and Matthias J Ehrhardt. Greedy learning to optimize with convergence guarantees.arXiv preprint arXiv:2406.00260, 2024. 3

work page arXiv 2024

[20] [20]

A dual algorithm for the solution of nonlinear variational problems via finite element approximation.Computers & mathematics with applications (1987), 2(1):17–40, 1976

Daniel Gabay and Bertrand Mercier. A dual algorithm for the solution of nonlinear variational problems via finite element approximation.Computers & mathematics with applications (1987), 2(1):17–40, 1976. ISSN 0898-1221. 1, 4

work page 1987

[21] [21]

Gradient methods with online scaling part i

Wenzhi Gao, Ya-Chi Chu, Yinyu Ye, and Madeleine Udell. Gradient methods with online scaling part i. theoretical foundations.arXiv preprint arXiv:2505.23081, 2025. 3

work page arXiv 2025

[22] [22]

Neural message passing for quantum chemistry

Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. InInternational conference on machine learning, pages 1263–1272. PMLR, 2017. 2, 4

work page 2017

[23] [23]

Glowinski and A

R. Glowinski and A. Marroco. Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité d’une classe de problèmes de dirichlet non linéaires.Revue française d’automatique, informatique, recherche opérationnelle. Analyse numérique, 9(R2): 41–76, 1975. ISSN 0397-9342. 1, 4

work page 1975

[24] [24]

Jraph: A library for graph neural networks in jax., 2020

Jonathan Godwin*, Thomas Keck*, Peter Battaglia, Victor Bapst, Thomas Kipf, Yujia Li, Kimberly Stachenfeld, Petar Veliˇckovi´c, and Alvaro Sanchez-Gonzalez. Jraph: A library for graph neural networks in jax., 2020. URLhttp://github.com/deepmind/jraph. 15

work page 2020

[25] [25]

Neural incomplete factorization: learning preconditioners for the conjugate gradient method.Transactions on Machine Learning Research,

Paul Häusner, Ozan Öktem, and Jens Sjölund. Neural incomplete factorization: learning preconditioners for the conjugate gradient method.Transactions on Machine Learning Research,

work page

[26] [26]

Learning incomplete factorization preconditioners for GMRES

Paul Häusner, Aleix Nieto Juscafresa, and Jens Sjölund. Learning incomplete factorization preconditioners for GMRES. InProceedings of the 6th Northern Lights Deep Learning Confer- ence (NLDL), volume 265 ofProceedings of Machine Learning Research, pages 85–99. PMLR,

work page

[27] [27]

A mathematics- inspired learning-to-optimize framework for decentralized optimization.arXiv preprint arXiv:2410.01700, 2024

Yutong He, Qiulin Shang, Xinmeng Huang, Jialin Liu, and Kun Yuan. A mathematics- inspired learning-to-optimize framework for decentralized optimization.arXiv preprint arXiv:2410.01700, 2024. 3

work page arXiv 2024

[28] [28]

Flax: A neural network library and ecosystem for JAX, 2024

Jonathan Heek, Anselm Levskaya, Avital Oliver, Marvin Ritter, Bertrand Rondepierre, Andreas Steiner, and Marc van Zee. Flax: A neural network library and ecosystem for JAX, 2024. URL http://github.com/google/flax. 15

work page 2024

[29] [29]

Accelerating quadratic optimization with reinforcement learning

Jeffrey Ichnowski, Paras Jain, Bartolomeo Stellato, Goran Banjac, Michael Luo, Francesco Borrelli, Joseph E Gonzalez, Ion Stoica, and Ken Goldberg. Accelerating quadratic optimization with reinforcement learning. InAdvances in Neural Information Processing Systems, volume 34, pages 21043–21055, 2021. 2

work page 2021

[30] [30]

Convergence rate of distributed ADMM over networks

Ali Makhdoumi and Asuman Ozdaglar. Convergence rate of distributed ADMM over networks. IEEE transactions on automatic control, 62(10):5082–5095, 2017. ISSN 0018-9286. 2, 4, 9, 16

work page 2017

[31] [31]

Algorithm unrolling: Interpretable, efficient deep learning for signal and image processing.IEEE Signal Processing Magazine, 38(2):18–44,

Vishal Monga, Yuelong Li, and Yonina C Eldar. Algorithm unrolling: Interpretable, efficient deep learning for signal and image processing.IEEE Signal Processing Magazine, 38(2):18–44,

work page

[32] [32]

Distributed learn-to-optimize: Limited communications opti- mization over networks via deep unfolded distributed ADMM.IEEE Transactions on Mobile Computing, 2024

Yoav Noah and Nir Shlezinger. Distributed learn-to-optimize: Limited communications opti- mization over networks via deep unfolded distributed ADMM.IEEE Transactions on Mobile Computing, 2024. 3 11 Learning to accelerate distributed ADMM using graph neural networks

work page 2024

[33] [33]

Exploring the power of graph neural networks in solving linear optimization problems

Chendi Qian, Didier Chételat, and Christopher Morris. Exploring the power of graph neural networks in solving linear optimization problems. InInternational Conference on Artificial Intelligence and Statistics, pages 1432–1440. PMLR, 2024. 3

work page 2024

[34] [34]

Learning algorithm hyperparameters for fast para- metric convex optimization.arXiv preprint arXiv:2411.15717, 2024

Rajiv Sambharya and Bartolomeo Stellato. Learning algorithm hyperparameters for fast para- metric convex optimization.arXiv preprint arXiv:2411.15717, 2024. 2

work page arXiv 2024

[35] [35]

Learning to warm- start fixed-point optimization algorithms.Journal of Machine Learning Research, 25(166): 1–46, 2024

Rajiv Sambharya, Georgina Hall, Brandon Amos, and Bartolomeo Stellato. Learning to warm- start fixed-point optimization algorithms.Journal of Machine Learning Research, 25(166): 1–46, 2024. 2

work page 2024

[36] [36]

Deep distributed optimization for large-scale quadratic program- ming.arXiv preprint arXiv:2412.12156, 2024

Augustinos D Saravanos, Hunter Kuperman, Alex Oshin, Arshiya Taj Abdul, Vincent Pacelli, and Evangelos A Theodorou. Deep distributed optimization for large-scale quadratic program- ming.arXiv preprint arXiv:2412.12156, 2024. 2, 9

work page arXiv 2024

[37] [37]

The graph neural network model.IEEE transactions on neural networks, 20(1):61–80, 2008

Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model.IEEE transactions on neural networks, 20(1):61–80, 2008. 1

work page 2008

[38] [38]

Graph-based neural acceleration for nonnegative matrix factorization, 2022

Jens Sjölund and Maria Bånkestad. Graph-based neural acceleration for nonnegative matrix factorization, 2022. 3

work page 2022

[39] [39]

The admm algorithm for distributed quadratic problems: Parameter selection and constraint preconditioning.IEEE Transactions on Signal Processing, 64(2):290–305, 2015

Andre Teixeira, Euhanna Ghadimi, Iman Shames, Henrik Sandberg, and Mikael Johansson. The admm algorithm for distributed quadratic problems: Parameter selection and constraint preconditioning.IEEE Transactions on Signal Processing, 64(2):290–305, 2015. 3

work page 2015

[40] [40]

Instance Normalization: The Missing Ingredient for Fast Stylization

Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. Instance normalization: The missing ingredient for fast stylization.arXiv preprint arXiv:1607.08022, 2016. 6

work page internal anchor Pith review Pith/arXiv arXiv 2016

[41] [41]

Neural algorithmic reasoning.Patterns, 2(7), 2021

Petar Veli ˇckovi´c and Charles Blundell. Neural algorithmic reasoning.Patterns, 2(7), 2021. 2

work page 2021

[42] [42]

Distributed least squares solver for network linear equations.Automatica, 113:108798, 2020

Tao Yang, Jemin George, Jiahu Qin, Xinlei Yi, and Junfeng Wu. Distributed least squares solver for network linear equations.Automatica, 113:108798, 2020. 7

work page 2020

[43] [43]

Accelerating admm for efficient simulation and optimization.ACM Transactions on Graphics (TOG), 38(6):1–21, 2019

Juyong Zhang, Yue Peng, Wenqing Ouyang, and Bailin Deng. Accelerating admm for efficient simulation and optimization.ACM Transactions on Graphics (TOG), 38(6):1–21, 2019. 1

work page 2019

[44] [44]

Asynchronous distributed admm for consensus optimization

Ruiliang Zhang and James Kwok. Asynchronous distributed admm for consensus optimization. In Eric P. Xing and Tony Jebara, editors,Proceedings of the 31st International Conference on Machine Learning, volume 32 ofProceedings of Machine Learning Research, pages 1701–1709, Bejing, China, 22–24 Jun 2014. PMLR. 7 12 Learning to accelerate distributed ADMM usin...

work page 2014