LEAF: A Learning-Enabled ADMM Framework for Accelerated Convex Optimization

Binh Nguyen; Trinh Tran; Truong X. Nghiem

arxiv: 2606.08993 · v1 · pith:GZHDMUW7new · submitted 2026-06-08 · 💻 cs.LG · cs.SY· eess.SY· math.OC

LEAF: A Learning-Enabled ADMM Framework for Accelerated Convex Optimization

Binh Nguyen , Trinh Tran , Truong X. Nghiem This is my paper

Pith reviewed 2026-06-27 17:35 UTC · model grok-4.3

classification 💻 cs.LG cs.SYeess.SYmath.OC

keywords ADMMconvex optimizationinput convex neural networksMoreau envelopelearning-enabled optimizationaccelerationconvergence analysis

0 comments

The pith

Approximating the Moreau envelope with an input convex neural network accelerates ADMM while preserving convergence rates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents LEAF, a framework that learns an approximation to the Moreau envelope of a convex objective using an input convex neural network. This scalar-valued model is inserted directly into the ADMM updates to produce the MEL-ADMM and sMEL-ADMM variants. Because the network is trained to respect convexity and smoothness, the resulting iterations retain the structural properties needed for standard convergence analysis. The approach targets a wide range of convex problems that mix smooth and nonsmooth terms and reports lower per-iteration cost than classical ADMM or existing operator-learning methods.

Core claim

LEAF approximates the Moreau envelope of the objective with an input convex neural network, yielding a learned scalar model that preserves convexity and smoothness. Substituting this model into the ADMM iteration produces MEL-ADMM and its splitting form sMEL-ADMM, both equipped with convergence and feasibility guarantees under the learned operator. The framework applies to convex problems with smooth or nonsmooth objectives and achieves rates comparable to classical ADMM at reduced per-iteration cost.

What carries the argument

The ICNN approximation of the Moreau envelope, a scalar-valued convex function learned from data and substituted into the ADMM proximal step to lower computational cost per iteration.

If this is right

MEL-ADMM and sMEL-ADMM achieve convergence rates comparable to classical ADMM.
Per-iteration computational cost is lower than that of standard ADMM or direct operator-learning baselines.
The methods remain feasible when the learned model replaces the exact Moreau envelope.
Numerical tests show up to an order-of-magnitude reduction in runtime while keeping optimality gaps low across a range of convex problems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The scalar form of the learned envelope may allow a single trained network to be reused across problem instances that share the same objective structure but differ in dimension or data.
If the same ICNN construction can be applied to other first-order methods that rely on proximal mappings, the framework could reduce tuning effort for a broader family of splitting algorithms.
The explicit convexity constraint inside the network architecture may make it easier to certify safety properties when the learned optimizer is deployed inside a control loop.

Load-bearing premise

The learned ICNN must approximate the true Moreau envelope accurately enough that the substituted model still satisfies the convexity and smoothness conditions required by the convergence proofs.

What would settle it

A convex test problem on which the learned model produces iterates whose optimality gap grows or whose feasibility violation exceeds a fixed tolerance, while the same problem solved with the exact Moreau envelope converges normally.

Figures

Figures reproduced from arXiv: 2606.08993 by Binh Nguyen, Trinh Tran, Truong X. Nghiem.

**Figure 2.** Figure 2: Example 1 of energy management in a microgrid. [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: Optimality gap over solving time for Example 1. [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Example 1: Prediction over 24-hour (96-step) horizon [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗

**Figure 6.** Figure 6: provides a statistical breakdown of the solving time with m = 55 under optimality gaps of 1% and 0.1%. While Table IV summarizes average performance, these box plots reveal the consistency of sMEL-ADMM. The interquartile range of sMEL-ADMM is significantly narrower and lower than those of Mosek and Clarabel. Even in the more stringent optimality gap scenario, sMEL-ADMM maintains a much tighter distribution… view at source ↗

read the original abstract

We propose LEAF, a learning-enabled ADMM framework for accelerated convex optimization. The key idea is to approximate the Moreau envelope of the objective function using an Input Convex Neural Network (ICNN), resulting in a learned model that preserves convexity and smoothness. This leads to the proposed Moreau Envelope Learning ADMM (MEL-ADMM) and its splitting variant sMEL-ADMM. Unlike existing approaches that learn high-dimensional operators directly, LEAF learns a scalar-valued Moreau envelope, significantly reducing model complexity and improving data efficiency. The framework accommodates a broad class of convex problems with smooth and non-smooth objectives. By embedding convexity explicitly through the ICNN architecture, the proposed approach maintains high approximation accuracy while preserving key structural properties of the optimization problem. Both MEL-ADMM and sMEL-ADMM are developed with theoretical guarantees of convergence and feasibility under the learned model. Rigorous analysis shows that the proposed methods achieve convergence rates comparable to classical ADMM while reducing per-iteration computational cost. Numerical experiments demonstrate up to an order-of-magnitude speedup over state-of-the-art solvers while maintaining low optimality gaps

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

LEAF learns a scalar Moreau envelope via ICNN inside ADMM to cut per-iteration cost while claiming classical rates, but the error-tolerance argument looks like the weakest link.

read the letter

The paper's main move is to approximate the Moreau envelope with an input-convex network instead of learning a full proximal operator or high-dimensional map. That choice keeps the model smaller and data-efficient while the architecture itself enforces convexity and smoothness. The splitting variant sMEL-ADMM extends the same idea to problems where the objective splits naturally.

The architecture-level guarantee is a genuine plus; it avoids post-hoc projection steps that other learned solvers sometimes need. The abstract also states that both variants come with convergence and feasibility results under the learned model, and the experiments report order-of-magnitude wall-clock gains on the tested instances.

The soft spot is exactly where the stress-test note flags it. The claim of comparable O(1/k) rates rests on the ICNN error being small enough not to degrade the contraction or feasibility properties. The abstract says the learned model preserves the needed properties and that guarantees hold “under the learned model,” but it does not indicate an explicit tolerance on the approximation error or a perturbation bound that shows how much error still yields the same rate. Without that quantitative link, the theoretical part reduces to the classical ADMM proof plus an unverified “error is small” assumption.

The numerical results are presented as maintaining low optimality gaps, yet the abstract gives no detail on problem selection, data-exclusion rules, or whether the baselines were tuned fairly. That makes it hard to judge how general the speedups are.

The work is aimed at people who already use ADMM in control, signal processing, or large-scale convex ML problems and want a learned acceleration that still has some theory attached. A reader in that niche could extract the scalar-learning idea even if the full guarantees need tightening.

It is worth sending to referees so the derivations and experimental protocol can be checked; the central technical choice is clear enough that a serious review would be productive rather than a waste of time.

Referee Report

2 major / 2 minor

Summary. The paper proposes LEAF, a learning-enabled ADMM framework that approximates the Moreau envelope of convex objectives via Input Convex Neural Networks (ICNNs) to obtain MEL-ADMM and its splitting variant sMEL-ADMM. The central claims are that the ICNN approximation preserves convexity and smoothness, that both variants admit convergence guarantees comparable to classical ADMM (including O(1/k) rates), and that numerical experiments show up to an order-of-magnitude speedup over state-of-the-art solvers while keeping low optimality gaps. The approach is positioned as more data-efficient than direct operator learning because it targets a scalar-valued envelope.

Significance. If the error-propagation analysis can be completed, the work would offer a principled route to data-driven acceleration of proximal algorithms for a broad class of convex problems, trading per-iteration cost for a learned model whose structural properties are architecturally enforced. The reduction from learning high-dimensional operators to learning a scalar Moreau envelope is a concrete complexity improvement that could be reusable beyond ADMM.

major comments (2)

[Theoretical Analysis] Theoretical Analysis section: the convergence theorem for MEL-ADMM under the learned model asserts rates comparable to classical ADMM, yet provides no explicit tolerance on the ICNN approximation error ||f_learned - f_true|| (or on the induced proximal-operator error) that is sufficient to preserve the contraction or feasibility properties used in the proof. The argument therefore reduces to the classical case plus an unquantified perturbation claim.
[Theoretical Analysis] § on sMEL-ADMM convergence: the splitting variant inherits the same unquantified error assumption; because the learned proximal step appears inside the ADMM iteration, an explicit propagation bound through the augmented-Lagrangian update is required to justify that the O(1/k) rate (or a controlled degradation) still holds.

minor comments (2)

[Problem Formulation] The abstract states that the framework 'accommodates a broad class of convex problems with smooth and non-smooth objectives,' but the precise regularity assumptions (e.g., strong convexity modulus, Lipschitz constants) under which the ICNN training and convergence guarantees apply should be stated explicitly in the problem formulation section.
[Experiments] Numerical experiments: the reported speedups are given relative to 'state-of-the-art solvers,' but the precise solver versions, stopping tolerances, and hardware configuration are not listed in the experimental protocol; adding a table of these parameters would improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the theoretical analysis. The observations correctly identify that the convergence statements would be strengthened by explicit error tolerances. We address each point below and will revise the manuscript accordingly.

read point-by-point responses

Referee: [Theoretical Analysis] Theoretical Analysis section: the convergence theorem for MEL-ADMM under the learned model asserts rates comparable to classical ADMM, yet provides no explicit tolerance on the ICNN approximation error ||f_learned - f_true|| (or on the induced proximal-operator error) that is sufficient to preserve the contraction or feasibility properties used in the proof. The argument therefore reduces to the classical case plus an unquantified perturbation claim.

Authors: We agree that an explicit tolerance on the approximation error is needed to make the perturbation argument rigorous. The current proof invokes the preserved convexity and smoothness of the ICNN but does not quantify how large the error ||f_learned - f_true|| may be while still guaranteeing the O(1/k) rate. In the revised manuscript we will insert a new lemma that derives the admissible error bound in terms of the ADMM penalty parameter and the strong-convexity modulus (when present), together with the resulting degradation in the convergence constant. revision: yes
Referee: [Theoretical Analysis] § on sMEL-ADMM convergence: the splitting variant inherits the same unquantified error assumption; because the learned proximal step appears inside the ADMM iteration, an explicit propagation bound through the augmented-Lagrangian update is required to justify that the O(1/k) rate (or a controlled degradation) still holds.

Authors: We concur that the error introduced by the learned proximal operator must be tracked through the augmented-Lagrangian updates for sMEL-ADMM. The revised version will contain an additional proposition that propagates the proximal error through one full ADMM iteration and shows that the same O(1/k) rate is retained provided the per-iteration error satisfies a summable bound derived from the dual step-size and the Lipschitz constant of the linear operator. revision: yes

Circularity Check

0 steps flagged

No circularity; derivation relies on external data-driven approximation and standard ADMM analysis

full rationale

The paper's core steps are (1) training an ICNN on data to approximate the Moreau envelope while enforcing convexity/smoothness via architecture, (2) substituting the learned model into ADMM iterations to obtain MEL-ADMM/sMEL-ADMM, and (3) providing convergence analysis under the learned model. None of these reduce by construction to their own inputs: the ICNN parameters are fitted from external samples rather than defined in terms of the target rates, the convergence claims are stated as holding for the approximate operator (with the approximation error treated as an external perturbation), and no self-citation chain or uniqueness theorem is invoked to force the result. The framework therefore remains self-contained against external benchmarks and data.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that ICNNs can produce sufficiently accurate convex approximations of the Moreau envelope. The number and nature of free parameters inside the ICNN are not stated in the abstract.

free parameters (1)

ICNN weights
Parameters of the input convex neural network are fitted to data to approximate the Moreau envelope; exact count and regularization are unknown from the abstract.

axioms (1)

domain assumption Input convex neural networks preserve convexity when used to model the Moreau envelope
The framework requires this architectural property to transfer convergence guarantees from classical ADMM to the learned variants.

pith-pipeline@v0.9.1-grok · 5734 in / 1228 out tokens · 28348 ms · 2026-06-27T17:35:11.631568+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

53 extracted references · 3 canonical work pages

[1]

Learning to optimize: A primer and a benchmark,

T. Chen, X. Chen, W. Chen, H. Heaton, J. Liu, Z. Wang, and W. Yin, “Learning to optimize: A primer and a benchmark,”Journal of Machine Learning Research, vol. 23, no. 189, pp. 1–59, 2022. 11

2022
[2]

Learning to optimize in model predictive control,

J. Sacks and B. Boots, “Learning to optimize in model predictive control,” in2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 10 549–10 556

2022
[3]

Admm-based algorithm for training fault tolerant rbf networks and selecting centers,

H. Wang, R. Feng, Z.-F. Han, and C.-S. Leung, “Admm-based algorithm for training fault tolerant rbf networks and selecting centers,”IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 8, pp. 3870–3878, 2017

2017
[4]

Collision-free minimum-time trajectory planning for mul- tiple vehicles based on admm,

T. B. Nguyen, T. Nguyen, T. Nghiem, L. Nguyen, J. Baca, P. Rangel, and H.-K. Song, “Collision-free minimum-time trajectory planning for mul- tiple vehicles based on admm,” inIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022, pp. 13 785–13 790

2022
[5]

Communication- efficient ADMM using Gaussian process regression and fully adaptive uniform quantization,

A. Duarte, B. Nguyen, T. X. Nghiem, and S. Wei, “Communication- efficient ADMM using Gaussian process regression and fully adaptive uniform quantization,”Franklin Open, vol. 15, p. 100587, 2026

2026
[6]

Des-inspired accelerated unfolded linearized admm networks for inverse problems,

W. An, Y . Liu, F. Shang, H. Liu, and L. Jiao, “Des-inspired accelerated unfolded linearized admm networks for inverse problems,”IEEE Trans- actions on Neural Networks and Learning Systems, vol. 36, no. 3, pp. 5319–5333, 2025

2025
[7]

ADMM Enhancement Techniques for Distributed Optimal Power Flow,

M. Hasanzadeh and A. Kargarian, “ADMM Enhancement Techniques for Distributed Optimal Power Flow,”IEEE Transactions on Power Systems, vol. 41, no. 2, pp. 1321–1333, Mar. 2026

2026
[8]

Qc-odkla: Quantized and communication- censored online decentralized kernel learning via lin- earized admm,

P. Xu, Y . Wang, X. Chen, and Z. Tian, “Qc-odkla: Quantized and communication- censored online decentralized kernel learning via lin- earized admm,”IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 12, pp. 17 987–17 999, 2024

2024
[9]

StructADMM: Achieving ultrahigh efficiency in structured pruning for DNNs,

T. Zhang, S. Ye, X. Feng, X. Ma, K. Zhang, Z. Li, J. Tang, S. Liu, X. Lin, Y . Liu, M. Fardad, and Y . Wang, “StructADMM: Achieving ultrahigh efficiency in structured pruning for DNNs,”IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 5, pp. 2259– 2273, 2022

2022
[10]

Connectivity-preserving distributed informative path planning for mo- bile robot networks,

B. Nguyen, T. X. Nghiem, L. Nguyen, H. M. La, and T. Nguyen, “Connectivity-preserving distributed informative path planning for mo- bile robot networks,”IEEE Robotics and Automation Letters, vol. 9, no. 3, pp. 2949–2956, 2024

2024
[11]

Machine learning for combinato- rial optimization: a methodological tour d’horizon,

Y . Bengio, A. Lodi, and A. Prouvost, “Machine learning for combinato- rial optimization: a methodological tour d’horizon,”European Journal of Operational Research, vol. 290, no. 2, pp. 405–421, 2021

2021
[12]

Koziel and L

S. Koziel and L. Leifsson,Surrogate-based modeling and optimization, 1st ed. Springer New York, NY , 2013

2013
[13]

Learning optimal solutions for extremely fast ac optimal power flow,

A. S. Zamzam and K. Baker, “Learning optimal solutions for extremely fast ac optimal power flow,” in2020 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids, Nov. 2020, pp. 1–6

2020
[14]

Constrained neural networks for approximate nonlinear model predictive control,

S. Adhau, V . V . Naik, and S. Skogestad, “Constrained neural networks for approximate nonlinear model predictive control,” in2021 60th IEEE Conference on Decision and Control (CDC), 2021, pp. 295–300

2021
[15]

KKT-based optimality con- ditions for neural network approximation,

V . Peiris, N. Sukhorukova, and J. Ugon, “KKT-based optimality con- ditions for neural network approximation,”arXiv preprint 2506.17305, 2025

work page arXiv 2025
[16]

DC3: A learning method for optimization with hard constraints,

P. L. Donti, D. Rolnick, and J. Z. Kolter, “DC3: A learning method for optimization with hard constraints,” inInternational Conference on Learning Representations, 2021

2021
[17]

FSNet: Feasibility-seeking neural network for constrained optimization with guarantees,

H. T. Nguyen and P. L. Donti, “FSNet: Feasibility-seeking neural network for constrained optimization with guarantees,” inAnnual Con- ference on Neural Information Processing Systems, 2026

2026
[18]

Learning warm-start points for AC Optimal Power Flow,

K. Baker, “Learning warm-start points for AC Optimal Power Flow,” in2019 IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP), Oct. 2019, pp. 1–6

2019
[19]

Learning to warm- start fixed-point optimization algorithms,

R. Sambharya, G. Hall, B. Amos, and B. Stellato, “Learning to warm- start fixed-point optimization algorithms,”Journal of Machine Learning Research, vol. 25, no. 166, pp. 1–46, 2024

2024
[20]

Learning for constrained optimization: Identifying optimal active constraint sets,

S. Misra, L. Roald, and Y . Ng, “Learning for constrained optimization: Identifying optimal active constraint sets,”INFORMS Journal on Com- puting, vol. 34, no. 1, pp. 463–480, 2022

2022
[21]

One network to solve them all — solving linear inverse problems using deep projection models,

J. R. Chang, C.-L. Li, B. P ´oczos, B. Vijaya Kumar, and A. C. Sankara- narayanan, “One network to solve them all — solving linear inverse problems using deep projection models,” in2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 5889–5898

2017
[22]

Learning proximal operators: Using denoising networks for regularizing inverse imaging problems,

T. Meinhardt, M. Moller, C. Hazirbas, and D. Cremers, “Learning proximal operators: Using denoising networks for regularizing inverse imaging problems,” inProceedings of the IEEE International Confer- ence on Computer Vision, 2017, pp. 1781–1790

2017
[23]

Deep ADMM-Net for compressive sensing MRI,

J. Sun, H. Li, Z. Xuet al., “Deep ADMM-Net for compressive sensing MRI,”Advances in neural information processing systems, vol. 29, 2016

2016
[24]

ADMM-CSNet: A deep learning approach for image compressive sensing,

Y . Yang, J. Sun, H. Li, and Z. Xu, “ADMM-CSNet: A deep learning approach for image compressive sensing,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 3, pp. 521–538, 2020

2020
[25]

Learning to learn by gradient descent by gradient descent,

M. Andrychowicz, M. Denil, S. Gomez, M. W. Hoffman, D. Pfau, T. Schaul, B. Shillingford, and N. De Freitas, “Learning to learn by gradient descent by gradient descent,”Advances in neural information processing systems, vol. 29, 2016

2016
[26]

Learning to learn without gradient descent by gradient descent,

Y . Chen, M. W. Hoffman, S. G. Colmenarejo, M. Denil, T. P. Lillicrap, M. Botvinick, and N. Freitas, “Learning to learn without gradient descent by gradient descent,” inInternational Conference on Machine Learning. PMLR, 2017, pp. 748–756

2017
[27]

Self-supervised primal-dual learning for constrained optimization,

S. Park and P. Van Hentenryck, “Self-supervised primal-dual learning for constrained optimization,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 4, 2023, pp. 4052–4060

2023
[28]

Hard-constrained neural networks with universal approxi- mation guarantees.arXiv preprint arXiv:2410.10807, 2024

Y . Min and N. Azizan, “Hardnet: Hard-constrained neural networks with universal approximation guarantees,”arXiv preprint 2410.10807, 2024

work page arXiv 2024
[29]

R. T. Rockafellar,Convex analysis. Princeton University Press, 1997

1997
[30]

Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers,

S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers,”Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, Jul. 2011

2011
[31]

A general analysis of the convergence of ADMM,

R. Nishihara, L. Lessard, B. Recht, A. Packard, and M. Jordan, “A general analysis of the convergence of ADMM,” inProceedings of the 32nd International Conference on Machine Learning, vol. 37. PMLR, 2015, pp. 343–352

2015
[32]

Operator-splitting methods for monotone affine variational inequalities, with a parallel application to optimal control,

J. Eckstein and M. C. Ferris, “Operator-splitting methods for monotone affine variational inequalities, with a parallel application to optimal control,”INFORMS Journal on Computing, vol. 10, no. 2, pp. 218–235, 1998

1998
[33]

H. H. Bauschke and P. L. Combettes,Convex analysis and monotone operator theory in Hilbert spaces, 2nd ed. Springer Cham, 2020

2020
[34]

D. P. Bertsekas,Convex Optimization Algorithms. Athena Scientific, 2015

2015
[35]

Learning proximal operators with gaussian processes,

T. X. Nghiem, G. Stathopoulos, and C. N. Jones, “Learning proximal operators with gaussian processes,” in2018 56th Annual Allerton Con- ference on Communication, Control, and Computing (Allerton). IEEE, 2018, pp. 935–942

2018
[36]

Input convex neural networks,

B. Amos, L. Xu, and J. Z. Kolter, “Input convex neural networks,” in International conference on machine learning, 2017, pp. 146–155

2017
[37]

Strongly Convex Functions, Moreau En- velopes, and the Generic Nature of Convex Functions with Strong Minimizers,

C. Planiden and X. Wang, “Strongly Convex Functions, Moreau En- velopes, and the Generic Nature of Convex Functions with Strong Minimizers,”SIAM Journal on Optimization, vol. 26, no. 2, pp. 1341– 1364, Jan. 2016

2016
[38]

Osqp: An operator splitting solver for quadratic programs,

B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd, “Osqp: An operator splitting solver for quadratic programs,” in2018 UKACC 12th International Conference on Control (CONTROL), 2018, pp. 339–339

2018
[39]

Lipschitz regularity of deep neural net- works: analysis and efficient estimation,

A. Virmaux and K. Scaman, “Lipschitz regularity of deep neural net- works: analysis and efficient estimation,”Advances in neural information processing systems, vol. 31, 2018

2018
[40]

Efficient and accurate estimation of lipschitz constants for deep neural networks,

M. Fazlyab, A. Robey, H. Hassani, M. Morari, and G. Pappas, “Efficient and accurate estimation of lipschitz constants for deep neural networks,” Advances in neural information processing systems, vol. 32, 2019

2019
[41]

Model predictive control: Recent developments and future promise,

D. Q. Mayne, “Model predictive control: Recent developments and future promise,”Automatica, vol. 50, no. 12, pp. 2967–2986, 2014

2014
[42]

On the implementation of an interior- point filter line-search algorithm for large-scale nonlinear programming,

A. W ¨achter and L. T. Biegler, “On the implementation of an interior- point filter line-search algorithm for large-scale nonlinear programming,” Mathematical programming, vol. 106, no. 1, pp. 25–57, 2006

2006
[43]

GPU-accelerated dynamic nonlinear optimiza- tion with ExaModels and MadNLP,

F. Pacaud and S. Shin, “GPU-accelerated dynamic nonlinear optimiza- tion with ExaModels and MadNLP,” in2024 IEEE 63rd Conference on Decision and Control (CDC). IEEE, 2024, pp. 5963–5968

2024
[44]

The MOSEK interior point opti- mizer for linear programming: an implementation of the homogeneous algorithm,

E. D. Andersen and K. D. Andersen, “The MOSEK interior point opti- mizer for linear programming: an implementation of the homogeneous algorithm,” inHigh performance optimization. Springer, 2000, pp. 197–232

2000
[45]

Clarabel: An interior-point solver for conic programs with quadratic objectives,

P. J. Goulart and Y . Chen, “Clarabel: An interior-point solver for conic programs with quadratic objectives,”arXiv preprint 2405.12762, 2024

work page arXiv 2024
[46]

Economic model predictive control for time-varying cost and peak demand charge optimization,

M. J. Risbeck and J. B. Rawlings, “Economic model predictive control for time-varying cost and peak demand charge optimization,”IEEE Transactions on Automatic Control, vol. 65, no. 7, pp. 2957–2968, 2020

2020
[47]

Economic MPC with an Online Reference Trajectory for Battery Scheduling Considering Demand Charge Management,

C. Cortes-Aguirre, Y .-A. Chen, A. Ghosh, J. Kleissl, and A. Khurram, “Economic MPC with an Online Reference Trajectory for Battery Scheduling Considering Demand Charge Management,”IEEE Trans- actions on Smart Grid, pp. 1–1, 2025

2025
[48]

Boyd and L

S. Boyd and L. Vandenberghe,Convex Optimization. Cambridge University Press, 2004

2004
[49]

Computing minimum-volume enclosing ellipsoids,

N. Bowman and M. T. Heath, “Computing minimum-volume enclosing ellipsoids,”Mathematical Programming Computation, vol. 15, no. 4, pp. 621–650, 2023

2023
[50]

R. T. Rockafellar and R. J. Wets,Variational analysis, 1st ed. Springer Berlin, Heidelberg, 1997. 12

1997
[51]

Borwein and A

J. Borwein and A. Lewis,Convex Analysis and Nonlinear Optimization: Theoryand Examples, 2nd ed. Springer New York, NY , 2005. APPENDIXA PROOF OFLEMMA2 We will show that∇ 2 xh(x)is uniformly bounded. We first derive the expressions for∇ xh(x)and∇ 2 xh(x). LetD j = diag ϕ′ j(yj) andE j = diag ϕ′′ j (yj) . The JacobianJ j = ∂vj ∂x is given by the chain rule ...

2005
[52]

Based on Lemma 6, the ME is a special case of infimal convolution with the functionq λ(x) = 1 2λ ∥x∥2

Therefore,f∈Γ 0(Rn). Based on Lemma 6, the ME is a special case of infimal convolution with the functionq λ(x) = 1 2λ ∥x∥2
[53]

Hence,M λf=g

Sinceq ⋆ λ(y) = λ 2 ∥y∥2 2, we have (Mλf) ⋆ =f ⋆ +q ⋆ λ =f ⋆ + λ 2 ∥.∥2 2 =g ⋆. Hence,M λf=g. Uniqueness:Assume there exists another ˆf∈Γ 0(Rn)such thatM λ ˆf=g. Using the same conjugate identity, g⋆ = (Mλ ˆf) ⋆ = ˆf ⋆ + λ 2 ∥.∥2 2. Thus ˆf ⋆ =g ⋆−λ 2 ∥.∥2 2 =f ⋆. By Lemma 4, ˆf= ˆf ⋆⋆ =f ⋆⋆ =f

[1] [1]

Learning to optimize: A primer and a benchmark,

T. Chen, X. Chen, W. Chen, H. Heaton, J. Liu, Z. Wang, and W. Yin, “Learning to optimize: A primer and a benchmark,”Journal of Machine Learning Research, vol. 23, no. 189, pp. 1–59, 2022. 11

2022

[2] [2]

Learning to optimize in model predictive control,

J. Sacks and B. Boots, “Learning to optimize in model predictive control,” in2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 10 549–10 556

2022

[3] [3]

Admm-based algorithm for training fault tolerant rbf networks and selecting centers,

H. Wang, R. Feng, Z.-F. Han, and C.-S. Leung, “Admm-based algorithm for training fault tolerant rbf networks and selecting centers,”IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 8, pp. 3870–3878, 2017

2017

[4] [4]

Collision-free minimum-time trajectory planning for mul- tiple vehicles based on admm,

T. B. Nguyen, T. Nguyen, T. Nghiem, L. Nguyen, J. Baca, P. Rangel, and H.-K. Song, “Collision-free minimum-time trajectory planning for mul- tiple vehicles based on admm,” inIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022, pp. 13 785–13 790

2022

[5] [5]

Communication- efficient ADMM using Gaussian process regression and fully adaptive uniform quantization,

A. Duarte, B. Nguyen, T. X. Nghiem, and S. Wei, “Communication- efficient ADMM using Gaussian process regression and fully adaptive uniform quantization,”Franklin Open, vol. 15, p. 100587, 2026

2026

[6] [6]

Des-inspired accelerated unfolded linearized admm networks for inverse problems,

W. An, Y . Liu, F. Shang, H. Liu, and L. Jiao, “Des-inspired accelerated unfolded linearized admm networks for inverse problems,”IEEE Trans- actions on Neural Networks and Learning Systems, vol. 36, no. 3, pp. 5319–5333, 2025

2025

[7] [7]

ADMM Enhancement Techniques for Distributed Optimal Power Flow,

M. Hasanzadeh and A. Kargarian, “ADMM Enhancement Techniques for Distributed Optimal Power Flow,”IEEE Transactions on Power Systems, vol. 41, no. 2, pp. 1321–1333, Mar. 2026

2026

[8] [8]

Qc-odkla: Quantized and communication- censored online decentralized kernel learning via lin- earized admm,

P. Xu, Y . Wang, X. Chen, and Z. Tian, “Qc-odkla: Quantized and communication- censored online decentralized kernel learning via lin- earized admm,”IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 12, pp. 17 987–17 999, 2024

2024

[9] [9]

StructADMM: Achieving ultrahigh efficiency in structured pruning for DNNs,

T. Zhang, S. Ye, X. Feng, X. Ma, K. Zhang, Z. Li, J. Tang, S. Liu, X. Lin, Y . Liu, M. Fardad, and Y . Wang, “StructADMM: Achieving ultrahigh efficiency in structured pruning for DNNs,”IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 5, pp. 2259– 2273, 2022

2022

[10] [10]

Connectivity-preserving distributed informative path planning for mo- bile robot networks,

B. Nguyen, T. X. Nghiem, L. Nguyen, H. M. La, and T. Nguyen, “Connectivity-preserving distributed informative path planning for mo- bile robot networks,”IEEE Robotics and Automation Letters, vol. 9, no. 3, pp. 2949–2956, 2024

2024

[11] [11]

Machine learning for combinato- rial optimization: a methodological tour d’horizon,

Y . Bengio, A. Lodi, and A. Prouvost, “Machine learning for combinato- rial optimization: a methodological tour d’horizon,”European Journal of Operational Research, vol. 290, no. 2, pp. 405–421, 2021

2021

[12] [12]

Koziel and L

S. Koziel and L. Leifsson,Surrogate-based modeling and optimization, 1st ed. Springer New York, NY , 2013

2013

[13] [13]

Learning optimal solutions for extremely fast ac optimal power flow,

A. S. Zamzam and K. Baker, “Learning optimal solutions for extremely fast ac optimal power flow,” in2020 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids, Nov. 2020, pp. 1–6

2020

[14] [14]

Constrained neural networks for approximate nonlinear model predictive control,

S. Adhau, V . V . Naik, and S. Skogestad, “Constrained neural networks for approximate nonlinear model predictive control,” in2021 60th IEEE Conference on Decision and Control (CDC), 2021, pp. 295–300

2021

[15] [15]

KKT-based optimality con- ditions for neural network approximation,

V . Peiris, N. Sukhorukova, and J. Ugon, “KKT-based optimality con- ditions for neural network approximation,”arXiv preprint 2506.17305, 2025

work page arXiv 2025

[16] [16]

DC3: A learning method for optimization with hard constraints,

P. L. Donti, D. Rolnick, and J. Z. Kolter, “DC3: A learning method for optimization with hard constraints,” inInternational Conference on Learning Representations, 2021

2021

[17] [17]

FSNet: Feasibility-seeking neural network for constrained optimization with guarantees,

H. T. Nguyen and P. L. Donti, “FSNet: Feasibility-seeking neural network for constrained optimization with guarantees,” inAnnual Con- ference on Neural Information Processing Systems, 2026

2026

[18] [18]

Learning warm-start points for AC Optimal Power Flow,

K. Baker, “Learning warm-start points for AC Optimal Power Flow,” in2019 IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP), Oct. 2019, pp. 1–6

2019

[19] [19]

Learning to warm- start fixed-point optimization algorithms,

R. Sambharya, G. Hall, B. Amos, and B. Stellato, “Learning to warm- start fixed-point optimization algorithms,”Journal of Machine Learning Research, vol. 25, no. 166, pp. 1–46, 2024

2024

[20] [20]

Learning for constrained optimization: Identifying optimal active constraint sets,

S. Misra, L. Roald, and Y . Ng, “Learning for constrained optimization: Identifying optimal active constraint sets,”INFORMS Journal on Com- puting, vol. 34, no. 1, pp. 463–480, 2022

2022

[21] [21]

One network to solve them all — solving linear inverse problems using deep projection models,

J. R. Chang, C.-L. Li, B. P ´oczos, B. Vijaya Kumar, and A. C. Sankara- narayanan, “One network to solve them all — solving linear inverse problems using deep projection models,” in2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 5889–5898

2017

[22] [22]

Learning proximal operators: Using denoising networks for regularizing inverse imaging problems,

T. Meinhardt, M. Moller, C. Hazirbas, and D. Cremers, “Learning proximal operators: Using denoising networks for regularizing inverse imaging problems,” inProceedings of the IEEE International Confer- ence on Computer Vision, 2017, pp. 1781–1790

2017

[23] [23]

Deep ADMM-Net for compressive sensing MRI,

J. Sun, H. Li, Z. Xuet al., “Deep ADMM-Net for compressive sensing MRI,”Advances in neural information processing systems, vol. 29, 2016

2016

[24] [24]

ADMM-CSNet: A deep learning approach for image compressive sensing,

Y . Yang, J. Sun, H. Li, and Z. Xu, “ADMM-CSNet: A deep learning approach for image compressive sensing,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 3, pp. 521–538, 2020

2020

[25] [25]

Learning to learn by gradient descent by gradient descent,

M. Andrychowicz, M. Denil, S. Gomez, M. W. Hoffman, D. Pfau, T. Schaul, B. Shillingford, and N. De Freitas, “Learning to learn by gradient descent by gradient descent,”Advances in neural information processing systems, vol. 29, 2016

2016

[26] [26]

Learning to learn without gradient descent by gradient descent,

Y . Chen, M. W. Hoffman, S. G. Colmenarejo, M. Denil, T. P. Lillicrap, M. Botvinick, and N. Freitas, “Learning to learn without gradient descent by gradient descent,” inInternational Conference on Machine Learning. PMLR, 2017, pp. 748–756

2017

[27] [27]

Self-supervised primal-dual learning for constrained optimization,

S. Park and P. Van Hentenryck, “Self-supervised primal-dual learning for constrained optimization,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 4, 2023, pp. 4052–4060

2023

[28] [28]

Hard-constrained neural networks with universal approxi- mation guarantees.arXiv preprint arXiv:2410.10807, 2024

Y . Min and N. Azizan, “Hardnet: Hard-constrained neural networks with universal approximation guarantees,”arXiv preprint 2410.10807, 2024

work page arXiv 2024

[29] [29]

R. T. Rockafellar,Convex analysis. Princeton University Press, 1997

1997

[30] [30]

Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers,

S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers,”Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, Jul. 2011

2011

[31] [31]

A general analysis of the convergence of ADMM,

R. Nishihara, L. Lessard, B. Recht, A. Packard, and M. Jordan, “A general analysis of the convergence of ADMM,” inProceedings of the 32nd International Conference on Machine Learning, vol. 37. PMLR, 2015, pp. 343–352

2015

[32] [32]

Operator-splitting methods for monotone affine variational inequalities, with a parallel application to optimal control,

J. Eckstein and M. C. Ferris, “Operator-splitting methods for monotone affine variational inequalities, with a parallel application to optimal control,”INFORMS Journal on Computing, vol. 10, no. 2, pp. 218–235, 1998

1998

[33] [33]

H. H. Bauschke and P. L. Combettes,Convex analysis and monotone operator theory in Hilbert spaces, 2nd ed. Springer Cham, 2020

2020

[34] [34]

D. P. Bertsekas,Convex Optimization Algorithms. Athena Scientific, 2015

2015

[35] [35]

Learning proximal operators with gaussian processes,

T. X. Nghiem, G. Stathopoulos, and C. N. Jones, “Learning proximal operators with gaussian processes,” in2018 56th Annual Allerton Con- ference on Communication, Control, and Computing (Allerton). IEEE, 2018, pp. 935–942

2018

[36] [36]

Input convex neural networks,

B. Amos, L. Xu, and J. Z. Kolter, “Input convex neural networks,” in International conference on machine learning, 2017, pp. 146–155

2017

[37] [37]

Strongly Convex Functions, Moreau En- velopes, and the Generic Nature of Convex Functions with Strong Minimizers,

C. Planiden and X. Wang, “Strongly Convex Functions, Moreau En- velopes, and the Generic Nature of Convex Functions with Strong Minimizers,”SIAM Journal on Optimization, vol. 26, no. 2, pp. 1341– 1364, Jan. 2016

2016

[38] [38]

Osqp: An operator splitting solver for quadratic programs,

B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd, “Osqp: An operator splitting solver for quadratic programs,” in2018 UKACC 12th International Conference on Control (CONTROL), 2018, pp. 339–339

2018

[39] [39]

Lipschitz regularity of deep neural net- works: analysis and efficient estimation,

A. Virmaux and K. Scaman, “Lipschitz regularity of deep neural net- works: analysis and efficient estimation,”Advances in neural information processing systems, vol. 31, 2018

2018

[40] [40]

Efficient and accurate estimation of lipschitz constants for deep neural networks,

M. Fazlyab, A. Robey, H. Hassani, M. Morari, and G. Pappas, “Efficient and accurate estimation of lipschitz constants for deep neural networks,” Advances in neural information processing systems, vol. 32, 2019

2019

[41] [41]

Model predictive control: Recent developments and future promise,

D. Q. Mayne, “Model predictive control: Recent developments and future promise,”Automatica, vol. 50, no. 12, pp. 2967–2986, 2014

2014

[42] [42]

On the implementation of an interior- point filter line-search algorithm for large-scale nonlinear programming,

A. W ¨achter and L. T. Biegler, “On the implementation of an interior- point filter line-search algorithm for large-scale nonlinear programming,” Mathematical programming, vol. 106, no. 1, pp. 25–57, 2006

2006

[43] [43]

GPU-accelerated dynamic nonlinear optimiza- tion with ExaModels and MadNLP,

F. Pacaud and S. Shin, “GPU-accelerated dynamic nonlinear optimiza- tion with ExaModels and MadNLP,” in2024 IEEE 63rd Conference on Decision and Control (CDC). IEEE, 2024, pp. 5963–5968

2024

[44] [44]

The MOSEK interior point opti- mizer for linear programming: an implementation of the homogeneous algorithm,

E. D. Andersen and K. D. Andersen, “The MOSEK interior point opti- mizer for linear programming: an implementation of the homogeneous algorithm,” inHigh performance optimization. Springer, 2000, pp. 197–232

2000

[45] [45]

Clarabel: An interior-point solver for conic programs with quadratic objectives,

P. J. Goulart and Y . Chen, “Clarabel: An interior-point solver for conic programs with quadratic objectives,”arXiv preprint 2405.12762, 2024

work page arXiv 2024

[46] [46]

Economic model predictive control for time-varying cost and peak demand charge optimization,

M. J. Risbeck and J. B. Rawlings, “Economic model predictive control for time-varying cost and peak demand charge optimization,”IEEE Transactions on Automatic Control, vol. 65, no. 7, pp. 2957–2968, 2020

2020

[47] [47]

Economic MPC with an Online Reference Trajectory for Battery Scheduling Considering Demand Charge Management,

C. Cortes-Aguirre, Y .-A. Chen, A. Ghosh, J. Kleissl, and A. Khurram, “Economic MPC with an Online Reference Trajectory for Battery Scheduling Considering Demand Charge Management,”IEEE Trans- actions on Smart Grid, pp. 1–1, 2025

2025

[48] [48]

Boyd and L

S. Boyd and L. Vandenberghe,Convex Optimization. Cambridge University Press, 2004

2004

[49] [49]

Computing minimum-volume enclosing ellipsoids,

N. Bowman and M. T. Heath, “Computing minimum-volume enclosing ellipsoids,”Mathematical Programming Computation, vol. 15, no. 4, pp. 621–650, 2023

2023

[50] [50]

R. T. Rockafellar and R. J. Wets,Variational analysis, 1st ed. Springer Berlin, Heidelberg, 1997. 12

1997

[51] [51]

Borwein and A

J. Borwein and A. Lewis,Convex Analysis and Nonlinear Optimization: Theoryand Examples, 2nd ed. Springer New York, NY , 2005. APPENDIXA PROOF OFLEMMA2 We will show that∇ 2 xh(x)is uniformly bounded. We first derive the expressions for∇ xh(x)and∇ 2 xh(x). LetD j = diag ϕ′ j(yj) andE j = diag ϕ′′ j (yj) . The JacobianJ j = ∂vj ∂x is given by the chain rule ...

2005

[52] [52]

Based on Lemma 6, the ME is a special case of infimal convolution with the functionq λ(x) = 1 2λ ∥x∥2

Therefore,f∈Γ 0(Rn). Based on Lemma 6, the ME is a special case of infimal convolution with the functionq λ(x) = 1 2λ ∥x∥2

[53] [53]

Hence,M λf=g

Sinceq ⋆ λ(y) = λ 2 ∥y∥2 2, we have (Mλf) ⋆ =f ⋆ +q ⋆ λ =f ⋆ + λ 2 ∥.∥2 2 =g ⋆. Hence,M λf=g. Uniqueness:Assume there exists another ˆf∈Γ 0(Rn)such thatM λ ˆf=g. Using the same conjugate identity, g⋆ = (Mλ ˆf) ⋆ = ˆf ⋆ + λ 2 ∥.∥2 2. Thus ˆf ⋆ =g ⋆−λ 2 ∥.∥2 2 =f ⋆. By Lemma 4, ˆf= ˆf ⋆⋆ =f ⋆⋆ =f