Learning Over-Relaxation Policies for ADMM with Convergence Guarantees

Junan Lin; Luca Furieri; Paul J. Goulart

arxiv: 2604.26932 · v1 · submitted 2026-04-29 · 🧮 math.OC · cs.LG

Learning Over-Relaxation Policies for ADMM with Convergence Guarantees

Junan Lin , Paul J. Goulart , Luca Furieri This is my paper

Pith reviewed 2026-05-07 09:27 UTC · model grok-4.3

classification 🧮 math.OC cs.LG

keywords ADMMover-relaxationlearning policiesconvergence guaranteesquadratic programmingmodel predictive controlOSQPtime-varying parameters

0 comments

The pith

Learning online updates to the ADMM relaxation parameter yields convergence guarantees while cutting iterations on repeated quadratic programs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to improve ADMM performance on families of related convex problems by learning policies that adapt the relaxation parameter at each iteration. This targets applications like model predictive control where the problem structure stays fixed but numerical values change, making cheap parameter tweaks valuable. The authors prove that ADMM still converges when both penalty and relaxation parameters vary over time, provided the sequences meet basic boundedness and summability conditions. Experiments on standard quadratic-program benchmarks then show that the learned policies reduce both iteration counts and total wall-clock time relative to the default OSQP solver. A sympathetic reader cares because the approach keeps theoretical safety while letting data-driven tuning accelerate practical solves without expensive refactorizations.

Core claim

We establish convergence guarantees for ADMM with time-varying penalty and relaxation parameters under mild assumptions on the parameter sequences, and show on benchmark quadratic programs that the resulting learned policies for online relaxation updates improve both iteration count and wall-clock time over baseline OSQP.

What carries the argument

The learned policy that produces online updates to the ADMM relaxation parameter at each iteration; this choice avoids the matrix refactorizations triggered by penalty changes and still preserves convergence when the generated sequences satisfy boundedness and summability.

If this is right

ADMM iterates converge to an optimal solution for any convex problem when the time-varying parameters meet the stated boundedness and summability requirements.
Relaxation-parameter adaptation can be performed online at negligible extra cost inside OSQP-style solvers because no factorization is needed.
The same learned policy applies across an entire class of related problems, such as MPC instances with fixed dynamics but varying references.
Empirical gains appear in both iteration count and measured runtime on standard QP test sets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same policy-learning idea could be tested on other first-order methods whose convergence proofs tolerate bounded parameter sequences.
If the learned updates generalize beyond the training distribution, manual tuning of ADMM parameters could be replaced by a single offline training step per problem class.
A natural next measurement would be to apply the policies inside a real-time MPC loop and record closed-loop latency reduction.

Load-bearing premise

The sequences of time-varying penalty and relaxation parameters generated by the learned policy must remain bounded and satisfy summability conditions.

What would settle it

A convex quadratic program on which the learned policy produces a parameter sequence that violates the summability condition and causes ADMM to diverge or stall.

read the original abstract

The Alternating Direction Method of Multipliers (ADMM) is a widely used method for structured convex optimization, and its practical performance depends strongly on the choice of penalty and relaxation parameters. Motivated by settings such as Model Predictive Control (MPC), where one repeatedly solves related optimization problems with fixed structure and changing parameter values, we propose learning online updates of the relaxation parameter to improve performance on problem classes of interest. This choice is computationally attractive in OSQP-like architectures, since adapting relaxation does not trigger the matrix refactorizations associated with penalty updates. We establish convergence guarantees for ADMM with time-varying penalty and relaxation parameters under mild assumptions, and show on benchmark quadratic programs that the resulting learned policies improve both iteration count and wall-clock time over baseline OSQP.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a convergence theorem for ADMM with time-varying relaxation parameters and pairs it with a learned online policy, but the theorem's assumptions are not shown to hold for the learned sequences.

read the letter

The main thing here is a practical tweak to ADMM: learn the relaxation parameter on the fly for repeated solves of similar problems, without touching the penalty that would force refactorizations. This fits settings like MPC where structure is fixed but data changes. They prove convergence for time-varying penalty and relaxation under boundedness and summability conditions on the sequences, which is a clean extension of standard ADMM analysis. The experiments on quadratic program benchmarks show lower iteration counts and better wall-clock time than plain OSQP, which is the kind of result that matters for implementers.

Referee Report

2 major / 1 minor

Summary. The paper proposes learning online updates to the relaxation parameter in ADMM for repeated solves of structured convex problems (e.g., MPC), establishes convergence guarantees for ADMM with time-varying penalty and relaxation parameters under mild assumptions on those sequences, and reports empirical improvements in iteration count and wall-clock time over baseline OSQP on benchmark quadratic programs.

Significance. If the convergence guarantees transfer to the learned policies and the empirical gains prove robust, the work would usefully combine theoretical analysis of time-varying ADMM with data-driven adaptation, offering a computationally cheap way to accelerate solvers on problem classes without refactorization costs.

major comments (2)

[Convergence theorem and learning procedure] The convergence theorem (stated in the abstract and presumably proved in the main theoretical section) requires mild assumptions including boundedness of the relaxation sequence and a summability condition such as ∑|α_{k+1}−α_k|<∞ (or an analogous condition on the penalty). The neural policy architecture and training procedure described for online learning do not include any explicit enforcement mechanism (e.g., projection, regularization, or constrained parameterization) to guarantee these properties hold for the output sequence, so the formal result does not automatically apply to the reported learned controllers.
[Experimental results on quadratic programs] The empirical claims rest on benchmark comparisons showing gains over OSQP, but the experimental protocol does not detail how (or whether) the trained policies were constrained or post-processed to satisfy the summability/boundedness hypotheses of the theorem; without this, it is unclear whether the reported speedups occur under conditions where the convergence guarantee is valid.

minor comments (1)

[Abstract and introduction] The abstract refers to 'mild assumptions' without naming them; a one-sentence enumeration of the key conditions (boundedness, summability) in the introduction would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the referee's insightful comments. We have carefully considered the points raised regarding the applicability of the convergence guarantees to the learned policies and the experimental protocol. Below we provide point-by-point responses and indicate the revisions we will make.

read point-by-point responses

Referee: [Convergence theorem and learning procedure] The convergence theorem (stated in the abstract and presumably proved in the main theoretical section) requires mild assumptions including boundedness of the relaxation sequence and a summability condition such as ∑|α_{k+1}−α_k|<∞ (or an analogous condition on the penalty). The neural policy architecture and training procedure described for online learning do not include any explicit enforcement mechanism (e.g., projection, regularization, or constrained parameterization) to guarantee these properties hold for the output sequence, so the formal result does not automatically apply to the reported learned controllers.

Authors: We agree with the referee that the convergence theorem relies on the specified assumptions on the parameter sequences, and that the learning procedure as presented does not explicitly enforce them. This is a valid point. In the revised manuscript, we will incorporate a simple projection step onto a bounded interval for the relaxation parameter output by the policy, and add a regularization term in the training loss that penalizes large variations between consecutive predictions to promote the summability condition in practice. We will also clarify in the theoretical section that the guarantees apply to the projected policies. This addresses the concern directly without altering the core contribution. revision: yes
Referee: [Experimental results on quadratic programs] The empirical claims rest on benchmark comparisons showing gains over OSQP, but the experimental protocol does not detail how (or whether) the trained policies were constrained or post-processed to satisfy the summability/boundedness hypotheses of the theorem; without this, it is unclear whether the reported speedups occur under conditions where the convergence guarantee is valid.

Authors: We acknowledge that the current experimental section lacks explicit details on ensuring the assumptions hold for the learned policies. In the revision, we will expand the experimental protocol to describe the projection and regularization used during training and inference, and report statistics on the observed variation in the relaxation sequences to verify that the summability condition is approximately satisfied in the finite-horizon solves. This will make clear that the empirical results are obtained under conditions compatible with the theorem. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected; theorem and learning remain independent.

full rationale

The paper presents a convergence theorem for time-varying ADMM under separate mild assumptions on the parameter sequences, followed by an independent learning procedure for relaxation policies and benchmark comparisons. No equations or claims reduce the theorem's hypotheses to the learned outputs by construction, nor do any self-citations or fitted inputs make the reported improvements equivalent to the inputs. The derivation chain is self-contained, with the formal result not redefined via the empirical policies.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on (1) mild technical assumptions that permit convergence when both penalty and relaxation vary, and (2) the existence of a learnable policy that improves performance on the target problem class while respecting those assumptions. No new mathematical entities are introduced.

free parameters (1)

learned policy parameters
The policy that maps solver state to relaxation value is trained on data from the problem class of interest; its weights or structure are fitted rather than derived.

axioms (1)

domain assumption Mild assumptions on the sequences of time-varying penalty and relaxation parameters suffice for convergence of ADMM
Invoked in the abstract to support the convergence guarantee for the time-varying case.

pith-pipeline@v0.9.0 · 5428 in / 1342 out tokens · 49366 ms · 2026-05-07T09:27:49.015691+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages · 1 internal anchor

[1]

Distributed op- timization and statistical learning via the alternating direction method of multipliers,

B. Stephen, P. Neal, C. Eric, P. Borja, and E. Jonathan, “Distributed op- timization and statistical learning via the alternating direction method of multipliers,”Foundations and Trends® in Machine learning, vol. 3, no. 1, pp. 1–122, 2011

work page 2011
[2]

A general analysis of the convergence of ADMM,

R. Nishihara, L. Lessard, B. Recht, A. Packard, and M. Jordan, “A general analysis of the convergence of ADMM,” inInternational conference on machine learning. PMLR, 2015, pp. 343–352

work page 2015
[3]

OSQP: An operator splitting solver for quadratic programs,

B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd, “OSQP: An operator splitting solver for quadratic programs,”Mathematical Programming Computation, vol. 12, no. 4, pp. 637–672, 2020

work page 2020
[4]

On the global and linear convergence of the generalized alternating direction method of multipliers,

W. Deng and W. Yin, “On the global and linear convergence of the generalized alternating direction method of multipliers,”Journal of Scientific Computing, vol. 66, no. 3, pp. 889–916, 2016

work page 2016
[5]

Adaptive ADMM with spectral penalty parameter selection,

Z. Xu, M. Figueiredo, and T. Goldstein, “Adaptive ADMM with spectral penalty parameter selection,” inArtificial Intelligence and Statistics. PMLR, 2017, pp. 718–727

work page 2017
[6]

Adaptive relaxed ADMM: Convergence theory and practical imple- mentation,

Z. Xu, M. A. Figueiredo, X. Yuan, C. Studer, and T. Goldstein, “Adaptive relaxed ADMM: Convergence theory and practical imple- mentation,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 7389–7398

work page 2017
[7]

SuperADMM: Solving quadratic programs faster with dynamic weighting ADMM,

P. Verheijen, D. Goswami, and M. Lazar, “SuperADMM: Solving quadratic programs faster with dynamic weighting ADMM,” in2025 29th International Conference on System Theory, Control and Com- puting (ICSTCC). IEEE, 2025, pp. 569–575

work page 2025
[8]

Learning to learn by gradient descent by gradient descent,

M. Andrychowicz, M. Denil, S. Gomez, M. W. Hoffman, D. Pfau, T. Schaul, B. Shillingford, and N. De Freitas, “Learning to learn by gradient descent by gradient descent,”Advances in neural information processing systems, vol. 29, 2016

work page 2016
[9]

Learning to optimize: A primer and a benchmark,

T. Chen, X. Chen, W. Chen, H. Heaton, J. Liu, Z. Wang, and W. Yin, “Learning to optimize: A primer and a benchmark,”Journal of Machine Learning Research, vol. 23, no. 189, pp. 1–59, 2022

work page 2022
[10]

Learning algorithm hyperparameters for fast para- metric convex optimization.arXiv preprint arXiv:2411.15717, 2024

R. Sambharya and B. Stellato, “Learning algorithm hyperparam- eters for fast parametric convex optimization,”arXiv preprint arXiv:2411.15717, 2024

work page arXiv 2024
[11]

Data-driven performance guarantees for classical and learned optimizers,

——, “Data-driven performance guarantees for classical and learned optimizers,”Journal of Machine Learning Research, vol. 26, no. 171, pp. 1–49, 2025

work page 2025
[12]

Learning to warm-start fixed-point optimization algorithms,

R. Sambharya, G. Hall, B. Amos, and B. Stellato, “Learning to warm-start fixed-point optimization algorithms,”Journal of Machine Learning Research, vol. 25, no. 166, pp. 1–46, 2024

work page 2024
[13]

Learning to optimize with convergence guarantees using nonlinear system theory,

A. Martin and L. Furieri, “Learning to optimize with convergence guarantees using nonlinear system theory,”IEEE Control Systems Letters, vol. 8, pp. 1355–1360, 2024

work page 2024
[14]

Learning to optimize with guarantees: a complete characterization of linearly convergent algorithms,

A. Martin, I. R. Manchester, and L. Furieri, “Learning to optimize with guarantees: a complete characterization of linearly convergent algorithms,”arXiv preprint arXiv:2508.00775, 2025

work page arXiv 2025
[15]

Learning to accelerate Krasnosel’skii- Mann fixed-point iterations with guarantees,

A. Martin and G. Belgioioso, “Learning to accelerate Krasnosel’skii- Mann fixed-point iterations with guarantees,”arXiv preprint arXiv:2601.07665, 2026

work page arXiv 2026
[16]

Accelerating quadratic optimization with reinforcement learning,

J. Ichnowski, P. Jain, B. Stellato, G. Banjac, M. Luo, F. Borrelli, J. E. Gonzalez, I. Stoica, and K. Goldberg, “Accelerating quadratic optimization with reinforcement learning,”Advances in Neural Infor- mation Processing Systems, vol. 34, pp. 21 043–21 055, 2021

work page 2021
[17]

Linear convergence and metric selection for Douglas-Rachford splitting and ADMM,

P. Giselsson and S. Boyd, “Linear convergence and metric selection for Douglas-Rachford splitting and ADMM,”IEEE Transactions on Automatic Control, vol. 62, no. 2, pp. 532–544, 2016

work page 2016
[18]

On the Douglas—Rachford splitting method and the proximal point algorithm for maximal monotone operators,

J. Eckstein and D. P. Bertsekas, “On the Douglas—Rachford splitting method and the proximal point algorithm for maximal monotone operators,”Mathematical programming, vol. 55, no. 1, pp. 293–318, 1992

work page 1992
[19]

The degenerate variable metric proximal point algorithm and adaptive stepsizes for primal– dual Douglas–Rachford,

D. A. Lorenz, J. Marquardt, and E. Naldi, “The degenerate variable metric proximal point algorithm and adaptive stepsizes for primal– dual Douglas–Rachford,”Optimization, vol. 74, no. 6, pp. 1355–1381, 2025

work page 2025
[20]

Decoupled Weight Decay Regularization

I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” arXiv preprint arXiv:1711.05101, 2017

work page internal anchor Pith review arXiv 2017
[21]

We also acknowledge the use of the University of Oxford Advanced Research Computing (ARC) facility for carrying out this work (https://doi.org/10.5281/zenodo.22558)

A. Richards, “University of Oxford Advanced Research Computing Facility,” https://doi.org/10.5281/zenodo.22558, Aug. 2015

work page doi:10.5281/zenodo.22558 2015

[1] [1]

Distributed op- timization and statistical learning via the alternating direction method of multipliers,

B. Stephen, P. Neal, C. Eric, P. Borja, and E. Jonathan, “Distributed op- timization and statistical learning via the alternating direction method of multipliers,”Foundations and Trends® in Machine learning, vol. 3, no. 1, pp. 1–122, 2011

work page 2011

[2] [2]

A general analysis of the convergence of ADMM,

R. Nishihara, L. Lessard, B. Recht, A. Packard, and M. Jordan, “A general analysis of the convergence of ADMM,” inInternational conference on machine learning. PMLR, 2015, pp. 343–352

work page 2015

[3] [3]

OSQP: An operator splitting solver for quadratic programs,

B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd, “OSQP: An operator splitting solver for quadratic programs,”Mathematical Programming Computation, vol. 12, no. 4, pp. 637–672, 2020

work page 2020

[4] [4]

On the global and linear convergence of the generalized alternating direction method of multipliers,

W. Deng and W. Yin, “On the global and linear convergence of the generalized alternating direction method of multipliers,”Journal of Scientific Computing, vol. 66, no. 3, pp. 889–916, 2016

work page 2016

[5] [5]

Adaptive ADMM with spectral penalty parameter selection,

Z. Xu, M. Figueiredo, and T. Goldstein, “Adaptive ADMM with spectral penalty parameter selection,” inArtificial Intelligence and Statistics. PMLR, 2017, pp. 718–727

work page 2017

[6] [6]

Adaptive relaxed ADMM: Convergence theory and practical imple- mentation,

Z. Xu, M. A. Figueiredo, X. Yuan, C. Studer, and T. Goldstein, “Adaptive relaxed ADMM: Convergence theory and practical imple- mentation,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 7389–7398

work page 2017

[7] [7]

SuperADMM: Solving quadratic programs faster with dynamic weighting ADMM,

P. Verheijen, D. Goswami, and M. Lazar, “SuperADMM: Solving quadratic programs faster with dynamic weighting ADMM,” in2025 29th International Conference on System Theory, Control and Com- puting (ICSTCC). IEEE, 2025, pp. 569–575

work page 2025

[8] [8]

Learning to learn by gradient descent by gradient descent,

M. Andrychowicz, M. Denil, S. Gomez, M. W. Hoffman, D. Pfau, T. Schaul, B. Shillingford, and N. De Freitas, “Learning to learn by gradient descent by gradient descent,”Advances in neural information processing systems, vol. 29, 2016

work page 2016

[9] [9]

Learning to optimize: A primer and a benchmark,

T. Chen, X. Chen, W. Chen, H. Heaton, J. Liu, Z. Wang, and W. Yin, “Learning to optimize: A primer and a benchmark,”Journal of Machine Learning Research, vol. 23, no. 189, pp. 1–59, 2022

work page 2022

[10] [10]

Learning algorithm hyperparameters for fast para- metric convex optimization.arXiv preprint arXiv:2411.15717, 2024

R. Sambharya and B. Stellato, “Learning algorithm hyperparam- eters for fast parametric convex optimization,”arXiv preprint arXiv:2411.15717, 2024

work page arXiv 2024

[11] [11]

Data-driven performance guarantees for classical and learned optimizers,

——, “Data-driven performance guarantees for classical and learned optimizers,”Journal of Machine Learning Research, vol. 26, no. 171, pp. 1–49, 2025

work page 2025

[12] [12]

Learning to warm-start fixed-point optimization algorithms,

R. Sambharya, G. Hall, B. Amos, and B. Stellato, “Learning to warm-start fixed-point optimization algorithms,”Journal of Machine Learning Research, vol. 25, no. 166, pp. 1–46, 2024

work page 2024

[13] [13]

Learning to optimize with convergence guarantees using nonlinear system theory,

A. Martin and L. Furieri, “Learning to optimize with convergence guarantees using nonlinear system theory,”IEEE Control Systems Letters, vol. 8, pp. 1355–1360, 2024

work page 2024

[14] [14]

Learning to optimize with guarantees: a complete characterization of linearly convergent algorithms,

A. Martin, I. R. Manchester, and L. Furieri, “Learning to optimize with guarantees: a complete characterization of linearly convergent algorithms,”arXiv preprint arXiv:2508.00775, 2025

work page arXiv 2025

[15] [15]

Learning to accelerate Krasnosel’skii- Mann fixed-point iterations with guarantees,

A. Martin and G. Belgioioso, “Learning to accelerate Krasnosel’skii- Mann fixed-point iterations with guarantees,”arXiv preprint arXiv:2601.07665, 2026

work page arXiv 2026

[16] [16]

Accelerating quadratic optimization with reinforcement learning,

J. Ichnowski, P. Jain, B. Stellato, G. Banjac, M. Luo, F. Borrelli, J. E. Gonzalez, I. Stoica, and K. Goldberg, “Accelerating quadratic optimization with reinforcement learning,”Advances in Neural Infor- mation Processing Systems, vol. 34, pp. 21 043–21 055, 2021

work page 2021

[17] [17]

Linear convergence and metric selection for Douglas-Rachford splitting and ADMM,

P. Giselsson and S. Boyd, “Linear convergence and metric selection for Douglas-Rachford splitting and ADMM,”IEEE Transactions on Automatic Control, vol. 62, no. 2, pp. 532–544, 2016

work page 2016

[18] [18]

On the Douglas—Rachford splitting method and the proximal point algorithm for maximal monotone operators,

J. Eckstein and D. P. Bertsekas, “On the Douglas—Rachford splitting method and the proximal point algorithm for maximal monotone operators,”Mathematical programming, vol. 55, no. 1, pp. 293–318, 1992

work page 1992

[19] [19]

The degenerate variable metric proximal point algorithm and adaptive stepsizes for primal– dual Douglas–Rachford,

D. A. Lorenz, J. Marquardt, and E. Naldi, “The degenerate variable metric proximal point algorithm and adaptive stepsizes for primal– dual Douglas–Rachford,”Optimization, vol. 74, no. 6, pp. 1355–1381, 2025

work page 2025

[20] [20]

Decoupled Weight Decay Regularization

I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” arXiv preprint arXiv:1711.05101, 2017

work page internal anchor Pith review arXiv 2017

[21] [21]

We also acknowledge the use of the University of Oxford Advanced Research Computing (ARC) facility for carrying out this work (https://doi.org/10.5281/zenodo.22558)

A. Richards, “University of Oxford Advanced Research Computing Facility,” https://doi.org/10.5281/zenodo.22558, Aug. 2015

work page doi:10.5281/zenodo.22558 2015