Learning Over-Relaxation Policies for ADMM with Convergence Guarantees
Pith reviewed 2026-05-07 09:27 UTC · model grok-4.3
The pith
Learning online updates to the ADMM relaxation parameter yields convergence guarantees while cutting iterations on repeated quadratic programs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We establish convergence guarantees for ADMM with time-varying penalty and relaxation parameters under mild assumptions on the parameter sequences, and show on benchmark quadratic programs that the resulting learned policies for online relaxation updates improve both iteration count and wall-clock time over baseline OSQP.
What carries the argument
The learned policy that produces online updates to the ADMM relaxation parameter at each iteration; this choice avoids the matrix refactorizations triggered by penalty changes and still preserves convergence when the generated sequences satisfy boundedness and summability.
If this is right
- ADMM iterates converge to an optimal solution for any convex problem when the time-varying parameters meet the stated boundedness and summability requirements.
- Relaxation-parameter adaptation can be performed online at negligible extra cost inside OSQP-style solvers because no factorization is needed.
- The same learned policy applies across an entire class of related problems, such as MPC instances with fixed dynamics but varying references.
- Empirical gains appear in both iteration count and measured runtime on standard QP test sets.
Where Pith is reading between the lines
- The same policy-learning idea could be tested on other first-order methods whose convergence proofs tolerate bounded parameter sequences.
- If the learned updates generalize beyond the training distribution, manual tuning of ADMM parameters could be replaced by a single offline training step per problem class.
- A natural next measurement would be to apply the policies inside a real-time MPC loop and record closed-loop latency reduction.
Load-bearing premise
The sequences of time-varying penalty and relaxation parameters generated by the learned policy must remain bounded and satisfy summability conditions.
What would settle it
A convex quadratic program on which the learned policy produces a parameter sequence that violates the summability condition and causes ADMM to diverge or stall.
read the original abstract
The Alternating Direction Method of Multipliers (ADMM) is a widely used method for structured convex optimization, and its practical performance depends strongly on the choice of penalty and relaxation parameters. Motivated by settings such as Model Predictive Control (MPC), where one repeatedly solves related optimization problems with fixed structure and changing parameter values, we propose learning online updates of the relaxation parameter to improve performance on problem classes of interest. This choice is computationally attractive in OSQP-like architectures, since adapting relaxation does not trigger the matrix refactorizations associated with penalty updates. We establish convergence guarantees for ADMM with time-varying penalty and relaxation parameters under mild assumptions, and show on benchmark quadratic programs that the resulting learned policies improve both iteration count and wall-clock time over baseline OSQP.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes learning online updates to the relaxation parameter in ADMM for repeated solves of structured convex problems (e.g., MPC), establishes convergence guarantees for ADMM with time-varying penalty and relaxation parameters under mild assumptions on those sequences, and reports empirical improvements in iteration count and wall-clock time over baseline OSQP on benchmark quadratic programs.
Significance. If the convergence guarantees transfer to the learned policies and the empirical gains prove robust, the work would usefully combine theoretical analysis of time-varying ADMM with data-driven adaptation, offering a computationally cheap way to accelerate solvers on problem classes without refactorization costs.
major comments (2)
- [Convergence theorem and learning procedure] The convergence theorem (stated in the abstract and presumably proved in the main theoretical section) requires mild assumptions including boundedness of the relaxation sequence and a summability condition such as ∑|α_{k+1}−α_k|<∞ (or an analogous condition on the penalty). The neural policy architecture and training procedure described for online learning do not include any explicit enforcement mechanism (e.g., projection, regularization, or constrained parameterization) to guarantee these properties hold for the output sequence, so the formal result does not automatically apply to the reported learned controllers.
- [Experimental results on quadratic programs] The empirical claims rest on benchmark comparisons showing gains over OSQP, but the experimental protocol does not detail how (or whether) the trained policies were constrained or post-processed to satisfy the summability/boundedness hypotheses of the theorem; without this, it is unclear whether the reported speedups occur under conditions where the convergence guarantee is valid.
minor comments (1)
- [Abstract and introduction] The abstract refers to 'mild assumptions' without naming them; a one-sentence enumeration of the key conditions (boundedness, summability) in the introduction would improve readability.
Simulated Author's Rebuttal
Thank you for the referee's insightful comments. We have carefully considered the points raised regarding the applicability of the convergence guarantees to the learned policies and the experimental protocol. Below we provide point-by-point responses and indicate the revisions we will make.
read point-by-point responses
-
Referee: [Convergence theorem and learning procedure] The convergence theorem (stated in the abstract and presumably proved in the main theoretical section) requires mild assumptions including boundedness of the relaxation sequence and a summability condition such as ∑|α_{k+1}−α_k|<∞ (or an analogous condition on the penalty). The neural policy architecture and training procedure described for online learning do not include any explicit enforcement mechanism (e.g., projection, regularization, or constrained parameterization) to guarantee these properties hold for the output sequence, so the formal result does not automatically apply to the reported learned controllers.
Authors: We agree with the referee that the convergence theorem relies on the specified assumptions on the parameter sequences, and that the learning procedure as presented does not explicitly enforce them. This is a valid point. In the revised manuscript, we will incorporate a simple projection step onto a bounded interval for the relaxation parameter output by the policy, and add a regularization term in the training loss that penalizes large variations between consecutive predictions to promote the summability condition in practice. We will also clarify in the theoretical section that the guarantees apply to the projected policies. This addresses the concern directly without altering the core contribution. revision: yes
-
Referee: [Experimental results on quadratic programs] The empirical claims rest on benchmark comparisons showing gains over OSQP, but the experimental protocol does not detail how (or whether) the trained policies were constrained or post-processed to satisfy the summability/boundedness hypotheses of the theorem; without this, it is unclear whether the reported speedups occur under conditions where the convergence guarantee is valid.
Authors: We acknowledge that the current experimental section lacks explicit details on ensuring the assumptions hold for the learned policies. In the revision, we will expand the experimental protocol to describe the projection and regularization used during training and inference, and report statistics on the observed variation in the relaxation sequences to verify that the summability condition is approximately satisfied in the finite-horizon solves. This will make clear that the empirical results are obtained under conditions compatible with the theorem. revision: yes
Circularity Check
No significant circularity detected; theorem and learning remain independent.
full rationale
The paper presents a convergence theorem for time-varying ADMM under separate mild assumptions on the parameter sequences, followed by an independent learning procedure for relaxation policies and benchmark comparisons. No equations or claims reduce the theorem's hypotheses to the learned outputs by construction, nor do any self-citations or fitted inputs make the reported improvements equivalent to the inputs. The derivation chain is self-contained, with the formal result not redefined via the empirical policies.
Axiom & Free-Parameter Ledger
free parameters (1)
- learned policy parameters
axioms (1)
- domain assumption Mild assumptions on the sequences of time-varying penalty and relaxation parameters suffice for convergence of ADMM
Reference graph
Works this paper leans on
-
[1]
B. Stephen, P. Neal, C. Eric, P. Borja, and E. Jonathan, “Distributed op- timization and statistical learning via the alternating direction method of multipliers,”Foundations and Trends® in Machine learning, vol. 3, no. 1, pp. 1–122, 2011
work page 2011
-
[2]
A general analysis of the convergence of ADMM,
R. Nishihara, L. Lessard, B. Recht, A. Packard, and M. Jordan, “A general analysis of the convergence of ADMM,” inInternational conference on machine learning. PMLR, 2015, pp. 343–352
work page 2015
-
[3]
OSQP: An operator splitting solver for quadratic programs,
B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd, “OSQP: An operator splitting solver for quadratic programs,”Mathematical Programming Computation, vol. 12, no. 4, pp. 637–672, 2020
work page 2020
-
[4]
On the global and linear convergence of the generalized alternating direction method of multipliers,
W. Deng and W. Yin, “On the global and linear convergence of the generalized alternating direction method of multipliers,”Journal of Scientific Computing, vol. 66, no. 3, pp. 889–916, 2016
work page 2016
-
[5]
Adaptive ADMM with spectral penalty parameter selection,
Z. Xu, M. Figueiredo, and T. Goldstein, “Adaptive ADMM with spectral penalty parameter selection,” inArtificial Intelligence and Statistics. PMLR, 2017, pp. 718–727
work page 2017
-
[6]
Adaptive relaxed ADMM: Convergence theory and practical imple- mentation,
Z. Xu, M. A. Figueiredo, X. Yuan, C. Studer, and T. Goldstein, “Adaptive relaxed ADMM: Convergence theory and practical imple- mentation,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 7389–7398
work page 2017
-
[7]
SuperADMM: Solving quadratic programs faster with dynamic weighting ADMM,
P. Verheijen, D. Goswami, and M. Lazar, “SuperADMM: Solving quadratic programs faster with dynamic weighting ADMM,” in2025 29th International Conference on System Theory, Control and Com- puting (ICSTCC). IEEE, 2025, pp. 569–575
work page 2025
-
[8]
Learning to learn by gradient descent by gradient descent,
M. Andrychowicz, M. Denil, S. Gomez, M. W. Hoffman, D. Pfau, T. Schaul, B. Shillingford, and N. De Freitas, “Learning to learn by gradient descent by gradient descent,”Advances in neural information processing systems, vol. 29, 2016
work page 2016
-
[9]
Learning to optimize: A primer and a benchmark,
T. Chen, X. Chen, W. Chen, H. Heaton, J. Liu, Z. Wang, and W. Yin, “Learning to optimize: A primer and a benchmark,”Journal of Machine Learning Research, vol. 23, no. 189, pp. 1–59, 2022
work page 2022
-
[10]
R. Sambharya and B. Stellato, “Learning algorithm hyperparam- eters for fast parametric convex optimization,”arXiv preprint arXiv:2411.15717, 2024
-
[11]
Data-driven performance guarantees for classical and learned optimizers,
——, “Data-driven performance guarantees for classical and learned optimizers,”Journal of Machine Learning Research, vol. 26, no. 171, pp. 1–49, 2025
work page 2025
-
[12]
Learning to warm-start fixed-point optimization algorithms,
R. Sambharya, G. Hall, B. Amos, and B. Stellato, “Learning to warm-start fixed-point optimization algorithms,”Journal of Machine Learning Research, vol. 25, no. 166, pp. 1–46, 2024
work page 2024
-
[13]
Learning to optimize with convergence guarantees using nonlinear system theory,
A. Martin and L. Furieri, “Learning to optimize with convergence guarantees using nonlinear system theory,”IEEE Control Systems Letters, vol. 8, pp. 1355–1360, 2024
work page 2024
-
[14]
Learning to optimize with guarantees: a complete characterization of linearly convergent algorithms,
A. Martin, I. R. Manchester, and L. Furieri, “Learning to optimize with guarantees: a complete characterization of linearly convergent algorithms,”arXiv preprint arXiv:2508.00775, 2025
-
[15]
Learning to accelerate Krasnosel’skii- Mann fixed-point iterations with guarantees,
A. Martin and G. Belgioioso, “Learning to accelerate Krasnosel’skii- Mann fixed-point iterations with guarantees,”arXiv preprint arXiv:2601.07665, 2026
-
[16]
Accelerating quadratic optimization with reinforcement learning,
J. Ichnowski, P. Jain, B. Stellato, G. Banjac, M. Luo, F. Borrelli, J. E. Gonzalez, I. Stoica, and K. Goldberg, “Accelerating quadratic optimization with reinforcement learning,”Advances in Neural Infor- mation Processing Systems, vol. 34, pp. 21 043–21 055, 2021
work page 2021
-
[17]
Linear convergence and metric selection for Douglas-Rachford splitting and ADMM,
P. Giselsson and S. Boyd, “Linear convergence and metric selection for Douglas-Rachford splitting and ADMM,”IEEE Transactions on Automatic Control, vol. 62, no. 2, pp. 532–544, 2016
work page 2016
-
[18]
J. Eckstein and D. P. Bertsekas, “On the Douglas—Rachford splitting method and the proximal point algorithm for maximal monotone operators,”Mathematical programming, vol. 55, no. 1, pp. 293–318, 1992
work page 1992
-
[19]
D. A. Lorenz, J. Marquardt, and E. Naldi, “The degenerate variable metric proximal point algorithm and adaptive stepsizes for primal– dual Douglas–Rachford,”Optimization, vol. 74, no. 6, pp. 1355–1381, 2025
work page 2025
-
[20]
Decoupled Weight Decay Regularization
I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” arXiv preprint arXiv:1711.05101, 2017
work page internal anchor Pith review arXiv 2017
-
[21]
A. Richards, “University of Oxford Advanced Research Computing Facility,” https://doi.org/10.5281/zenodo.22558, Aug. 2015
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.