Reinforcement learning for adaptive interior point methods in convex quadratic programming

Alberto De Marchi; Jeremy Bertoncini; Matthias Gerdts; Simon Gottschalk

arxiv: 2509.07404 · v2 · pith:WYQPSJK3new · submitted 2025-09-09 · 🧮 math.OC · cs.LG

Reinforcement learning for adaptive interior point methods in convex quadratic programming

Jeremy Bertoncini , Alberto De Marchi , Matthias Gerdts , Simon Gottschalk This is my paper

Pith reviewed 2026-05-21 21:58 UTC · model grok-4.3

classification 🧮 math.OC cs.LG

keywords reinforcement learninginterior point methodsquadratic programmingparameter tuningconvex optimizationadaptive solversnumerical optimization

0 comments

The pith

Reinforcement learning automates parameter selection to speed convergence in regularized interior-point solvers for convex quadratic programs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Quadratic programming solvers using regularized interior-point methods often converge slowly in later iterations and depend heavily on manual hyperparameter choices. The paper explores training a reinforcement learning policy to select control parameters dynamically within the solver's two-loop structure. This targets high-accuracy solutions while reducing overall iteration counts. Experiments indicate that a lightweight training phase produces a policy that generalizes across problem classes and dimensions without retraining.

Core claim

A reinforcement learning policy can learn to select effective regularization and step parameters for a regularized interior-point method solving convex quadratic programs. After training on a limited set of problems, the policy accelerates the solution process to high accuracy and maintains performance on new instances that differ in dimension and class.

What carries the argument

The reinforcement learning policy that selects regularization parameters and iteration controls inside the two-loop flow of the regularized interior-point algorithm.

Load-bearing premise

A policy trained on a limited collection of quadratic programs will continue to choose effective parameters on unseen problems whose dimension, conditioning, or sparsity pattern may differ substantially from the training distribution.

What would settle it

Applying the trained policy to quadratic programs with substantially different dimensions, conditioning, or sparsity patterns and observing that it requires as many or more iterations than a fixed-parameter baseline to reach the same accuracy.

Figures

Figures reproduced from arXiv: 2509.07404 by Alberto De Marchi, Jeremy Bertoncini, Matthias Gerdts, Simon Gottschalk.

**Figure 2.** Figure 2: Distribution of problem size (in terms of number of nonzero entries) and density for randomized [PITH_FULL_IMAGE:figures/full_fig_p011_2.png] view at source ↗

**Figure 3.** Figure 3: Distributions of problem initial residuals (sample of 10 [PITH_FULL_IMAGE:figures/full_fig_p012_3.png] view at source ↗

**Figure 4.** Figure 4: Evolution of relevant metrics during the multistage training procedure. [PITH_FULL_IMAGE:figures/full_fig_p013_4.png] view at source ↗

**Figure 5.** Figure 5: Validation set: Data (left) and performance (right) profiles comparing RL and solver configu [PITH_FULL_IMAGE:figures/full_fig_p014_5.png] view at source ↗

**Figure 6.** Figure 6: Mappings αx (left), αy (middle) and αz (right): learned policy πζ evaluated with fixed ν = 10−2 and ε = 10−2 . 10−2 10−1 100 0 0.2 0.4 0.6 0.8 1 Runtime [s] Fraction of problems solved RL after stage 1 RL after stage 2 RL after stage 3 RL 2 0 2 1 2 2 2 3 2 4 2 5 2 6 Performance ratio: runtime [PITH_FULL_IMAGE:figures/full_fig_p015_6.png] view at source ↗

**Figure 7.** Figure 7: In-training validation: policy performance along the multistage process. Data (left) and [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗

**Figure 8.** Figure 8: Scale 10× set: Data (left) and performance (right) profiles comparing RL and solver configurations with fixed αx = 0.15, αy and αz. However, the choice of random problem sizes during training (see [PITH_FULL_IMAGE:figures/full_fig_p016_8.png] view at source ↗

**Figure 9.** Figure 9: Maros–M´esz´aros set: Data (left) and performance (right) profiles comparing RL and solver [PITH_FULL_IMAGE:figures/full_fig_p017_9.png] view at source ↗

**Figure 10.** Figure 10: Maros–M´esz´aros set: Data (left) and performance (right) profiles comparing the influence of [PITH_FULL_IMAGE:figures/full_fig_p018_10.png] view at source ↗

read the original abstract

Quadratic programming is a workhorse of modern nonlinear optimization, control, and data science. Although regularized methods offer convergence guarantees under minimal assumptions on the problem data, they can exhibit the slow tail-convergence typical of first-order schemes, thus requiring many iterations to achieve high-accuracy solutions. Moreover, hyperparameter tuning significantly impacts the solver performance but how to find an appropriate parameter configuration remains an elusive research question. To address these issues, we explore how data-driven approaches can accelerate the solution process. Aiming at high-accuracy solutions, we focus on a regularized interior-point solver and carefully handle its two-loop flow and control parameters. We will show that reinforcement learning can make a significant contribution to facilitating the solver tuning and to speeding up the optimization process. Numerical experiments demonstrate that, after a lightweight training, the learned policy generalizes well to different problem classes with varying dimensions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a reinforcement learning framework to adaptively select regularization and step parameters within a regularized interior-point method for convex quadratic programming. The approach targets the slow tail convergence of regularized IPMs and the difficulty of manual hyperparameter tuning by training a policy on the solver's two-loop structure. The central claim is that a lightweight training phase yields a policy that generalizes to unseen problem classes with varying dimensions, producing measurable speed-ups, as supported by numerical experiments.

Significance. If the empirical results prove robust under proper controls, the work could provide a practical route to automated solver tuning for convex QPs, with relevance to control, data science, and nonlinear optimization pipelines. The emphasis on high-accuracy solutions and handling of the two-loop flow distinguishes it from generic learned-optimizer literature and could reduce iteration counts without sacrificing convergence guarantees.

major comments (2)

[Numerical Experiments] Numerical Experiments section: the abstract asserts that numerical experiments support generalization and speed-up, but supplies no information on baselines, statistical significance, problem generation, or failure cases. Without these details the central empirical claim cannot be verified.
[Numerical Experiments] State representation and training distribution (implicit in the generalization discussion): the claim that the policy 'generalizes well to different problem classes with varying dimensions' requires that the state vector normalizes for problem size or uses scale-invariant features and that the training set covers a representative range of conditioning numbers and sparsity patterns. No such analysis or description is provided, leaving the transfer claim unsupported.

minor comments (1)

[Abstract] The abstract would benefit from naming the specific RL algorithm (e.g., PPO, DQN) and the precise interior-point variant employed.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed report. We address each major comment below, agreeing that the Numerical Experiments section requires expansion to better support the central claims. We will incorporate the suggested details in the revised manuscript.

read point-by-point responses

Referee: Numerical Experiments section: the abstract asserts that numerical experiments support generalization and speed-up, but supplies no information on baselines, statistical significance, problem generation, or failure cases. Without these details the central empirical claim cannot be verified.

Authors: We agree that additional details are necessary for verification. In the revised manuscript we will expand the Numerical Experiments section to describe the baselines (standard regularized IPM with fixed parameters and established QP solvers), report statistical significance via repeated trials with means, standard deviations, and appropriate hypothesis tests, specify the problem generation process (including dimension ranges, conditioning, and sparsity), and discuss any observed failure cases or non-convergent instances. These additions will directly address the referee's concerns without altering the existing results. revision: yes
Referee: State representation and training distribution (implicit in the generalization discussion): the claim that the policy 'generalizes well to different problem classes with varying dimensions' requires that the state vector normalizes for problem size or uses scale-invariant features and that the training set covers a representative range of conditioning numbers and sparsity patterns. No such analysis or description is provided, leaving the transfer claim unsupported.

Authors: We acknowledge the manuscript currently lacks explicit discussion of scale-invariance in the state representation and coverage statistics for the training distribution. We will add a dedicated paragraph describing the state vector components, any normalization steps (such as relative residuals or logarithmic scaling to mitigate dimension dependence), and quantitative characteristics of the training problems, including ranges of condition numbers and sparsity levels across the problem classes used. This will strengthen the justification for generalization to unseen instances with varying dimensions. revision: yes

Circularity Check

0 steps flagged

No circularity: standard RL train-test setup with empirical generalization testing

full rationale

The paper presents a reinforcement learning approach to tune parameters of a regularized interior-point solver for quadratic programs. It trains a policy on a collection of problems and evaluates performance on unseen problem classes with varying dimensions. This follows the standard supervised/RL train-test paradigm, where success on held-out instances is measured empirically rather than derived by algebraic identity or self-referential fitting. No equations, uniqueness theorems, or ansatzes are shown that reduce the claimed generalization to the training inputs by construction. The approach is self-contained against external benchmarks via numerical experiments on different problem classes.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review performed on abstract only; therefore the ledger records only the background assumptions explicitly stated in the abstract.

axioms (1)

domain assumption Regularized interior-point methods offer convergence guarantees under minimal assumptions on the problem data.
Stated directly in the abstract as the starting point for the work.

pith-pipeline@v0.9.0 · 5686 in / 1217 out tokens · 26829 ms · 2026-05-21T21:58:16.953380+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We formulate a RL framework to learn a policy for effective and reliable hyperparameter tuning, with the QP solver retaining provable convergence guarantees. ... the action α affects the parameters δ of the solver
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Numerical experiments demonstrate that, after a lightweight training, the learned policy generalizes well to different problem classes with varying dimensions.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 3 internal anchors

[1]

B. Amos. Tutorial on amortized optimization, 2023, 2202.00665

work page arXiv 2023
[2]

Applegate, M

D. Applegate, M. Diaz, O. Hinder, H. Lu, M. Lubin, B. O’Donoghue, and W. Schudy. Practical large- scale linear programming using primal-dual hybrid gradient. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, editors,Advances in Neural Information Processing Systems, volume 34, pages 20243–20257. Curran Associates, Inc., 2021. URLhttp...

work page 2021
[3]

Audet, K.-C

C. Audet, K.-C. Dang, and D. Orban. Optimization of algorithms with OPAL.Mathematical Programming Computation, 6(3):233–254, 2014. doi:10.1007/s12532-014-0067-x

work page doi:10.1007/s12532-014-0067-x 2014
[4]

Bambade, F

A. Bambade, F. Schramm, S. El-Kazdadi, S. Caron, A. Taylor, and J. Carpentier. ProxQP: an efficient and versatile quadratic programming solver for real-time robotics applications and beyond.IEEE Transactions on Robotics, pages 1–19, 2025. doi:10.1109/TRO.2025.3577107

work page doi:10.1109/tro.2025.3577107 2025
[5]

Bemporad

A. Bemporad. A numerically stable solver for positive semidefinite quadratic programs based on nonnegative least squares.IEEE Transactions on Automatic Control, 63(2):525–531, 2018. doi:10.1109/TAC.2017.2735938

work page doi:10.1109/tac.2017.2735938 2018
[6]

E. G. Birgin and J. M. Mart´ ınez. Augmented Lagrangian method with nonmonotone penalty param- eters for constrained optimization.Computational Optimization and Applications, 51(3):941–965, 2012. doi:10.1007/s10589-011-9396-0

work page doi:10.1007/s10589-011-9396-0 2012
[7]

S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers.Foundations and Trends®in Machine Learning, 3(1):1–122,

work page
[8]

doi:10.1561/2200000016. 18

work page doi:10.1561/2200000016
[9]

R. Byrd, J. Nocedal, and R. Waltz. Feasible interior methods using slacks for nonlinear optimization. Computational Optimization and Applications, 26, 03 2002. doi:10.1023/A:1025136421370

work page doi:10.1023/a:1025136421370 2002
[10]

Cipolla and J

S. Cipolla and J. Gondzio. Proximal stabilized interior point methods and low-frequency-update pre- conditioning techniques.Journal of Optimization Theory and Applications, 197(3):1061–1103, 2023. doi:10.1007/s10957-023-02194-4

work page doi:10.1007/s10957-023-02194-4 2023
[11]

De Marchi

A. De Marchi. On a primal-dual Newton proximal method for convex quadratic programs.Computational Optimization and Applications, 81(2):369–395, 3 2022. doi:10.1007/s10589-021-00342-y

work page doi:10.1007/s10589-021-00342-y 2022
[12]

De Marchi

A. De Marchi. Regularized interior point methods for constrained optimization and control.IFAC- PapersOnLine, 56(2):1247–1252, 2023. doi:10.1016/j.ifacol.2023.10.1747

work page doi:10.1016/j.ifacol.2023.10.1747 2023
[13]

d’Aspremont, D

A. d’Aspremont, D. Scieur, and A. Taylor. Acceleration methods.Foundations and Trends®in Optimiza- tion, 5(1–2):1–245, 2021. doi:10.1561/2400000036

work page doi:10.1561/2400000036 2021
[14]

E. A. Feinberg and A. Shwartz, editors.Handbook of Markov Decision Processes: Methods and Applications, volume 40 ofInternational Series in Operations Research & Management Science. Springer, Boston, MA,

work page
[15]

doi:10.1007/978-1-4615-0805-2

work page doi:10.1007/978-1-4615-0805-2
[16]

H. J. Ferreau, C. Kirches, A. Potschka, H. G. Bock, and M. Diehl. qpOASES: a parametric active- set algorithm for quadratic programming.Mathematical Programming Computation, 6(4):327–363, 2014. doi:10.1007/s12532-014-0071-1

work page doi:10.1007/s12532-014-0071-1 2014
[17]

Ghannad, D

A. Ghannad, D. Orban, and M. A. Saunders. Linear systems arising in interior methods for convex opti- mization: a symmetric formulation with bounded condition number.Optimization Methods and Software, 37(4):1344–1369, 2022. doi:10.1080/10556788.2021.1965599

work page doi:10.1080/10556788.2021.1965599 2022
[18]

P. Gill, W. Murray, and M. Wright.Practical Optimization. Academic Press, 1981. URLhttps://books. google.de/books?id=xUzvAAAAMAAJ

work page 1981
[19]

P. E. Gill and D. P. Robinson. A primal-dual augmented Lagrangian.Computational Optimization and Applications, 51(1):1–25, 1 2012. doi:10.1007/s10589-010-9339-1

work page doi:10.1007/s10589-010-9339-1 2012
[20]

Giselsson and S

P. Giselsson and S. Boyd. Linear convergence and metric selection for Douglas–Rachford splitting and ADMM.IEEE Transactions on Automatic Control, 62(2):532–544, 2017. doi:10.1109/TAC.2016.2564160

work page doi:10.1109/tac.2016.2564160 2017
[21]

Goujaud, C

B. Goujaud, C. Moucer, F. Glineur, J. M. Hendrickx, A. B. Taylor, and A. Dieuleveut. Pepit: computer- assisted worst-case analyses of first-order optimization methods in python.Mathematical Programming Computation, 16(3):337–367, 2024. doi:10.1007/s12532-024-00259-7

work page doi:10.1007/s12532-024-00259-7 2024
[22]

Grippo, F

L. Grippo, F. Lampariello, and S. Lucidi. A nonmonotone line search technique for newton’s method.SIAM Journal on Numerical Analysis, 23(4):707–716, 1986. doi:10.1137/0723046

work page doi:10.1137/0723046 1986
[24]

Ichnowski, P

J. Ichnowski, P. Jain, B. Stellato, G. Banjac, M. Luo, F. Borrelli, J. E. Gonzalez, I. Stoica, and K. Gold- berg. Accelerating quadratic optimization with reinforcement learning. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, editors,Advances in Neural Information Processing Systems, volume 34, pages 21043–21055. Curran Associates...

work page 2021
[25]

L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. Planning and acting in partially observable stochastic domains.Artificial Intelligence, 101(1):99–134, 1998. doi:https://doi.org/10.1016/S0004-3702(98)00023-X

work page doi:10.1016/s0004-3702(98)00023-x 1998
[26]

Learning to Optimize

K. Li and J. Malik. Learning to optimize. InInternational Conference on Learning Representations, 2017, 1606.01885. URLhttps://openreview.net/forum?id=ry4Vrt5gl

work page internal anchor Pith review Pith/arXiv arXiv 2017
[27]

RLlib: Abstractions for Distributed Reinforcement Learning

E. Liang, R. Liaw, R. Nishihara, P. Moritz, R. Fox, K. Goldberg, J. E. Gonzalez, M. I. Jordan, and I. Stoica. RLlib: Abstractions for distributed reinforcement learning. In J. Dy and A. Krause, editors,Proceedings of the 35th International Conference on Machine Learning, volume 80 ofProceedings of Machine Learning Research, pages 3053–3062. PMLR, 2018. UR...

work page internal anchor Pith review Pith/arXiv arXiv 2018
[28]

Liao-McPherson and I

D. Liao-McPherson and I. Kolmanovsky. FBstab: A proximally stabilized semismooth algorithm for convex quadratic programming.Automatica, 113:108801, 2020. doi:10.1016/j.automatica.2019.108801

work page doi:10.1016/j.automatica.2019.108801 2020
[29]

F. J. Luque. Asymptotic convergence analysis of the proximal point algorithm.SIAM Journal on Control and Optimization, 22(2):277–293, 1984. doi:10.1137/0322019

work page doi:10.1137/0322019 1984
[30]

Maros and C

I. Maros and C. M´ esz´ aros. A repository of convex quadratic programming problems.Optimization Methods and Software, 11(1-4):671–681, 1999. doi:10.1080/10556789908805768. 19

work page doi:10.1080/10556789908805768 1999
[31]

Parikh and S

N. Parikh and S. Boyd. Proximal algorithms.Foundations and Trends®in Optimization, 1(3):127–239,

work page
[32]

doi:10.1561/2400000003

work page doi:10.1561/2400000003
[33]

Pougkakiotis and J

S. Pougkakiotis and J. Gondzio. An interior point-proximal method of multipliers for convex quadratic programming.Computational Optimization and Applications, 78(2):307–351, 2021. doi:10.1007/s10589-020- 00240-9

work page doi:10.1007/s10589-020- 2021
[34]

R. T. Rockafellar. Augmented Lagrangians and applications of the proximal point algorithm in convex programming.Mathematics of operations research, 1(2):97–116, 1976. doi:10.1287/moor.1.2.97

work page doi:10.1287/moor.1.2.97 1976
[35]

R. T. Rockafellar. Monotone operators and the proximal point algorithm.SIAM Journal on Control and Optimization, 14(5):877–898, 1976. doi:10.1137/0314056

work page doi:10.1137/0314056 1976
[36]

Sambharya, G

R. Sambharya, G. Hall, B. Amos, and B. Stellato. End-to-end learning to warm-start for real-time quadratic optimization. In N. Matni, M. Morari, and G. J. Pappas, editors,Proceedings of the 5th Annual Learning for Dynamics and Control Conference, volume 211 ofProceedings of Machine Learning Research, pages 220–234. PMLR, 6 2023. URLhttps://proceedings.mlr...

work page 2023
[37]

Sambharya, G

R. Sambharya, G. Hall, B. Amos, and B. Stellato. Learning to warm-start fixed-point optimization algo- rithms.Journal of Machine Learning Research, 25(166):1–46, 2024. URLhttp://jmlr.org/papers/v25/ 23-1174.html

work page 2024
[38]

Proximal Policy Optimization Algorithms

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms, 2017, 1707.06347

work page internal anchor Pith review Pith/arXiv arXiv 2017
[39]

Schwan, Y

R. Schwan, Y. Jiang, D. Kuhn, and C. N. Jones. PIQP: A proximal interior-point quadratic program- ming solver. In2023 62nd IEEE Conference on Decision and Control (CDC), pages 1088–1093, 2023. doi:10.1109/CDC49753.2023.10383915

work page doi:10.1109/cdc49753.2023.10383915 2023
[40]

Stellato, G

B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd. OSQP: an operator splitting solver for quadratic programs.Mathematical Programming Computation, 12(4):637–672, 2020. doi:10.1007/s12532- 020-00179-2

work page doi:10.1007/s12532- 2020
[41]

R. S. Sutton and A. G. Barto.Reinforcement Learning: An Introduction. The MIT Press, second edition,

work page
[42]

URLhttp://incompleteideas.net/book/the-book-2nd.html

work page
[43]

W¨ achter and L

A. W¨ achter and L. T. Biegler. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming.Mathematical Programming, 106(1):25–57, 3 2006. doi:10.1007/s10107- 004-0559-y

work page doi:10.1007/s10107- 2006
[44]

K. Wei, A. Aviles-Rivero, J. Liang, Y. Fu, H. Huang, and C.-B. Sch¨ onlieb. TFPnP: Tuning-free plug- and-play proximal algorithms with applications to inverse imaging problems.Journal of Machine Learning Research, 23(16):1–48, 2022. URLhttp://jmlr.org/papers/v23/20-1297.html

work page 2022
[45]

Z. Wu, E. Liang, M. Luo, S. Mika, J. E. Gonzalez, and I. Stoica. RLlib flow: Distributed reinforcement learn- ing is a dataflow problem. InConference on Neural Information Processing Systems (NeurIPS), 2021. URL https://proceedings.neurips.cc/paper/2021/file/2bce32ed409f5ebcee2a7b417ad9beed-Paper.pdf. 20

work page 2021

[1] [1]

B. Amos. Tutorial on amortized optimization, 2023, 2202.00665

work page arXiv 2023

[2] [2]

Applegate, M

D. Applegate, M. Diaz, O. Hinder, H. Lu, M. Lubin, B. O’Donoghue, and W. Schudy. Practical large- scale linear programming using primal-dual hybrid gradient. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, editors,Advances in Neural Information Processing Systems, volume 34, pages 20243–20257. Curran Associates, Inc., 2021. URLhttp...

work page 2021

[3] [3]

Audet, K.-C

C. Audet, K.-C. Dang, and D. Orban. Optimization of algorithms with OPAL.Mathematical Programming Computation, 6(3):233–254, 2014. doi:10.1007/s12532-014-0067-x

work page doi:10.1007/s12532-014-0067-x 2014

[4] [4]

Bambade, F

A. Bambade, F. Schramm, S. El-Kazdadi, S. Caron, A. Taylor, and J. Carpentier. ProxQP: an efficient and versatile quadratic programming solver for real-time robotics applications and beyond.IEEE Transactions on Robotics, pages 1–19, 2025. doi:10.1109/TRO.2025.3577107

work page doi:10.1109/tro.2025.3577107 2025

[5] [5]

Bemporad

A. Bemporad. A numerically stable solver for positive semidefinite quadratic programs based on nonnegative least squares.IEEE Transactions on Automatic Control, 63(2):525–531, 2018. doi:10.1109/TAC.2017.2735938

work page doi:10.1109/tac.2017.2735938 2018

[6] [6]

E. G. Birgin and J. M. Mart´ ınez. Augmented Lagrangian method with nonmonotone penalty param- eters for constrained optimization.Computational Optimization and Applications, 51(3):941–965, 2012. doi:10.1007/s10589-011-9396-0

work page doi:10.1007/s10589-011-9396-0 2012

[7] [7]

S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers.Foundations and Trends®in Machine Learning, 3(1):1–122,

work page

[8] [8]

doi:10.1561/2200000016. 18

work page doi:10.1561/2200000016

[9] [9]

R. Byrd, J. Nocedal, and R. Waltz. Feasible interior methods using slacks for nonlinear optimization. Computational Optimization and Applications, 26, 03 2002. doi:10.1023/A:1025136421370

work page doi:10.1023/a:1025136421370 2002

[10] [10]

Cipolla and J

S. Cipolla and J. Gondzio. Proximal stabilized interior point methods and low-frequency-update pre- conditioning techniques.Journal of Optimization Theory and Applications, 197(3):1061–1103, 2023. doi:10.1007/s10957-023-02194-4

work page doi:10.1007/s10957-023-02194-4 2023

[11] [11]

De Marchi

A. De Marchi. On a primal-dual Newton proximal method for convex quadratic programs.Computational Optimization and Applications, 81(2):369–395, 3 2022. doi:10.1007/s10589-021-00342-y

work page doi:10.1007/s10589-021-00342-y 2022

[12] [12]

De Marchi

A. De Marchi. Regularized interior point methods for constrained optimization and control.IFAC- PapersOnLine, 56(2):1247–1252, 2023. doi:10.1016/j.ifacol.2023.10.1747

work page doi:10.1016/j.ifacol.2023.10.1747 2023

[13] [13]

d’Aspremont, D

A. d’Aspremont, D. Scieur, and A. Taylor. Acceleration methods.Foundations and Trends®in Optimiza- tion, 5(1–2):1–245, 2021. doi:10.1561/2400000036

work page doi:10.1561/2400000036 2021

[14] [14]

E. A. Feinberg and A. Shwartz, editors.Handbook of Markov Decision Processes: Methods and Applications, volume 40 ofInternational Series in Operations Research & Management Science. Springer, Boston, MA,

work page

[15] [15]

doi:10.1007/978-1-4615-0805-2

work page doi:10.1007/978-1-4615-0805-2

[16] [16]

H. J. Ferreau, C. Kirches, A. Potschka, H. G. Bock, and M. Diehl. qpOASES: a parametric active- set algorithm for quadratic programming.Mathematical Programming Computation, 6(4):327–363, 2014. doi:10.1007/s12532-014-0071-1

work page doi:10.1007/s12532-014-0071-1 2014

[17] [17]

Ghannad, D

A. Ghannad, D. Orban, and M. A. Saunders. Linear systems arising in interior methods for convex opti- mization: a symmetric formulation with bounded condition number.Optimization Methods and Software, 37(4):1344–1369, 2022. doi:10.1080/10556788.2021.1965599

work page doi:10.1080/10556788.2021.1965599 2022

[18] [18]

P. Gill, W. Murray, and M. Wright.Practical Optimization. Academic Press, 1981. URLhttps://books. google.de/books?id=xUzvAAAAMAAJ

work page 1981

[19] [19]

P. E. Gill and D. P. Robinson. A primal-dual augmented Lagrangian.Computational Optimization and Applications, 51(1):1–25, 1 2012. doi:10.1007/s10589-010-9339-1

work page doi:10.1007/s10589-010-9339-1 2012

[20] [20]

Giselsson and S

P. Giselsson and S. Boyd. Linear convergence and metric selection for Douglas–Rachford splitting and ADMM.IEEE Transactions on Automatic Control, 62(2):532–544, 2017. doi:10.1109/TAC.2016.2564160

work page doi:10.1109/tac.2016.2564160 2017

[21] [21]

Goujaud, C

B. Goujaud, C. Moucer, F. Glineur, J. M. Hendrickx, A. B. Taylor, and A. Dieuleveut. Pepit: computer- assisted worst-case analyses of first-order optimization methods in python.Mathematical Programming Computation, 16(3):337–367, 2024. doi:10.1007/s12532-024-00259-7

work page doi:10.1007/s12532-024-00259-7 2024

[22] [22]

Grippo, F

L. Grippo, F. Lampariello, and S. Lucidi. A nonmonotone line search technique for newton’s method.SIAM Journal on Numerical Analysis, 23(4):707–716, 1986. doi:10.1137/0723046

work page doi:10.1137/0723046 1986

[23] [24]

Ichnowski, P

J. Ichnowski, P. Jain, B. Stellato, G. Banjac, M. Luo, F. Borrelli, J. E. Gonzalez, I. Stoica, and K. Gold- berg. Accelerating quadratic optimization with reinforcement learning. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, editors,Advances in Neural Information Processing Systems, volume 34, pages 21043–21055. Curran Associates...

work page 2021

[24] [25]

L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. Planning and acting in partially observable stochastic domains.Artificial Intelligence, 101(1):99–134, 1998. doi:https://doi.org/10.1016/S0004-3702(98)00023-X

work page doi:10.1016/s0004-3702(98)00023-x 1998

[25] [26]

Learning to Optimize

K. Li and J. Malik. Learning to optimize. InInternational Conference on Learning Representations, 2017, 1606.01885. URLhttps://openreview.net/forum?id=ry4Vrt5gl

work page internal anchor Pith review Pith/arXiv arXiv 2017

[26] [27]

RLlib: Abstractions for Distributed Reinforcement Learning

E. Liang, R. Liaw, R. Nishihara, P. Moritz, R. Fox, K. Goldberg, J. E. Gonzalez, M. I. Jordan, and I. Stoica. RLlib: Abstractions for distributed reinforcement learning. In J. Dy and A. Krause, editors,Proceedings of the 35th International Conference on Machine Learning, volume 80 ofProceedings of Machine Learning Research, pages 3053–3062. PMLR, 2018. UR...

work page internal anchor Pith review Pith/arXiv arXiv 2018

[27] [28]

Liao-McPherson and I

D. Liao-McPherson and I. Kolmanovsky. FBstab: A proximally stabilized semismooth algorithm for convex quadratic programming.Automatica, 113:108801, 2020. doi:10.1016/j.automatica.2019.108801

work page doi:10.1016/j.automatica.2019.108801 2020

[28] [29]

F. J. Luque. Asymptotic convergence analysis of the proximal point algorithm.SIAM Journal on Control and Optimization, 22(2):277–293, 1984. doi:10.1137/0322019

work page doi:10.1137/0322019 1984

[29] [30]

Maros and C

I. Maros and C. M´ esz´ aros. A repository of convex quadratic programming problems.Optimization Methods and Software, 11(1-4):671–681, 1999. doi:10.1080/10556789908805768. 19

work page doi:10.1080/10556789908805768 1999

[30] [31]

Parikh and S

N. Parikh and S. Boyd. Proximal algorithms.Foundations and Trends®in Optimization, 1(3):127–239,

work page

[31] [32]

doi:10.1561/2400000003

work page doi:10.1561/2400000003

[32] [33]

Pougkakiotis and J

S. Pougkakiotis and J. Gondzio. An interior point-proximal method of multipliers for convex quadratic programming.Computational Optimization and Applications, 78(2):307–351, 2021. doi:10.1007/s10589-020- 00240-9

work page doi:10.1007/s10589-020- 2021

[33] [34]

R. T. Rockafellar. Augmented Lagrangians and applications of the proximal point algorithm in convex programming.Mathematics of operations research, 1(2):97–116, 1976. doi:10.1287/moor.1.2.97

work page doi:10.1287/moor.1.2.97 1976

[34] [35]

R. T. Rockafellar. Monotone operators and the proximal point algorithm.SIAM Journal on Control and Optimization, 14(5):877–898, 1976. doi:10.1137/0314056

work page doi:10.1137/0314056 1976

[35] [36]

Sambharya, G

R. Sambharya, G. Hall, B. Amos, and B. Stellato. End-to-end learning to warm-start for real-time quadratic optimization. In N. Matni, M. Morari, and G. J. Pappas, editors,Proceedings of the 5th Annual Learning for Dynamics and Control Conference, volume 211 ofProceedings of Machine Learning Research, pages 220–234. PMLR, 6 2023. URLhttps://proceedings.mlr...

work page 2023

[36] [37]

Sambharya, G

R. Sambharya, G. Hall, B. Amos, and B. Stellato. Learning to warm-start fixed-point optimization algo- rithms.Journal of Machine Learning Research, 25(166):1–46, 2024. URLhttp://jmlr.org/papers/v25/ 23-1174.html

work page 2024

[37] [38]

Proximal Policy Optimization Algorithms

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms, 2017, 1707.06347

work page internal anchor Pith review Pith/arXiv arXiv 2017

[38] [39]

Schwan, Y

R. Schwan, Y. Jiang, D. Kuhn, and C. N. Jones. PIQP: A proximal interior-point quadratic program- ming solver. In2023 62nd IEEE Conference on Decision and Control (CDC), pages 1088–1093, 2023. doi:10.1109/CDC49753.2023.10383915

work page doi:10.1109/cdc49753.2023.10383915 2023

[39] [40]

Stellato, G

B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd. OSQP: an operator splitting solver for quadratic programs.Mathematical Programming Computation, 12(4):637–672, 2020. doi:10.1007/s12532- 020-00179-2

work page doi:10.1007/s12532- 2020

[40] [41]

R. S. Sutton and A. G. Barto.Reinforcement Learning: An Introduction. The MIT Press, second edition,

work page

[41] [42]

URLhttp://incompleteideas.net/book/the-book-2nd.html

work page

[42] [43]

W¨ achter and L

A. W¨ achter and L. T. Biegler. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming.Mathematical Programming, 106(1):25–57, 3 2006. doi:10.1007/s10107- 004-0559-y

work page doi:10.1007/s10107- 2006

[43] [44]

K. Wei, A. Aviles-Rivero, J. Liang, Y. Fu, H. Huang, and C.-B. Sch¨ onlieb. TFPnP: Tuning-free plug- and-play proximal algorithms with applications to inverse imaging problems.Journal of Machine Learning Research, 23(16):1–48, 2022. URLhttp://jmlr.org/papers/v23/20-1297.html

work page 2022

[44] [45]

Z. Wu, E. Liang, M. Luo, S. Mika, J. E. Gonzalez, and I. Stoica. RLlib flow: Distributed reinforcement learn- ing is a dataflow problem. InConference on Neural Information Processing Systems (NeurIPS), 2021. URL https://proceedings.neurips.cc/paper/2021/file/2bce32ed409f5ebcee2a7b417ad9beed-Paper.pdf. 20

work page 2021