Exploiting GPU/SIMD Architectures for Solving Linear-Quadratic MPC Problems

David Cole; Fran\c{c}ois Pacaud; Mihai Anitescu; Sungho Shin; Victor M. Zavala

arxiv: 2209.13049 · v1 · submitted 2022-09-26 · 🧮 math.OC

Exploiting GPU/SIMD Architectures for Solving Linear-Quadratic MPC Problems

David Cole , Sungho Shin , Fran\c{c}ois Pacaud , Victor M. Zavala , Mihai Anitescu This is my paper

Pith reviewed 2026-05-24 11:25 UTC · model grok-4.3

classification 🧮 math.OC

keywords model predictive controlGPU computinginterior-point methodscondensed formulationPDE-constrained optimizationparallel factorizationlinear-quadratic problems

0 comments

The pith

Condensing linear-quadratic MPC to a positive definite matrix sized only by controls enables order-of-magnitude faster GPU solves than CPU.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that eliminating state variables from a constrained linear-quadratic MPC problem, followed by a condensed-space interior-point method to remove inequalities from the KKT system, produces a positive definite matrix whose size depends only on the number of controls. This structure permits stable parallel factorization on GPU and SIMD architectures, which is especially effective for problems with many states, few inputs, and moderate horizons such as PDE-constrained control. A reader would care because the approach demonstrates a concrete route to real-time MPC on widely available hardware without custom low-level coding. Numerical tests confirm an order of magnitude speedup over standard CPU implementations. The work also supplies open-source Julia packages for modeling and GPU solution of such problems.

Core claim

By eliminating the state variables and applying a condensed-space interior-point method, the resulting matrix after inequality removal is positive definite, depends in size only on the number of controls, and admits efficient parallel factorization on GPU/SIMD hardware, delivering an order of magnitude faster solution than a standard CPU implementation for PDE-constrained linear-quadratic MPC problems.

What carries the argument

The condensed positive definite matrix after state elimination and inequality removal in the interior-point method, whose dimension depends solely on the number of controls.

If this is right

Computational effort scales with the number of controls rather than the number of states.
Parallel factorization on GPU hardware becomes the dominant cost only through the control dimension and horizon length.
The method applies directly to large-scale PDE-constrained optimal control problems with moderate horizons.
Open-source Julia modeling and solver packages enable direct GPU deployment for such MPC instances.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same state-elimination step could be combined with other parallel linear-algebra routines beyond interior-point methods.
If positive definiteness holds under modest perturbations, the technique may extend to mildly nonlinear MPC formulations.
Benchmarking the released Julia packages against other GPU-accelerated solvers would quantify the practical advantage for end users.

Load-bearing premise

The condensed matrix remains positive definite after state elimination and inequality removal, allowing stable parallel factorization.

What would settle it

A linear-quadratic MPC instance where the condensed matrix loses positive definiteness or a timing benchmark where the GPU implementation fails to achieve the reported order-of-magnitude speedup over CPU.

Figures

Figures reproduced from arXiv: 2209.13049 by David Cole, Fran\c{c}ois Pacaud, Mihai Anitescu, Sungho Shin, Victor M. Zavala.

**Figure 2.** Figure 2: Total solver time for T = 250. TABLE II SOLVER STATISTICS FOR T = 250 AND VARIED N. CPU Sparse CPU Dense GPU Dense N iter tot lin iter tot lin iter tot lin 4 26 3.0 2.6 28 31.0 1.10 28 1.68 0.037 5 30 27.9 24.8 33 60.1 1.30 33 3.54 0.045 6 29 91.2 81.4 31 102.0 1.25 31 6.59 0.041 7 32 412.3 372.0 38 192.3 1.55 38 11.6 0.055 8 31 7086. 6617. 37 685.4 1.66 37 15.4 0.051 9 - - - 38 476.8 1.77 38 22.3 0.056 10… view at source ↗

**Figure 3.** Figure 3: Total solver time for N = 4 and varied T. TABLE V SOLVER STATISTICS FOR N = 4 AND VARIED T . CPU Sparse CPU Dense GPU Dense T iter tot lin iter tot lin iter tot lin 10 26 0.10 0.079 28 0.02 0.001 28 0.03 0.004 20 25 0.21 0.170 27 0.05 0.002 27 0.04 0.005 30 24 0.30 0.249 26 0.10 0.005 26 0.05 0.005 40 25 0.43 0.356 27 0.21 0.009 27 0.06 0.006 50 26 0.55 0.460 27 0.34 0.016 27 0.08 0.007 75 27 0.85 0.724 29… view at source ↗

read the original abstract

We report numerical results on solving constrained linear-quadratic model predictive control (MPC) problems by exploiting graphics processing units (GPUs). The presented method reduces the MPC problem by eliminating the state variables and applies a condensed-space interior-point method to remove the inequality constraints in the KKT system. The final condensed matrix is positive definite and can be efficiently factorized in parallel on GPU/SIMD architectures. In addition, the size of the condensed matrix depends only on the number of controls in the problem, rendering the method particularly effective when the problem has many states but few inputs and moderate horizon length. Our numerical results for PDE-constrained problems show that the approach is an order of magnitude faster than a standard CPU implementation. We also provide an open-source Julia framework that facilitates modeling (DynamicNLPModels.jl) and solution (MadNLP.jl) of MPC problems on GPUs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper offers a practical GPU condensation plus condensed IPM pipeline for LQ-MPC with open-source Julia code, but the speedup evidence is too thin on details and the positive-definiteness claim lacks derivation.

read the letter

The core of the paper is a condensation step that removes states from the LQ-MPC problem, followed by a condensed-space interior-point method that leaves a positive definite matrix whose dimension depends only on the number of controls. This matrix is then factored in parallel on GPU or SIMD hardware. The authors supply open-source Julia packages (DynamicNLPModels.jl and MadNLP.jl) and report results on PDE-constrained examples that they say run an order of magnitude faster than a standard CPU implementation.

Referee Report

2 major / 0 minor

Summary. The paper proposes solving constrained linear-quadratic MPC problems by eliminating state variables to condense the problem, followed by a condensed-space interior-point method that removes inequalities from the KKT system. The resulting condensed matrix is asserted to be positive definite with size depending only on the number of controls, enabling efficient parallel factorization on GPU/SIMD hardware. Numerical experiments on PDE-constrained problems report an order-of-magnitude speedup versus a standard CPU implementation, and an open-source Julia framework (DynamicNLPModels.jl and MadNLP.jl) is provided.

Significance. If the positive-definiteness claim holds under the stated conditions and the performance results prove robust, the approach would offer a practical route to real-time solution of high-dimensional MPC problems (many states, few inputs, moderate horizons) on GPU hardware. The open-source modeling and solver framework strengthens reproducibility and adoption.

major comments (2)

[Abstract] Abstract: the claim that 'the final condensed matrix is positive definite' is asserted without an explicit construction of the condensed Hessian (via state elimination) or the modified KKT matrix (after inequality removal), and without conditions or a proof that positive definiteness is inherited from the original LQ cost. This property is load-bearing for the stability of the claimed parallel factorization whose cost scales only with the number of controls.
[Abstract] Abstract (numerical results paragraph): the reported order-of-magnitude speedups on PDE-constrained problems supply no error bars, no description of test-problem selection or discretization parameters, and no comparison against tuned CPU baselines, leaving the central performance claim only weakly supported.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. We address each major comment below and will revise the paper accordingly to strengthen the presentation of the positive-definiteness property and the numerical results.

read point-by-point responses

Referee: [Abstract] Abstract: the claim that 'the final condensed matrix is positive definite' is asserted without an explicit construction of the condensed Hessian (via state elimination) or the modified KKT matrix (after inequality removal), and without conditions or a proof that positive definiteness is inherited from the original LQ cost. This property is load-bearing for the stability of the claimed parallel factorization whose cost scales only with the number of controls.

Authors: We agree that the abstract would benefit from additional context on this key property. The manuscript constructs the condensed Hessian via state elimination in Section 3 and describes the modified KKT system after inequality removal in Section 4. To make the positive-definiteness claim fully rigorous, we will add a concise proof (or explicit conditions, such as positive-definite stage costs) in a new subsection or appendix showing inheritance from the original LQ objective. This will directly support the stability of the parallel factorization. revision: yes
Referee: [Abstract] Abstract (numerical results paragraph): the reported order-of-magnitude speedups on PDE-constrained problems supply no error bars, no description of test-problem selection or discretization parameters, and no comparison against tuned CPU baselines, leaving the central performance claim only weakly supported.

Authors: We acknowledge that the numerical results section would be strengthened by more complete reporting. In the revision we will include error bars from repeated timings, explicit descriptions of the PDE test problems (including discretization parameters such as spatial grids and horizon lengths), and direct comparisons against tuned CPU baselines that employ optimized linear-algebra libraries. revision: yes

Circularity Check

0 steps flagged

No circularity; derivation is self-contained standard condensing + IPM with independent numerical validation

full rationale

The paper presents a standard MPC condensing approach (state elimination) followed by a condensed-space interior-point method, asserts positive definiteness of the resulting matrix (a property inherited from the LQ cost under typical assumptions), and reports empirical GPU speedups on PDE-constrained instances. No step reduces a claimed prediction or first-principles result to a fitted parameter, self-citation, or definitional tautology. The speedup is measured numerically rather than derived from the definiteness claim itself, and the method description does not rely on load-bearing self-citations or ansatzes smuggled from prior work by the same authors. This is the common case of an independent algorithmic description validated externally by timings.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The method rests on standard linear-algebra facts (positive-definiteness after condensation) and the structural assumption that the number of inputs is small relative to states; no free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption The condensed KKT matrix after state elimination and inequality removal is positive definite.
Invoked to guarantee stable Cholesky factorization on GPU (abstract method paragraph).

pith-pipeline@v0.9.0 · 5695 in / 1260 out tokens · 30293 ms · 2026-05-24T11:25:23.354163+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages

[1]

Partitioned dynamic programming for optimal control,

S. J. Wright, “Partitioned dynamic programming for optimal control,” SIAM Journal on Optimization , vol. 1, no. 4, pp. 620–642, 1991

work page 1991
[2]

Application of interior- point methods to model predictive control,

C. V . Rao, S. J. Wright, and J. B. Rawlings, “Application of interior- point methods to model predictive control,” Journal of Optimization Theory and Applications , vol. 99, no. 3, pp. 723–757, 1998

work page 1998
[3]

High- performance small-scale solvers for linear model predictive control,

G. Frison, H. H. B. Sørensen, B. Dammann, and J. B. Jørgensen, “High- performance small-scale solvers for linear model predictive control,” in 2014 European Control Conference (ECC) . IEEE, 2014, pp. 128–133

work page 2014
[4]

GPU-accelerated PD-IPM for real-time model predictive control in integrated missile guidance and control systems,

S. Lee, H. Lee, Y . Kim, J. Kim, and W. Choi, “GPU-accelerated PD-IPM for real-time model predictive control in integrated missile guidance and control systems,” Sensors, vol. 22, no. 12, p. 4512, 2022

work page 2022
[5]

A condensing algorithm for nonlinear MPC with a quadratic runtime in horizon length,

J. A. Andersson, J. V . Frasch, M. Vukov, and M. Diehl, “A condensing algorithm for nonlinear MPC with a quadratic runtime in horizon length,” Automatica, pp. 97–100, 2013

work page 2013
[6]

A sparse and condensed QP formulation for predictive control of LTI systems,

J. L. Jerez, E. C. Kerrigan, and G. A. Constantinides, “A sparse and condensed QP formulation for predictive control of LTI systems,” Automatica, vol. 48, no. 5, pp. 999–1002, 2012

work page 2012
[7]

On the convergence of overlapping Schwarz decomposition for nonlinear optimal control,

S. Na, S. Shin, M. Anitescu, and V . M. Zavala, “On the convergence of overlapping Schwarz decomposition for nonlinear optimal control,” IEEE Transactions on Automatic Control , 2022

work page 2022
[8]

A view of the limitations, opportunities, and challenges in parallel nonlinear optimization,

R. B. Schnabel, “A view of the limitations, opportunities, and challenges in parallel nonlinear optimization,” Parallel Computing, vol. 21, no. 6, pp. 875–905, 1995

work page 1995
[9]

Solving optimization problems on computational grids,

S. J. Wright, “Solving optimization problems on computational grids,” 2001

work page 2001
[10]

Structured nonconvex optimization of large-scale energy systems using PIPS-NLP,

N. Chiang, C. G. Petra, and V . M. Zavala, “Structured nonconvex optimization of large-scale energy systems using PIPS-NLP,” in 2014 Power Systems Computation Conference . IEEE, 2014, pp. 1–7

work page 2014
[11]

An asynchronous bundle- trust-region method for dual decomposition of stochastic mixed-integer programming,

K. Kim, C. G. Petra, and V . M. Zavala, “An asynchronous bundle- trust-region method for dual decomposition of stochastic mixed-integer programming,” SIAM Journal on Optimization , vol. 29, no. 1, pp. 318–342, 2019

work page 2019
[12]

Comparing energy efﬁciency of CPU, GPU and FPGA implementations for vision kernels,

M. Qasaimeh, K. Denolf, J. Lo, K. Vissers, J. Zambreno, and P. H. Jones, “Comparing energy efﬁciency of CPU, GPU and FPGA implementations for vision kernels,” in 2019 IEEE International Conference on Embedded Software and Systems (ICESS) , 2019, pp. 1–8

work page 2019
[13]

A step towards energy efﬁcient computing: Redesigning a hydrodynamic application on CPU-GPU,

T. Dong, V . Dobrev, T. Kolev, R. Rieben, S. Tomov, and J. Dongarra, “A step towards energy efﬁcient computing: Redesigning a hydrodynamic application on CPU-GPU,” in 2014 IEEE 28th International Parallel and Distributed Processing Symposium , 2014, pp. 972–981

work page 2014
[14]

Efﬁcient calibration of embedded MPC,

M. Forgione, D. Piga, and A. Bemporad, “Efﬁcient calibration of embedded MPC,” IFAC-PapersOnLine, vol. 53, no. 2, pp. 5189–5194, 2020

work page 2020
[15]

Model predictive control for autonomous navigation using embedded graphics processing unit,

D.-K. Phung, B. Hérissé, J. Marzat, and S. Bertrand, “Model predictive control for autonomous navigation using embedded graphics processing unit,” IFAC-PapersOnLine, vol. 50, no. 1, pp. 11 883–11 888, 2017

work page 2017
[16]

GPU based stochastic parameterized NMPC scheme for control of semi-active suspension system for half car vehicle,

K. M. M. Rathai, M. Alamir, and O. Sename, “GPU based stochastic parameterized NMPC scheme for control of semi-active suspension system for half car vehicle,” IFAC-PapersOnLine, vol. 53, no. 2, pp. 14 369–14 374, 2020

work page 2020
[17]

Efﬁcient convex optimization on GPUs for embedded model predictive control,

L. Yu, A. Goldsmith, and S. Di Cairano, “Efﬁcient convex optimization on GPUs for embedded model predictive control,” in Proceedings of the General Purpose GPUs . New York, NY , USA: Association for Computing Machinery, 2017, pp. 12–21. [Online]. Available: https://doi.org/10.1145/3038228.3038234

work page doi:10.1145/3038228.3038234 2017
[18]

A collection of Fortran codes for large scale scientiﬁc computation

HSL, “A collection of Fortran codes for large scale scientiﬁc computation.” [Online]. Available: http://www.hsl.rl.ac.uk/

work page
[19]

Inertia-revealing precondition- ing for large-scale nonconvex constrained optimization,

O. Schenk, A. Wächter, and M. Weiser, “Inertia-revealing precondition- ing for large-scale nonconvex constrained optimization,” SIAM Journal on Scientiﬁc Computing , vol. 31, no. 2, pp. 939–960, 2009

work page 2009
[20]

J. B. Rawlings, D. Q. Mayne, and M. Diehl, Model Predictive Control: Theory, Computation, and Design . Santa Barbara, CA, USA: Nob Hill Publishing, 2017, ch. 8

work page 2017
[21]

Algorithms and methods for high-performance model predictive control,

G. Frison, “Algorithms and methods for high-performance model predictive control,” 2016

work page 2016
[22]

HPIPM: a high-performance quadratic programming framework for model predictive control,

G. Frison and M. Diehl, “HPIPM: a high-performance quadratic programming framework for model predictive control,” IFAC- PapersOnLine, vol. 53, no. 2, pp. 6563–6569, 2020

work page 2020
[23]

Implementing an interior point method for linear programs on a CPU-GPU system,

J. H. Jung and D. P. O’Leary, “Implementing an interior point method for linear programs on a CPU-GPU system,” Electronic Transactions on Numerical Analysis , vol. 28, no. 174-189, p. 37, 2008

work page 2008
[24]

GPU acceleration of the matrix- free interior point method,

E. Smith, J. Gondzio, and J. Hall, “GPU acceleration of the matrix- free interior point method,” in International Conference on Parallel Processing and Applied Mathematics . Springer, 2011, pp. 681–689

work page 2011
[25]

An augmented lagrangian interior- point approach for large-scale NLP problems on graphics processing units,

Y . Cao, A. Seth, and C. D. Laird, “An augmented lagrangian interior- point approach for large-scale NLP problems on graphics processing units,” Computers & Chemical Engineering , vol. 85, pp. 76–83, 2016

work page 2016
[26]

On the efﬁciency of supernodal factorization in interior-point method using CPU-GPU collaboration,

U. A. Shah, S. Yousaf, I. Ahmad, and M. O. Ahmad, “On the efﬁciency of supernodal factorization in interior-point method using CPU-GPU collaboration,” IEEE Access, vol. 8, pp. 120 892–120 904, 2020

work page 2020
[27]

Accelerating condensed interior-point methods on SIMD/GPU archi- tectures,

F. Pacaud, S. Shin, M. Schanen, D. A. Maldonado, and M. Anitescu, “Accelerating condensed interior-point methods on SIMD/GPU archi- tectures,” arXiv preprint arXiv:2203.11875 , 2022

work page arXiv 2022
[28]

Linear solvers for power grid optimization problems: a review of GPU-accelerated linear solvers,

K. ´Swirydowicz, E. Darve, W. Jones, J. Maack, S. Regev, M. A. Saunders, S. J. Thomas, and S. Peleš, “Linear solvers for power grid optimization problems: a review of GPU-accelerated linear solvers,” Parallel Computing, vol. 111, p. 102870, 2022

work page 2022
[29]

MPC toolbox with GPU accelerated optimization algorithms,

N. F. Gade-Nielsen, J. B. Jørgensen, and B. Dammann, “MPC toolbox with GPU accelerated optimization algorithms,” in 10th European workshop on advanced control and diagnosis . Technical University of Denmark, 2012

work page 2012
[30]

Interior point methods on GPU with application to model predictive control,

N. F. Gade-Nielsen, “Interior point methods on GPU with application to model predictive control,” 2014

work page 2014
[31]

OSQP: An operator splitting solver for quadratic programs,

B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd, “OSQP: An operator splitting solver for quadratic programs,” Mathematical Programming Computation, vol. 12, no. 4, pp. 637–672, 2020

work page 2020
[32]

GPU acceleration of ADMM for large-scale quadratic programming,

M. Schubiger, G. Banjac, and J. Lygeros, “GPU acceleration of ADMM for large-scale quadratic programming,” Journal of Parallel and Distributed Computing , vol. 144, pp. 55–67, 2020. [Online]. Available: https://doi.org/10.1016/j.jpdc.2020.05.021

work page doi:10.1016/j.jpdc.2020.05.021 2020
[33]

Available: https://github.com/MadNLP/DynamicNLPModels

[Online]. Available: https://github.com/MadNLP/DynamicNLPModels. jl

work page
[34]

NLPModels.jl: Data structures for optimization models,

D. Orban, A. S. Siqueira, and contributors, “NLPModels.jl: Data structures for optimization models,” https://github.com/ JuliaSmoothOptimizers/NLPModels.jl, July 2020

work page 2020
[35]

On the implementation of an interior- point ﬁlter line-search algorithm for large-scale nonlinear programming,

A. Wächter and L. T. Biegler, “On the implementation of an interior- point ﬁlter line-search algorithm for large-scale nonlinear programming,” Mathematical programming, vol. 106, no. 1, pp. 25–57, 2006

work page 2006
[36]

[Online]. Available: https://github.com/dlcole3/3D_temperature_control Government License: The submitted manuscript has been cre- ated by UChicago Argonne, LLC, Operator of Argonne National Laboratory (“Argonne"). Argonne, a U.S. Department of Energy Ofﬁce of Science laboratory, is operated under Contract No. DE- AC02-06CH11357. The U.S. Government retain...

work page

[1] [1]

Partitioned dynamic programming for optimal control,

S. J. Wright, “Partitioned dynamic programming for optimal control,” SIAM Journal on Optimization , vol. 1, no. 4, pp. 620–642, 1991

work page 1991

[2] [2]

Application of interior- point methods to model predictive control,

C. V . Rao, S. J. Wright, and J. B. Rawlings, “Application of interior- point methods to model predictive control,” Journal of Optimization Theory and Applications , vol. 99, no. 3, pp. 723–757, 1998

work page 1998

[3] [3]

High- performance small-scale solvers for linear model predictive control,

G. Frison, H. H. B. Sørensen, B. Dammann, and J. B. Jørgensen, “High- performance small-scale solvers for linear model predictive control,” in 2014 European Control Conference (ECC) . IEEE, 2014, pp. 128–133

work page 2014

[4] [4]

GPU-accelerated PD-IPM for real-time model predictive control in integrated missile guidance and control systems,

S. Lee, H. Lee, Y . Kim, J. Kim, and W. Choi, “GPU-accelerated PD-IPM for real-time model predictive control in integrated missile guidance and control systems,” Sensors, vol. 22, no. 12, p. 4512, 2022

work page 2022

[5] [5]

A condensing algorithm for nonlinear MPC with a quadratic runtime in horizon length,

J. A. Andersson, J. V . Frasch, M. Vukov, and M. Diehl, “A condensing algorithm for nonlinear MPC with a quadratic runtime in horizon length,” Automatica, pp. 97–100, 2013

work page 2013

[6] [6]

A sparse and condensed QP formulation for predictive control of LTI systems,

J. L. Jerez, E. C. Kerrigan, and G. A. Constantinides, “A sparse and condensed QP formulation for predictive control of LTI systems,” Automatica, vol. 48, no. 5, pp. 999–1002, 2012

work page 2012

[7] [7]

On the convergence of overlapping Schwarz decomposition for nonlinear optimal control,

S. Na, S. Shin, M. Anitescu, and V . M. Zavala, “On the convergence of overlapping Schwarz decomposition for nonlinear optimal control,” IEEE Transactions on Automatic Control , 2022

work page 2022

[8] [8]

A view of the limitations, opportunities, and challenges in parallel nonlinear optimization,

R. B. Schnabel, “A view of the limitations, opportunities, and challenges in parallel nonlinear optimization,” Parallel Computing, vol. 21, no. 6, pp. 875–905, 1995

work page 1995

[9] [9]

Solving optimization problems on computational grids,

S. J. Wright, “Solving optimization problems on computational grids,” 2001

work page 2001

[10] [10]

Structured nonconvex optimization of large-scale energy systems using PIPS-NLP,

N. Chiang, C. G. Petra, and V . M. Zavala, “Structured nonconvex optimization of large-scale energy systems using PIPS-NLP,” in 2014 Power Systems Computation Conference . IEEE, 2014, pp. 1–7

work page 2014

[11] [11]

An asynchronous bundle- trust-region method for dual decomposition of stochastic mixed-integer programming,

K. Kim, C. G. Petra, and V . M. Zavala, “An asynchronous bundle- trust-region method for dual decomposition of stochastic mixed-integer programming,” SIAM Journal on Optimization , vol. 29, no. 1, pp. 318–342, 2019

work page 2019

[12] [12]

Comparing energy efﬁciency of CPU, GPU and FPGA implementations for vision kernels,

M. Qasaimeh, K. Denolf, J. Lo, K. Vissers, J. Zambreno, and P. H. Jones, “Comparing energy efﬁciency of CPU, GPU and FPGA implementations for vision kernels,” in 2019 IEEE International Conference on Embedded Software and Systems (ICESS) , 2019, pp. 1–8

work page 2019

[13] [13]

A step towards energy efﬁcient computing: Redesigning a hydrodynamic application on CPU-GPU,

T. Dong, V . Dobrev, T. Kolev, R. Rieben, S. Tomov, and J. Dongarra, “A step towards energy efﬁcient computing: Redesigning a hydrodynamic application on CPU-GPU,” in 2014 IEEE 28th International Parallel and Distributed Processing Symposium , 2014, pp. 972–981

work page 2014

[14] [14]

Efﬁcient calibration of embedded MPC,

M. Forgione, D. Piga, and A. Bemporad, “Efﬁcient calibration of embedded MPC,” IFAC-PapersOnLine, vol. 53, no. 2, pp. 5189–5194, 2020

work page 2020

[15] [15]

Model predictive control for autonomous navigation using embedded graphics processing unit,

D.-K. Phung, B. Hérissé, J. Marzat, and S. Bertrand, “Model predictive control for autonomous navigation using embedded graphics processing unit,” IFAC-PapersOnLine, vol. 50, no. 1, pp. 11 883–11 888, 2017

work page 2017

[16] [16]

GPU based stochastic parameterized NMPC scheme for control of semi-active suspension system for half car vehicle,

K. M. M. Rathai, M. Alamir, and O. Sename, “GPU based stochastic parameterized NMPC scheme for control of semi-active suspension system for half car vehicle,” IFAC-PapersOnLine, vol. 53, no. 2, pp. 14 369–14 374, 2020

work page 2020

[17] [17]

Efﬁcient convex optimization on GPUs for embedded model predictive control,

L. Yu, A. Goldsmith, and S. Di Cairano, “Efﬁcient convex optimization on GPUs for embedded model predictive control,” in Proceedings of the General Purpose GPUs . New York, NY , USA: Association for Computing Machinery, 2017, pp. 12–21. [Online]. Available: https://doi.org/10.1145/3038228.3038234

work page doi:10.1145/3038228.3038234 2017

[18] [18]

A collection of Fortran codes for large scale scientiﬁc computation

HSL, “A collection of Fortran codes for large scale scientiﬁc computation.” [Online]. Available: http://www.hsl.rl.ac.uk/

work page

[19] [19]

Inertia-revealing precondition- ing for large-scale nonconvex constrained optimization,

O. Schenk, A. Wächter, and M. Weiser, “Inertia-revealing precondition- ing for large-scale nonconvex constrained optimization,” SIAM Journal on Scientiﬁc Computing , vol. 31, no. 2, pp. 939–960, 2009

work page 2009

[20] [20]

J. B. Rawlings, D. Q. Mayne, and M. Diehl, Model Predictive Control: Theory, Computation, and Design . Santa Barbara, CA, USA: Nob Hill Publishing, 2017, ch. 8

work page 2017

[21] [21]

Algorithms and methods for high-performance model predictive control,

G. Frison, “Algorithms and methods for high-performance model predictive control,” 2016

work page 2016

[22] [22]

HPIPM: a high-performance quadratic programming framework for model predictive control,

G. Frison and M. Diehl, “HPIPM: a high-performance quadratic programming framework for model predictive control,” IFAC- PapersOnLine, vol. 53, no. 2, pp. 6563–6569, 2020

work page 2020

[23] [23]

Implementing an interior point method for linear programs on a CPU-GPU system,

J. H. Jung and D. P. O’Leary, “Implementing an interior point method for linear programs on a CPU-GPU system,” Electronic Transactions on Numerical Analysis , vol. 28, no. 174-189, p. 37, 2008

work page 2008

[24] [24]

GPU acceleration of the matrix- free interior point method,

E. Smith, J. Gondzio, and J. Hall, “GPU acceleration of the matrix- free interior point method,” in International Conference on Parallel Processing and Applied Mathematics . Springer, 2011, pp. 681–689

work page 2011

[25] [25]

An augmented lagrangian interior- point approach for large-scale NLP problems on graphics processing units,

Y . Cao, A. Seth, and C. D. Laird, “An augmented lagrangian interior- point approach for large-scale NLP problems on graphics processing units,” Computers & Chemical Engineering , vol. 85, pp. 76–83, 2016

work page 2016

[26] [26]

On the efﬁciency of supernodal factorization in interior-point method using CPU-GPU collaboration,

U. A. Shah, S. Yousaf, I. Ahmad, and M. O. Ahmad, “On the efﬁciency of supernodal factorization in interior-point method using CPU-GPU collaboration,” IEEE Access, vol. 8, pp. 120 892–120 904, 2020

work page 2020

[27] [27]

Accelerating condensed interior-point methods on SIMD/GPU archi- tectures,

F. Pacaud, S. Shin, M. Schanen, D. A. Maldonado, and M. Anitescu, “Accelerating condensed interior-point methods on SIMD/GPU archi- tectures,” arXiv preprint arXiv:2203.11875 , 2022

work page arXiv 2022

[28] [28]

Linear solvers for power grid optimization problems: a review of GPU-accelerated linear solvers,

K. ´Swirydowicz, E. Darve, W. Jones, J. Maack, S. Regev, M. A. Saunders, S. J. Thomas, and S. Peleš, “Linear solvers for power grid optimization problems: a review of GPU-accelerated linear solvers,” Parallel Computing, vol. 111, p. 102870, 2022

work page 2022

[29] [29]

MPC toolbox with GPU accelerated optimization algorithms,

N. F. Gade-Nielsen, J. B. Jørgensen, and B. Dammann, “MPC toolbox with GPU accelerated optimization algorithms,” in 10th European workshop on advanced control and diagnosis . Technical University of Denmark, 2012

work page 2012

[30] [30]

Interior point methods on GPU with application to model predictive control,

N. F. Gade-Nielsen, “Interior point methods on GPU with application to model predictive control,” 2014

work page 2014

[31] [31]

OSQP: An operator splitting solver for quadratic programs,

B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd, “OSQP: An operator splitting solver for quadratic programs,” Mathematical Programming Computation, vol. 12, no. 4, pp. 637–672, 2020

work page 2020

[32] [32]

GPU acceleration of ADMM for large-scale quadratic programming,

M. Schubiger, G. Banjac, and J. Lygeros, “GPU acceleration of ADMM for large-scale quadratic programming,” Journal of Parallel and Distributed Computing , vol. 144, pp. 55–67, 2020. [Online]. Available: https://doi.org/10.1016/j.jpdc.2020.05.021

work page doi:10.1016/j.jpdc.2020.05.021 2020

[33] [33]

Available: https://github.com/MadNLP/DynamicNLPModels

[Online]. Available: https://github.com/MadNLP/DynamicNLPModels. jl

work page

[34] [34]

NLPModels.jl: Data structures for optimization models,

D. Orban, A. S. Siqueira, and contributors, “NLPModels.jl: Data structures for optimization models,” https://github.com/ JuliaSmoothOptimizers/NLPModels.jl, July 2020

work page 2020

[35] [35]

On the implementation of an interior- point ﬁlter line-search algorithm for large-scale nonlinear programming,

A. Wächter and L. T. Biegler, “On the implementation of an interior- point ﬁlter line-search algorithm for large-scale nonlinear programming,” Mathematical programming, vol. 106, no. 1, pp. 25–57, 2006

work page 2006

[36] [36]

[Online]. Available: https://github.com/dlcole3/3D_temperature_control Government License: The submitted manuscript has been cre- ated by UChicago Argonne, LLC, Operator of Argonne National Laboratory (“Argonne"). Argonne, a U.S. Department of Energy Ofﬁce of Science laboratory, is operated under Contract No. DE- AC02-06CH11357. The U.S. Government retain...

work page