Exploiting GPU/SIMD Architectures for Solving Linear-Quadratic MPC Problems
Pith reviewed 2026-05-24 11:25 UTC · model grok-4.3
The pith
Condensing linear-quadratic MPC to a positive definite matrix sized only by controls enables order-of-magnitude faster GPU solves than CPU.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By eliminating the state variables and applying a condensed-space interior-point method, the resulting matrix after inequality removal is positive definite, depends in size only on the number of controls, and admits efficient parallel factorization on GPU/SIMD hardware, delivering an order of magnitude faster solution than a standard CPU implementation for PDE-constrained linear-quadratic MPC problems.
What carries the argument
The condensed positive definite matrix after state elimination and inequality removal in the interior-point method, whose dimension depends solely on the number of controls.
If this is right
- Computational effort scales with the number of controls rather than the number of states.
- Parallel factorization on GPU hardware becomes the dominant cost only through the control dimension and horizon length.
- The method applies directly to large-scale PDE-constrained optimal control problems with moderate horizons.
- Open-source Julia modeling and solver packages enable direct GPU deployment for such MPC instances.
Where Pith is reading between the lines
- The same state-elimination step could be combined with other parallel linear-algebra routines beyond interior-point methods.
- If positive definiteness holds under modest perturbations, the technique may extend to mildly nonlinear MPC formulations.
- Benchmarking the released Julia packages against other GPU-accelerated solvers would quantify the practical advantage for end users.
Load-bearing premise
The condensed matrix remains positive definite after state elimination and inequality removal, allowing stable parallel factorization.
What would settle it
A linear-quadratic MPC instance where the condensed matrix loses positive definiteness or a timing benchmark where the GPU implementation fails to achieve the reported order-of-magnitude speedup over CPU.
Figures
read the original abstract
We report numerical results on solving constrained linear-quadratic model predictive control (MPC) problems by exploiting graphics processing units (GPUs). The presented method reduces the MPC problem by eliminating the state variables and applies a condensed-space interior-point method to remove the inequality constraints in the KKT system. The final condensed matrix is positive definite and can be efficiently factorized in parallel on GPU/SIMD architectures. In addition, the size of the condensed matrix depends only on the number of controls in the problem, rendering the method particularly effective when the problem has many states but few inputs and moderate horizon length. Our numerical results for PDE-constrained problems show that the approach is an order of magnitude faster than a standard CPU implementation. We also provide an open-source Julia framework that facilitates modeling (DynamicNLPModels.jl) and solution (MadNLP.jl) of MPC problems on GPUs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes solving constrained linear-quadratic MPC problems by eliminating state variables to condense the problem, followed by a condensed-space interior-point method that removes inequalities from the KKT system. The resulting condensed matrix is asserted to be positive definite with size depending only on the number of controls, enabling efficient parallel factorization on GPU/SIMD hardware. Numerical experiments on PDE-constrained problems report an order-of-magnitude speedup versus a standard CPU implementation, and an open-source Julia framework (DynamicNLPModels.jl and MadNLP.jl) is provided.
Significance. If the positive-definiteness claim holds under the stated conditions and the performance results prove robust, the approach would offer a practical route to real-time solution of high-dimensional MPC problems (many states, few inputs, moderate horizons) on GPU hardware. The open-source modeling and solver framework strengthens reproducibility and adoption.
major comments (2)
- [Abstract] Abstract: the claim that 'the final condensed matrix is positive definite' is asserted without an explicit construction of the condensed Hessian (via state elimination) or the modified KKT matrix (after inequality removal), and without conditions or a proof that positive definiteness is inherited from the original LQ cost. This property is load-bearing for the stability of the claimed parallel factorization whose cost scales only with the number of controls.
- [Abstract] Abstract (numerical results paragraph): the reported order-of-magnitude speedups on PDE-constrained problems supply no error bars, no description of test-problem selection or discretization parameters, and no comparison against tuned CPU baselines, leaving the central performance claim only weakly supported.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback on our manuscript. We address each major comment below and will revise the paper accordingly to strengthen the presentation of the positive-definiteness property and the numerical results.
read point-by-point responses
-
Referee: [Abstract] Abstract: the claim that 'the final condensed matrix is positive definite' is asserted without an explicit construction of the condensed Hessian (via state elimination) or the modified KKT matrix (after inequality removal), and without conditions or a proof that positive definiteness is inherited from the original LQ cost. This property is load-bearing for the stability of the claimed parallel factorization whose cost scales only with the number of controls.
Authors: We agree that the abstract would benefit from additional context on this key property. The manuscript constructs the condensed Hessian via state elimination in Section 3 and describes the modified KKT system after inequality removal in Section 4. To make the positive-definiteness claim fully rigorous, we will add a concise proof (or explicit conditions, such as positive-definite stage costs) in a new subsection or appendix showing inheritance from the original LQ objective. This will directly support the stability of the parallel factorization. revision: yes
-
Referee: [Abstract] Abstract (numerical results paragraph): the reported order-of-magnitude speedups on PDE-constrained problems supply no error bars, no description of test-problem selection or discretization parameters, and no comparison against tuned CPU baselines, leaving the central performance claim only weakly supported.
Authors: We acknowledge that the numerical results section would be strengthened by more complete reporting. In the revision we will include error bars from repeated timings, explicit descriptions of the PDE test problems (including discretization parameters such as spatial grids and horizon lengths), and direct comparisons against tuned CPU baselines that employ optimized linear-algebra libraries. revision: yes
Circularity Check
No circularity; derivation is self-contained standard condensing + IPM with independent numerical validation
full rationale
The paper presents a standard MPC condensing approach (state elimination) followed by a condensed-space interior-point method, asserts positive definiteness of the resulting matrix (a property inherited from the LQ cost under typical assumptions), and reports empirical GPU speedups on PDE-constrained instances. No step reduces a claimed prediction or first-principles result to a fitted parameter, self-citation, or definitional tautology. The speedup is measured numerically rather than derived from the definiteness claim itself, and the method description does not rely on load-bearing self-citations or ansatzes smuggled from prior work by the same authors. This is the common case of an independent algorithmic description validated externally by timings.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The condensed KKT matrix after state elimination and inequality removal is positive definite.
Reference graph
Works this paper leans on
-
[1]
Partitioned dynamic programming for optimal control,
S. J. Wright, “Partitioned dynamic programming for optimal control,” SIAM Journal on Optimization , vol. 1, no. 4, pp. 620–642, 1991
work page 1991
-
[2]
Application of interior- point methods to model predictive control,
C. V . Rao, S. J. Wright, and J. B. Rawlings, “Application of interior- point methods to model predictive control,” Journal of Optimization Theory and Applications , vol. 99, no. 3, pp. 723–757, 1998
work page 1998
-
[3]
High- performance small-scale solvers for linear model predictive control,
G. Frison, H. H. B. Sørensen, B. Dammann, and J. B. Jørgensen, “High- performance small-scale solvers for linear model predictive control,” in 2014 European Control Conference (ECC) . IEEE, 2014, pp. 128–133
work page 2014
-
[4]
S. Lee, H. Lee, Y . Kim, J. Kim, and W. Choi, “GPU-accelerated PD-IPM for real-time model predictive control in integrated missile guidance and control systems,” Sensors, vol. 22, no. 12, p. 4512, 2022
work page 2022
-
[5]
A condensing algorithm for nonlinear MPC with a quadratic runtime in horizon length,
J. A. Andersson, J. V . Frasch, M. Vukov, and M. Diehl, “A condensing algorithm for nonlinear MPC with a quadratic runtime in horizon length,” Automatica, pp. 97–100, 2013
work page 2013
-
[6]
A sparse and condensed QP formulation for predictive control of LTI systems,
J. L. Jerez, E. C. Kerrigan, and G. A. Constantinides, “A sparse and condensed QP formulation for predictive control of LTI systems,” Automatica, vol. 48, no. 5, pp. 999–1002, 2012
work page 2012
-
[7]
On the convergence of overlapping Schwarz decomposition for nonlinear optimal control,
S. Na, S. Shin, M. Anitescu, and V . M. Zavala, “On the convergence of overlapping Schwarz decomposition for nonlinear optimal control,” IEEE Transactions on Automatic Control , 2022
work page 2022
-
[8]
A view of the limitations, opportunities, and challenges in parallel nonlinear optimization,
R. B. Schnabel, “A view of the limitations, opportunities, and challenges in parallel nonlinear optimization,” Parallel Computing, vol. 21, no. 6, pp. 875–905, 1995
work page 1995
-
[9]
Solving optimization problems on computational grids,
S. J. Wright, “Solving optimization problems on computational grids,” 2001
work page 2001
-
[10]
Structured nonconvex optimization of large-scale energy systems using PIPS-NLP,
N. Chiang, C. G. Petra, and V . M. Zavala, “Structured nonconvex optimization of large-scale energy systems using PIPS-NLP,” in 2014 Power Systems Computation Conference . IEEE, 2014, pp. 1–7
work page 2014
-
[11]
K. Kim, C. G. Petra, and V . M. Zavala, “An asynchronous bundle- trust-region method for dual decomposition of stochastic mixed-integer programming,” SIAM Journal on Optimization , vol. 29, no. 1, pp. 318–342, 2019
work page 2019
-
[12]
Comparing energy efficiency of CPU, GPU and FPGA implementations for vision kernels,
M. Qasaimeh, K. Denolf, J. Lo, K. Vissers, J. Zambreno, and P. H. Jones, “Comparing energy efficiency of CPU, GPU and FPGA implementations for vision kernels,” in 2019 IEEE International Conference on Embedded Software and Systems (ICESS) , 2019, pp. 1–8
work page 2019
-
[13]
A step towards energy efficient computing: Redesigning a hydrodynamic application on CPU-GPU,
T. Dong, V . Dobrev, T. Kolev, R. Rieben, S. Tomov, and J. Dongarra, “A step towards energy efficient computing: Redesigning a hydrodynamic application on CPU-GPU,” in 2014 IEEE 28th International Parallel and Distributed Processing Symposium , 2014, pp. 972–981
work page 2014
-
[14]
Efficient calibration of embedded MPC,
M. Forgione, D. Piga, and A. Bemporad, “Efficient calibration of embedded MPC,” IFAC-PapersOnLine, vol. 53, no. 2, pp. 5189–5194, 2020
work page 2020
-
[15]
Model predictive control for autonomous navigation using embedded graphics processing unit,
D.-K. Phung, B. Hérissé, J. Marzat, and S. Bertrand, “Model predictive control for autonomous navigation using embedded graphics processing unit,” IFAC-PapersOnLine, vol. 50, no. 1, pp. 11 883–11 888, 2017
work page 2017
-
[16]
K. M. M. Rathai, M. Alamir, and O. Sename, “GPU based stochastic parameterized NMPC scheme for control of semi-active suspension system for half car vehicle,” IFAC-PapersOnLine, vol. 53, no. 2, pp. 14 369–14 374, 2020
work page 2020
-
[17]
Efficient convex optimization on GPUs for embedded model predictive control,
L. Yu, A. Goldsmith, and S. Di Cairano, “Efficient convex optimization on GPUs for embedded model predictive control,” in Proceedings of the General Purpose GPUs . New York, NY , USA: Association for Computing Machinery, 2017, pp. 12–21. [Online]. Available: https://doi.org/10.1145/3038228.3038234
-
[18]
A collection of Fortran codes for large scale scientific computation
HSL, “A collection of Fortran codes for large scale scientific computation.” [Online]. Available: http://www.hsl.rl.ac.uk/
-
[19]
Inertia-revealing precondition- ing for large-scale nonconvex constrained optimization,
O. Schenk, A. Wächter, and M. Weiser, “Inertia-revealing precondition- ing for large-scale nonconvex constrained optimization,” SIAM Journal on Scientific Computing , vol. 31, no. 2, pp. 939–960, 2009
work page 2009
-
[20]
J. B. Rawlings, D. Q. Mayne, and M. Diehl, Model Predictive Control: Theory, Computation, and Design . Santa Barbara, CA, USA: Nob Hill Publishing, 2017, ch. 8
work page 2017
-
[21]
Algorithms and methods for high-performance model predictive control,
G. Frison, “Algorithms and methods for high-performance model predictive control,” 2016
work page 2016
-
[22]
HPIPM: a high-performance quadratic programming framework for model predictive control,
G. Frison and M. Diehl, “HPIPM: a high-performance quadratic programming framework for model predictive control,” IFAC- PapersOnLine, vol. 53, no. 2, pp. 6563–6569, 2020
work page 2020
-
[23]
Implementing an interior point method for linear programs on a CPU-GPU system,
J. H. Jung and D. P. O’Leary, “Implementing an interior point method for linear programs on a CPU-GPU system,” Electronic Transactions on Numerical Analysis , vol. 28, no. 174-189, p. 37, 2008
work page 2008
-
[24]
GPU acceleration of the matrix- free interior point method,
E. Smith, J. Gondzio, and J. Hall, “GPU acceleration of the matrix- free interior point method,” in International Conference on Parallel Processing and Applied Mathematics . Springer, 2011, pp. 681–689
work page 2011
-
[25]
Y . Cao, A. Seth, and C. D. Laird, “An augmented lagrangian interior- point approach for large-scale NLP problems on graphics processing units,” Computers & Chemical Engineering , vol. 85, pp. 76–83, 2016
work page 2016
-
[26]
On the efficiency of supernodal factorization in interior-point method using CPU-GPU collaboration,
U. A. Shah, S. Yousaf, I. Ahmad, and M. O. Ahmad, “On the efficiency of supernodal factorization in interior-point method using CPU-GPU collaboration,” IEEE Access, vol. 8, pp. 120 892–120 904, 2020
work page 2020
-
[27]
Accelerating condensed interior-point methods on SIMD/GPU archi- tectures,
F. Pacaud, S. Shin, M. Schanen, D. A. Maldonado, and M. Anitescu, “Accelerating condensed interior-point methods on SIMD/GPU archi- tectures,” arXiv preprint arXiv:2203.11875 , 2022
-
[28]
Linear solvers for power grid optimization problems: a review of GPU-accelerated linear solvers,
K. ´Swirydowicz, E. Darve, W. Jones, J. Maack, S. Regev, M. A. Saunders, S. J. Thomas, and S. Peleš, “Linear solvers for power grid optimization problems: a review of GPU-accelerated linear solvers,” Parallel Computing, vol. 111, p. 102870, 2022
work page 2022
-
[29]
MPC toolbox with GPU accelerated optimization algorithms,
N. F. Gade-Nielsen, J. B. Jørgensen, and B. Dammann, “MPC toolbox with GPU accelerated optimization algorithms,” in 10th European workshop on advanced control and diagnosis . Technical University of Denmark, 2012
work page 2012
-
[30]
Interior point methods on GPU with application to model predictive control,
N. F. Gade-Nielsen, “Interior point methods on GPU with application to model predictive control,” 2014
work page 2014
-
[31]
OSQP: An operator splitting solver for quadratic programs,
B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd, “OSQP: An operator splitting solver for quadratic programs,” Mathematical Programming Computation, vol. 12, no. 4, pp. 637–672, 2020
work page 2020
-
[32]
GPU acceleration of ADMM for large-scale quadratic programming,
M. Schubiger, G. Banjac, and J. Lygeros, “GPU acceleration of ADMM for large-scale quadratic programming,” Journal of Parallel and Distributed Computing , vol. 144, pp. 55–67, 2020. [Online]. Available: https://doi.org/10.1016/j.jpdc.2020.05.021
-
[33]
Available: https://github.com/MadNLP/DynamicNLPModels
[Online]. Available: https://github.com/MadNLP/DynamicNLPModels. jl
-
[34]
NLPModels.jl: Data structures for optimization models,
D. Orban, A. S. Siqueira, and contributors, “NLPModels.jl: Data structures for optimization models,” https://github.com/ JuliaSmoothOptimizers/NLPModels.jl, July 2020
work page 2020
-
[35]
A. Wächter and L. T. Biegler, “On the implementation of an interior- point filter line-search algorithm for large-scale nonlinear programming,” Mathematical programming, vol. 106, no. 1, pp. 25–57, 2006
work page 2006
-
[36]
[Online]. Available: https://github.com/dlcole3/3D_temperature_control Government License: The submitted manuscript has been cre- ated by UChicago Argonne, LLC, Operator of Argonne National Laboratory (“Argonne"). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE- AC02-06CH11357. The U.S. Government retain...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.