pith. sign in

arxiv: 2209.13049 · v1 · submitted 2022-09-26 · 🧮 math.OC

Exploiting GPU/SIMD Architectures for Solving Linear-Quadratic MPC Problems

Pith reviewed 2026-05-24 11:25 UTC · model grok-4.3

classification 🧮 math.OC
keywords model predictive controlGPU computinginterior-point methodscondensed formulationPDE-constrained optimizationparallel factorizationlinear-quadratic problems
0
0 comments X

The pith

Condensing linear-quadratic MPC to a positive definite matrix sized only by controls enables order-of-magnitude faster GPU solves than CPU.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that eliminating state variables from a constrained linear-quadratic MPC problem, followed by a condensed-space interior-point method to remove inequalities from the KKT system, produces a positive definite matrix whose size depends only on the number of controls. This structure permits stable parallel factorization on GPU and SIMD architectures, which is especially effective for problems with many states, few inputs, and moderate horizons such as PDE-constrained control. A reader would care because the approach demonstrates a concrete route to real-time MPC on widely available hardware without custom low-level coding. Numerical tests confirm an order of magnitude speedup over standard CPU implementations. The work also supplies open-source Julia packages for modeling and GPU solution of such problems.

Core claim

By eliminating the state variables and applying a condensed-space interior-point method, the resulting matrix after inequality removal is positive definite, depends in size only on the number of controls, and admits efficient parallel factorization on GPU/SIMD hardware, delivering an order of magnitude faster solution than a standard CPU implementation for PDE-constrained linear-quadratic MPC problems.

What carries the argument

The condensed positive definite matrix after state elimination and inequality removal in the interior-point method, whose dimension depends solely on the number of controls.

If this is right

  • Computational effort scales with the number of controls rather than the number of states.
  • Parallel factorization on GPU hardware becomes the dominant cost only through the control dimension and horizon length.
  • The method applies directly to large-scale PDE-constrained optimal control problems with moderate horizons.
  • Open-source Julia modeling and solver packages enable direct GPU deployment for such MPC instances.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same state-elimination step could be combined with other parallel linear-algebra routines beyond interior-point methods.
  • If positive definiteness holds under modest perturbations, the technique may extend to mildly nonlinear MPC formulations.
  • Benchmarking the released Julia packages against other GPU-accelerated solvers would quantify the practical advantage for end users.

Load-bearing premise

The condensed matrix remains positive definite after state elimination and inequality removal, allowing stable parallel factorization.

What would settle it

A linear-quadratic MPC instance where the condensed matrix loses positive definiteness or a timing benchmark where the GPU implementation fails to achieve the reported order-of-magnitude speedup over CPU.

Figures

Figures reproduced from arXiv: 2209.13049 by David Cole, Fran\c{c}ois Pacaud, Mihai Anitescu, Sungho Shin, Victor M. Zavala.

Figure 1
Figure 1. Figure 1: Full, reduced, and condensed system for linear-quadratic MPC [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Total solver time for T = 250. TABLE II SOLVER STATISTICS FOR T = 250 AND VARIED N. CPU Sparse CPU Dense GPU Dense N iter tot lin iter tot lin iter tot lin 4 26 3.0 2.6 28 31.0 1.10 28 1.68 0.037 5 30 27.9 24.8 33 60.1 1.30 33 3.54 0.045 6 29 91.2 81.4 31 102.0 1.25 31 6.59 0.041 7 32 412.3 372.0 38 192.3 1.55 38 11.6 0.055 8 31 7086. 6617. 37 685.4 1.66 37 15.4 0.051 9 - - - 38 476.8 1.77 38 22.3 0.056 10… view at source ↗
Figure 3
Figure 3. Figure 3: Total solver time for N = 4 and varied T. TABLE V SOLVER STATISTICS FOR N = 4 AND VARIED T . CPU Sparse CPU Dense GPU Dense T iter tot lin iter tot lin iter tot lin 10 26 0.10 0.079 28 0.02 0.001 28 0.03 0.004 20 25 0.21 0.170 27 0.05 0.002 27 0.04 0.005 30 24 0.30 0.249 26 0.10 0.005 26 0.05 0.005 40 25 0.43 0.356 27 0.21 0.009 27 0.06 0.006 50 26 0.55 0.460 27 0.34 0.016 27 0.08 0.007 75 27 0.85 0.724 29… view at source ↗
read the original abstract

We report numerical results on solving constrained linear-quadratic model predictive control (MPC) problems by exploiting graphics processing units (GPUs). The presented method reduces the MPC problem by eliminating the state variables and applies a condensed-space interior-point method to remove the inequality constraints in the KKT system. The final condensed matrix is positive definite and can be efficiently factorized in parallel on GPU/SIMD architectures. In addition, the size of the condensed matrix depends only on the number of controls in the problem, rendering the method particularly effective when the problem has many states but few inputs and moderate horizon length. Our numerical results for PDE-constrained problems show that the approach is an order of magnitude faster than a standard CPU implementation. We also provide an open-source Julia framework that facilitates modeling (DynamicNLPModels.jl) and solution (MadNLP.jl) of MPC problems on GPUs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes solving constrained linear-quadratic MPC problems by eliminating state variables to condense the problem, followed by a condensed-space interior-point method that removes inequalities from the KKT system. The resulting condensed matrix is asserted to be positive definite with size depending only on the number of controls, enabling efficient parallel factorization on GPU/SIMD hardware. Numerical experiments on PDE-constrained problems report an order-of-magnitude speedup versus a standard CPU implementation, and an open-source Julia framework (DynamicNLPModels.jl and MadNLP.jl) is provided.

Significance. If the positive-definiteness claim holds under the stated conditions and the performance results prove robust, the approach would offer a practical route to real-time solution of high-dimensional MPC problems (many states, few inputs, moderate horizons) on GPU hardware. The open-source modeling and solver framework strengthens reproducibility and adoption.

major comments (2)
  1. [Abstract] Abstract: the claim that 'the final condensed matrix is positive definite' is asserted without an explicit construction of the condensed Hessian (via state elimination) or the modified KKT matrix (after inequality removal), and without conditions or a proof that positive definiteness is inherited from the original LQ cost. This property is load-bearing for the stability of the claimed parallel factorization whose cost scales only with the number of controls.
  2. [Abstract] Abstract (numerical results paragraph): the reported order-of-magnitude speedups on PDE-constrained problems supply no error bars, no description of test-problem selection or discretization parameters, and no comparison against tuned CPU baselines, leaving the central performance claim only weakly supported.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. We address each major comment below and will revise the paper accordingly to strengthen the presentation of the positive-definiteness property and the numerical results.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that 'the final condensed matrix is positive definite' is asserted without an explicit construction of the condensed Hessian (via state elimination) or the modified KKT matrix (after inequality removal), and without conditions or a proof that positive definiteness is inherited from the original LQ cost. This property is load-bearing for the stability of the claimed parallel factorization whose cost scales only with the number of controls.

    Authors: We agree that the abstract would benefit from additional context on this key property. The manuscript constructs the condensed Hessian via state elimination in Section 3 and describes the modified KKT system after inequality removal in Section 4. To make the positive-definiteness claim fully rigorous, we will add a concise proof (or explicit conditions, such as positive-definite stage costs) in a new subsection or appendix showing inheritance from the original LQ objective. This will directly support the stability of the parallel factorization. revision: yes

  2. Referee: [Abstract] Abstract (numerical results paragraph): the reported order-of-magnitude speedups on PDE-constrained problems supply no error bars, no description of test-problem selection or discretization parameters, and no comparison against tuned CPU baselines, leaving the central performance claim only weakly supported.

    Authors: We acknowledge that the numerical results section would be strengthened by more complete reporting. In the revision we will include error bars from repeated timings, explicit descriptions of the PDE test problems (including discretization parameters such as spatial grids and horizon lengths), and direct comparisons against tuned CPU baselines that employ optimized linear-algebra libraries. revision: yes

Circularity Check

0 steps flagged

No circularity; derivation is self-contained standard condensing + IPM with independent numerical validation

full rationale

The paper presents a standard MPC condensing approach (state elimination) followed by a condensed-space interior-point method, asserts positive definiteness of the resulting matrix (a property inherited from the LQ cost under typical assumptions), and reports empirical GPU speedups on PDE-constrained instances. No step reduces a claimed prediction or first-principles result to a fitted parameter, self-citation, or definitional tautology. The speedup is measured numerically rather than derived from the definiteness claim itself, and the method description does not rely on load-bearing self-citations or ansatzes smuggled from prior work by the same authors. This is the common case of an independent algorithmic description validated externally by timings.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The method rests on standard linear-algebra facts (positive-definiteness after condensation) and the structural assumption that the number of inputs is small relative to states; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption The condensed KKT matrix after state elimination and inequality removal is positive definite.
    Invoked to guarantee stable Cholesky factorization on GPU (abstract method paragraph).

pith-pipeline@v0.9.0 · 5695 in / 1260 out tokens · 30293 ms · 2026-05-24T11:25:23.354163+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages

  1. [1]

    Partitioned dynamic programming for optimal control,

    S. J. Wright, “Partitioned dynamic programming for optimal control,” SIAM Journal on Optimization , vol. 1, no. 4, pp. 620–642, 1991

  2. [2]

    Application of interior- point methods to model predictive control,

    C. V . Rao, S. J. Wright, and J. B. Rawlings, “Application of interior- point methods to model predictive control,” Journal of Optimization Theory and Applications , vol. 99, no. 3, pp. 723–757, 1998

  3. [3]

    High- performance small-scale solvers for linear model predictive control,

    G. Frison, H. H. B. Sørensen, B. Dammann, and J. B. Jørgensen, “High- performance small-scale solvers for linear model predictive control,” in 2014 European Control Conference (ECC) . IEEE, 2014, pp. 128–133

  4. [4]

    GPU-accelerated PD-IPM for real-time model predictive control in integrated missile guidance and control systems,

    S. Lee, H. Lee, Y . Kim, J. Kim, and W. Choi, “GPU-accelerated PD-IPM for real-time model predictive control in integrated missile guidance and control systems,” Sensors, vol. 22, no. 12, p. 4512, 2022

  5. [5]

    A condensing algorithm for nonlinear MPC with a quadratic runtime in horizon length,

    J. A. Andersson, J. V . Frasch, M. Vukov, and M. Diehl, “A condensing algorithm for nonlinear MPC with a quadratic runtime in horizon length,” Automatica, pp. 97–100, 2013

  6. [6]

    A sparse and condensed QP formulation for predictive control of LTI systems,

    J. L. Jerez, E. C. Kerrigan, and G. A. Constantinides, “A sparse and condensed QP formulation for predictive control of LTI systems,” Automatica, vol. 48, no. 5, pp. 999–1002, 2012

  7. [7]

    On the convergence of overlapping Schwarz decomposition for nonlinear optimal control,

    S. Na, S. Shin, M. Anitescu, and V . M. Zavala, “On the convergence of overlapping Schwarz decomposition for nonlinear optimal control,” IEEE Transactions on Automatic Control , 2022

  8. [8]

    A view of the limitations, opportunities, and challenges in parallel nonlinear optimization,

    R. B. Schnabel, “A view of the limitations, opportunities, and challenges in parallel nonlinear optimization,” Parallel Computing, vol. 21, no. 6, pp. 875–905, 1995

  9. [9]

    Solving optimization problems on computational grids,

    S. J. Wright, “Solving optimization problems on computational grids,” 2001

  10. [10]

    Structured nonconvex optimization of large-scale energy systems using PIPS-NLP,

    N. Chiang, C. G. Petra, and V . M. Zavala, “Structured nonconvex optimization of large-scale energy systems using PIPS-NLP,” in 2014 Power Systems Computation Conference . IEEE, 2014, pp. 1–7

  11. [11]

    An asynchronous bundle- trust-region method for dual decomposition of stochastic mixed-integer programming,

    K. Kim, C. G. Petra, and V . M. Zavala, “An asynchronous bundle- trust-region method for dual decomposition of stochastic mixed-integer programming,” SIAM Journal on Optimization , vol. 29, no. 1, pp. 318–342, 2019

  12. [12]

    Comparing energy efficiency of CPU, GPU and FPGA implementations for vision kernels,

    M. Qasaimeh, K. Denolf, J. Lo, K. Vissers, J. Zambreno, and P. H. Jones, “Comparing energy efficiency of CPU, GPU and FPGA implementations for vision kernels,” in 2019 IEEE International Conference on Embedded Software and Systems (ICESS) , 2019, pp. 1–8

  13. [13]

    A step towards energy efficient computing: Redesigning a hydrodynamic application on CPU-GPU,

    T. Dong, V . Dobrev, T. Kolev, R. Rieben, S. Tomov, and J. Dongarra, “A step towards energy efficient computing: Redesigning a hydrodynamic application on CPU-GPU,” in 2014 IEEE 28th International Parallel and Distributed Processing Symposium , 2014, pp. 972–981

  14. [14]

    Efficient calibration of embedded MPC,

    M. Forgione, D. Piga, and A. Bemporad, “Efficient calibration of embedded MPC,” IFAC-PapersOnLine, vol. 53, no. 2, pp. 5189–5194, 2020

  15. [15]

    Model predictive control for autonomous navigation using embedded graphics processing unit,

    D.-K. Phung, B. Hérissé, J. Marzat, and S. Bertrand, “Model predictive control for autonomous navigation using embedded graphics processing unit,” IFAC-PapersOnLine, vol. 50, no. 1, pp. 11 883–11 888, 2017

  16. [16]

    GPU based stochastic parameterized NMPC scheme for control of semi-active suspension system for half car vehicle,

    K. M. M. Rathai, M. Alamir, and O. Sename, “GPU based stochastic parameterized NMPC scheme for control of semi-active suspension system for half car vehicle,” IFAC-PapersOnLine, vol. 53, no. 2, pp. 14 369–14 374, 2020

  17. [17]

    Efficient convex optimization on GPUs for embedded model predictive control,

    L. Yu, A. Goldsmith, and S. Di Cairano, “Efficient convex optimization on GPUs for embedded model predictive control,” in Proceedings of the General Purpose GPUs . New York, NY , USA: Association for Computing Machinery, 2017, pp. 12–21. [Online]. Available: https://doi.org/10.1145/3038228.3038234

  18. [18]

    A collection of Fortran codes for large scale scientific computation

    HSL, “A collection of Fortran codes for large scale scientific computation.” [Online]. Available: http://www.hsl.rl.ac.uk/

  19. [19]

    Inertia-revealing precondition- ing for large-scale nonconvex constrained optimization,

    O. Schenk, A. Wächter, and M. Weiser, “Inertia-revealing precondition- ing for large-scale nonconvex constrained optimization,” SIAM Journal on Scientific Computing , vol. 31, no. 2, pp. 939–960, 2009

  20. [20]

    J. B. Rawlings, D. Q. Mayne, and M. Diehl, Model Predictive Control: Theory, Computation, and Design . Santa Barbara, CA, USA: Nob Hill Publishing, 2017, ch. 8

  21. [21]

    Algorithms and methods for high-performance model predictive control,

    G. Frison, “Algorithms and methods for high-performance model predictive control,” 2016

  22. [22]

    HPIPM: a high-performance quadratic programming framework for model predictive control,

    G. Frison and M. Diehl, “HPIPM: a high-performance quadratic programming framework for model predictive control,” IFAC- PapersOnLine, vol. 53, no. 2, pp. 6563–6569, 2020

  23. [23]

    Implementing an interior point method for linear programs on a CPU-GPU system,

    J. H. Jung and D. P. O’Leary, “Implementing an interior point method for linear programs on a CPU-GPU system,” Electronic Transactions on Numerical Analysis , vol. 28, no. 174-189, p. 37, 2008

  24. [24]

    GPU acceleration of the matrix- free interior point method,

    E. Smith, J. Gondzio, and J. Hall, “GPU acceleration of the matrix- free interior point method,” in International Conference on Parallel Processing and Applied Mathematics . Springer, 2011, pp. 681–689

  25. [25]

    An augmented lagrangian interior- point approach for large-scale NLP problems on graphics processing units,

    Y . Cao, A. Seth, and C. D. Laird, “An augmented lagrangian interior- point approach for large-scale NLP problems on graphics processing units,” Computers & Chemical Engineering , vol. 85, pp. 76–83, 2016

  26. [26]

    On the efficiency of supernodal factorization in interior-point method using CPU-GPU collaboration,

    U. A. Shah, S. Yousaf, I. Ahmad, and M. O. Ahmad, “On the efficiency of supernodal factorization in interior-point method using CPU-GPU collaboration,” IEEE Access, vol. 8, pp. 120 892–120 904, 2020

  27. [27]

    Accelerating condensed interior-point methods on SIMD/GPU archi- tectures,

    F. Pacaud, S. Shin, M. Schanen, D. A. Maldonado, and M. Anitescu, “Accelerating condensed interior-point methods on SIMD/GPU archi- tectures,” arXiv preprint arXiv:2203.11875 , 2022

  28. [28]

    Linear solvers for power grid optimization problems: a review of GPU-accelerated linear solvers,

    K. ´Swirydowicz, E. Darve, W. Jones, J. Maack, S. Regev, M. A. Saunders, S. J. Thomas, and S. Peleš, “Linear solvers for power grid optimization problems: a review of GPU-accelerated linear solvers,” Parallel Computing, vol. 111, p. 102870, 2022

  29. [29]

    MPC toolbox with GPU accelerated optimization algorithms,

    N. F. Gade-Nielsen, J. B. Jørgensen, and B. Dammann, “MPC toolbox with GPU accelerated optimization algorithms,” in 10th European workshop on advanced control and diagnosis . Technical University of Denmark, 2012

  30. [30]

    Interior point methods on GPU with application to model predictive control,

    N. F. Gade-Nielsen, “Interior point methods on GPU with application to model predictive control,” 2014

  31. [31]

    OSQP: An operator splitting solver for quadratic programs,

    B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd, “OSQP: An operator splitting solver for quadratic programs,” Mathematical Programming Computation, vol. 12, no. 4, pp. 637–672, 2020

  32. [32]

    GPU acceleration of ADMM for large-scale quadratic programming,

    M. Schubiger, G. Banjac, and J. Lygeros, “GPU acceleration of ADMM for large-scale quadratic programming,” Journal of Parallel and Distributed Computing , vol. 144, pp. 55–67, 2020. [Online]. Available: https://doi.org/10.1016/j.jpdc.2020.05.021

  33. [33]

    Available: https://github.com/MadNLP/DynamicNLPModels

    [Online]. Available: https://github.com/MadNLP/DynamicNLPModels. jl

  34. [34]

    NLPModels.jl: Data structures for optimization models,

    D. Orban, A. S. Siqueira, and contributors, “NLPModels.jl: Data structures for optimization models,” https://github.com/ JuliaSmoothOptimizers/NLPModels.jl, July 2020

  35. [35]

    On the implementation of an interior- point filter line-search algorithm for large-scale nonlinear programming,

    A. Wächter and L. T. Biegler, “On the implementation of an interior- point filter line-search algorithm for large-scale nonlinear programming,” Mathematical programming, vol. 106, no. 1, pp. 25–57, 2006

  36. [36]

    [Online]. Available: https://github.com/dlcole3/3D_temperature_control Government License: The submitted manuscript has been cre- ated by UChicago Argonne, LLC, Operator of Argonne National Laboratory (“Argonne"). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE- AC02-06CH11357. The U.S. Government retain...