GATO: GPU-Accelerated and Batched Trajectory Optimization for Scalable Edge Model Predictive Control

arXiv: 2510.07625 · v2 · submitted 2025-10-08 · cs.RO · cs.SY · eess.SY

Alexander Du, Emre Adabag, Gabriel Bravo-Palacios, Brian Plancher

Pith reviewed 2026-05-18 08:29 UTC · model grok-4.3

classification: cs.RO · cs.SY · eess.SY

keywords: GPU acceleration, trajectory optimization, model predictive control, batched optimization, robotics, real-time control, edge computing, nonlinear optimization

The pith

GATO delivers real-time batched nonlinear trajectory optimization on GPU for moderate batch sizes in model predictive control.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GATO as a GPU-accelerated solver for batches of nonlinear trajectory optimization problems that arise in model predictive control. It targets the regime of tens to low hundreds of simultaneous solves, where existing CPU methods are too slow and prior GPU methods either sacrifice speed or model generality. The core approach co-designs the algorithm, software stack, and hardware mapping to combine block-, warp-, and thread-level parallelism both within and across solves. This matters for robotics because many state-of-the-art MPC applications need multiple real-time solves for disturbance rejection and replanning on edge hardware. The authors demonstrate the result through scaling benchmarks, improved control behavior in case studies, and direct hardware validation on an industrial manipulator.

Core claim

GATO is an open-source GPU-accelerated batched trajectory optimization solver that combines block-, warp-, and thread-level parallelism to achieve real-time throughput for moderate batch sizes of nonlinear solves. It reports speedups of 18-21x over CPU baselines and 1.4-16x over other GPU baselines as batch size grows, together with better disturbance rejection and convergence, and is validated on physical hardware.

What carries the argument

Multi-level (block, warp, thread) parallelism applied within and across solves in a batched nonlinear trajectory optimization framework.
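
To make the three levels concrete, here is a minimal sketch of how a flat GPU thread index could be split into a solve, a timestep, and an element within that timestep. The layout (one block per solve, one warp per timestep, one lane per element), the names `decompose` and `launch_shape`, and the example sizes are illustrative assumptions; they are not taken from the paper or from the GATO code.

```python
WARP_SIZE = 32  # lanes per warp on current NVIDIA GPUs


def decompose(flat_id, horizon):
    """Split a flat thread index into (solve, timestep, lane).

    Assumed layout (hypothetical, for illustration only):
      block = one solve in the batch,
      warp  = one timestep (knot point) of that solve,
      lane  = one element of that timestep's work.
    """
    threads_per_block = horizon * WARP_SIZE
    solve, within_block = divmod(flat_id, threads_per_block)
    timestep, lane = divmod(within_block, WARP_SIZE)
    return solve, timestep, lane


def launch_shape(batch_size, horizon):
    """Return (num_blocks, threads_per_block) for the assumed layout."""
    return batch_size, horizon * WARP_SIZE


if __name__ == "__main__":
    print(launch_shape(batch_size=64, horizon=16))
    print(decompose(1000, horizon=16))
```

Under this assumed layout the horizon is capped at 32 timesteps, because a CUDA thread block holds at most 1024 threads; a real solver would need a different split for longer horizons.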

If this is right

Real-time model predictive control becomes practical on edge hardware for tasks that require simultaneous optimization of tens to low hundreds of trajectories.
Solver throughput improves with larger batch sizes, enabling better scalability in applications that benefit from multiple parallel plans.
Improved disturbance rejection and convergence rates are observed in simulated and hardware case studies.
The open-source release allows direct reproduction and integration into existing robotics control stacks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same multi-level parallelism strategy could be adapted to other batch optimization problems in robotics such as motion planning or parameter estimation.
Faster per-batch solve times may allow MPC to operate at higher replanning frequencies or with more complex dynamics models on the same hardware.
Energy use on embedded platforms could decrease because shorter computation windows leave more time in low-power states.

Load-bearing premise

That combining block-, warp-, and thread-level parallelism on the GPU produces no prohibitive synchronization costs and preserves generality for nonlinear problems at moderate batch sizes.

What would settle it

A benchmark run at batch sizes of 50-200 where GATO either falls below real-time rates, shows no speedup over a tuned CPU solver, or loses solution accuracy for the same nonlinear models.
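
The three failure conditions above can be written as a small check over one benchmark measurement. This is a sketch only: the function name `settles_against`, its parameters, and the 1% accuracy tolerance are hypothetical choices, and the numbers in the usage lines are placeholders, not results from the paper.

```python
def settles_against(gpu_ms, cpu_ms, period_ms, gpu_cost, cpu_cost, rel_tol=0.01):
    """Return the list of failure reasons for one batch-size measurement.

    gpu_ms, cpu_ms: wall-clock time to solve the whole batch on each solver.
    period_ms: control period the batch must fit inside to count as real time.
    gpu_cost, cpu_cost: final objective (or merit) value from each solver.
    rel_tol: allowed relative accuracy gap (an arbitrary illustrative default).
    """
    reasons = []
    if gpu_ms > period_ms:
        reasons.append("misses real-time deadline")
    if gpu_ms >= cpu_ms:
        reasons.append("no speedup over CPU baseline")
    if gpu_cost - cpu_cost > rel_tol * abs(cpu_cost):
        reasons.append("worse solution quality")
    return reasons


if __name__ == "__main__":
    # Placeholder measurements, NOT data from the paper.
    print(settles_against(2.0, 40.0, 4.0, 1.00, 1.00))
    print(settles_against(5.0, 40.0, 4.0, 1.00, 1.00))
```

An empty list means the measurement is consistent with the paper's claim at that batch size; any non-empty list names the condition that would count against it.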

Figures

Figures reproduced from arXiv: 2510.07625 by Alexander Du, Brian Plancher, Emre Adabag, Gabriel Bravo-Palacios.

**Figure 2.** Overall design of our batched solver which a) forms problems in parallel across solves and timesteps, b) leverages …

**Figure 3.** (Left) Solve times for 6-DoF manipulator motions while varying the batch size (…

**Figure 4.** Average (normalized) merit function value across …

**Figure 5.** Figure-8 tracking task, with an external disturbance applied at the end effector. (Left) Bar chart shows tracking error, …

**Figure 6.** Simulation visualization at the last timestep of the …

**Figure 7.** Cumulative density function of the solve times for …

Original abstract

While Model Predictive Control (MPC) delivers strong performance across robotics applications, solving the underlying (batches of) nonlinear trajectory optimization (TO) problems online remains computationally demanding. Existing GPU-accelerated approaches either parallelize single solves, handle large batches at sub-real-time rates, or sacrifice model generality for speed. This leaves a large gap in solver performance for many state-of-the-art MPC applications that require real-time batches of tens to low-hundreds of solves. As such, we present GATO, an open source, GPU-accelerated, batched TO solver co-designed across algorithm, software, and computational hardware to deliver real-time throughput for these moderate batch size regimes. Our approach leverages a combination of block-, warp-, and thread-level parallelism within and across solves for ultra-high performance. We demonstrate the effectiveness of our approach through a combination of: simulated benchmarks showing speedups of 18-21x over CPU baselines and 1.4-16x over GPU baselines as batch size increases; case studies highlighting improved disturbance rejection and convergence behavior; and finally a validation on hardware using an industrial manipulator. We open source GATO to support reproducibility and adoption.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

GATO fills a useful gap with a GPU batched TO solver for moderate batch sizes in edge MPC, backed by benchmarks and hardware tests, though the multi-level parallelism overhead for nonlinear cases remains a point to verify.

Look, the main thing to know is that GATO is a new open-source GPU batched solver for nonlinear trajectory optimization that targets real-time performance for batch sizes in the tens to low hundreds, which is where a lot of practical MPC applications sit. What they do well is lay out the gap clearly and then deliver a co-design that mixes parallelism at different levels to get those speedups. The simulated benchmarks show solid gains over CPU and other GPU approaches, and they include hardware tests on a manipulator plus some case studies on how it handles disturbances. Opening the code helps too.

The potential weak point is the synchronization overhead from stacking block, warp, and thread parallelism. In nonlinear TO, steps like linearizing dynamics or running a line search don't follow the same path for every solve, so you risk warp divergence or extra waits at barriers. Moderate batches make it harder to keep everything busy without those costs adding up, and the abstract doesn't spell out exactly how they sidestep that for general models. If they do sidestep it, the throughput numbers are good; if not, the advantage shrinks.

This is aimed at people doing edge robotics control who need to run multiple optimizations quickly without big hardware. Anyone implementing MPC on limited compute would find the results and the code useful. I'd put it through peer review. The work is grounded enough in experiments and open enough to check that it merits referee attention.

Referee Report

1 major / 2 minor

Summary. The manuscript presents GATO, an open-source GPU-accelerated batched trajectory optimization solver for model predictive control targeting moderate batch sizes of tens to low hundreds of solves. It claims to fill a performance gap by co-designing algorithm, software, and hardware with combined block-, warp-, and thread-level parallelism, reporting empirical speedups of 18-21x over CPU baselines and 1.4-16x over other GPU baselines, along with case studies on disturbance rejection and hardware validation on an industrial manipulator.

Significance. If the throughput claims hold with adequate analysis of overheads, the work would meaningfully advance real-time nonlinear MPC on edge hardware by supporting moderate batch regimes without sacrificing model generality. The open-sourcing for reproducibility and the inclusion of hardware experiments are clear strengths that enhance the practical value of the contribution.

major comments (1)

Abstract: the central performance claims (18-21x CPU and 1.4-16x GPU speedups for moderate batch sizes) rest on the assumption that block-, warp-, and thread-level parallelism can be combined without prohibitive synchronization overhead or loss of generality for nonlinear TO. The abstract does not detail how data-dependent operations such as dynamics linearization or line search are scheduled to avoid warp divergence and cross-warp barriers in this batch-size regime, which is load-bearing for the real-time throughput assertion.
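
The concern about data-dependent line searches can be illustrated with a toy example. When solves run in lockstep, a backtracking loop costs as many iterations as the slowest solve needs, because the others wait. One common way to make the work uniform is to evaluate a fixed grid of candidate step sizes for every solve and keep the best one. The sketch below contrasts the two; the toy merit function, the function names, and the grid of eight candidates are assumptions for illustration, and nothing here is confirmed to be how GATO schedules its line search.

```python
def backtracking_iters(merit, alpha0=1.0, shrink=0.5, max_iters=8):
    """Sequential backtracking: the iteration count depends on the data."""
    base = merit(0.0)
    alpha = alpha0
    for k in range(1, max_iters + 1):
        if merit(alpha) < base:
            return alpha, k
        alpha *= shrink
    return 0.0, max_iters


def fixed_grid_search(merit, alpha0=1.0, shrink=0.5, num_candidates=8):
    """Evaluate every candidate step: identical work for every solve."""
    candidates = [alpha0 * shrink**k for k in range(num_candidates)]
    values = [merit(a) for a in candidates]
    best = min(range(num_candidates), key=values.__getitem__)
    # Accept the best candidate only if it improves on taking no step.
    return candidates[best] if values[best] < merit(0.0) else 0.0


def make_merit(opt_step):
    """Toy 1-D merit function with its minimum at alpha = opt_step."""
    return lambda alpha: (alpha - opt_step) ** 2


# Three hypothetical solves whose best step sizes differ widely.
batch = [make_merit(s) for s in (0.9, 0.3, 0.05)]

iters = [backtracking_iters(m)[1] for m in batch]
lockstep_cost = max(iters)  # lanes that finish early idle until the slowest is done
steps = [fixed_grid_search(m) for m in batch]

print(iters, lockstep_cost, steps)
```

The fixed grid always pays for all eight evaluations, but every solve executes the same instructions, which is the trade a lockstep architecture tends to favor.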

minor comments (2)

The description of the GPU baselines would benefit from explicit statement of their batch-size scaling behavior and whether they also target moderate regimes, to strengthen the comparative claims.
Consider adding a short table or paragraph summarizing the specific robot models, horizon lengths, and constraint types used in the simulated benchmarks to improve reproducibility.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and for recognizing the practical value of GATO, including the open-source release and hardware validation. We address the single major comment below and outline a targeted revision to the manuscript.

Referee: Abstract: the central performance claims (18-21x CPU and 1.4-16x GPU speedups for moderate batch sizes) rest on the assumption that block-, warp-, and thread-level parallelism can be combined without prohibitive synchronization overhead or loss of generality for nonlinear TO. The abstract does not detail how data-dependent operations such as dynamics linearization or line search are scheduled to avoid warp divergence and cross-warp barriers in this batch-size regime, which is load-bearing for the real-time throughput assertion.

Authors: We appreciate the referee highlighting the need for greater clarity on this point. The abstract is intentionally high-level to summarize the contribution and results. The detailed co-design of block-, warp-, and thread-level parallelism, along with the scheduling of data-dependent operations (dynamics linearization, line search, etc.) to control warp divergence and synchronization costs, is described in Sections 3 and 4 of the manuscript. Our implementation uses uniform batch processing, warp-level primitives for reductions, and kernel structures that minimize cross-warp barriers while preserving full nonlinear model generality. The reported speedups are measured end-to-end and already incorporate all overheads, as shown in the scaling benchmarks of Section 5. To make the abstract more self-contained and directly address the referee's concern, we will revise it to include a concise statement on the scheduling approach for data-dependent operations.

Revision: yes
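
For readers unfamiliar with the "warp-level primitives for reductions" mentioned in the response, the sketch below emulates in plain Python the standard CUDA shuffle-down tree reduction, which sums 32 values in five steps without shared memory or block-wide barriers. It is a generic illustration of the primitive, not code from GATO; the helper names `shfl_down` and `warp_reduce_sum` are made up for this example.

```python
WARP_SIZE = 32


def shfl_down(values, delta):
    """Emulate CUDA's __shfl_down_sync across a full warp.

    Lane i reads the value held by lane i + delta. Lanes whose source
    would fall outside the warp keep their own value.
    """
    n = len(values)
    return [values[i + delta] if i + delta < n else values[i] for i in range(n)]


def warp_reduce_sum(values):
    """Tree reduction in log2(32) = 5 shuffle steps; lane 0 ends with the sum."""
    assert len(values) == WARP_SIZE
    vals = list(values)
    offset = WARP_SIZE // 2
    while offset > 0:
        shifted = shfl_down(vals, offset)
        vals = [a + b for a, b in zip(vals, shifted)]
        offset //= 2
    # Only lane 0 is guaranteed to hold the full sum; upper lanes hold partial garbage.
    return vals[0]


if __name__ == "__main__":
    print(warp_reduce_sum(list(range(WARP_SIZE))))
```

Reductions of this shape are what dot products and norms inside an iterative linear solver typically reduce to, which is why keeping them inside a warp avoids cross-warp synchronization.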

Circularity Check

0 steps flagged

No significant circularity in empirical performance claims

full rationale

The paper presents GATO as an implemented co-designed GPU solver for batched nonlinear trajectory optimization and supports its real-time throughput claims exclusively through direct empirical benchmarks (speedups of 18-21x over CPU and 1.4-16x over other GPU baselines) plus hardware validation on an industrial manipulator. These are measured outcomes from simulated and physical tests rather than any derived predictions, first-principles results, or fitted parameters that reduce to the inputs by construction. No equations, uniqueness theorems, ansatzes, or self-citations are invoked as load-bearing steps in the provided claims; the central argument rests on the observed behavior of the block/warp/thread parallelism implementation itself, which is externally falsifiable via the open-sourced code and independent re-runs. The derivation chain is therefore self-contained in the engineering and benchmarking methodology.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard assumptions from nonlinear optimization and parallel computing literature; no new free parameters or invented entities are introduced beyond the software artifact itself.

axioms (1)

domain assumption: Nonlinear trajectory optimization problems in MPC can be solved reliably with standard numerical methods when sufficient compute is available.
Invoked implicitly when claiming real-time performance for general models.

pith-pipeline@v0.9.0 · 5752 in / 1112 out tokens · 40102 ms · 2026-05-18T08:29:39.446115+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "Our approach leverages a combination of block-, warp-, and thread-level parallelism within and across solves for ultra-high performance... batched PCG... symmetric stair preconditioner"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "GATO... real-time throughput for moderate batch sizes of tens to low-hundreds of solves"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Vectorizing Projection in Manifold-Constrained Motion Planning for Real-Time Whole-Body Control
cs.RO 2026-04 conditional novelty 6.0

Vectorizing projection operations enables real-time manifold-constrained motion planning for humanoid robots with 100-1000x speedups over prior methods.

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · cited by 1 Pith paper · 2 internal anchors

[1] F. R. Hogan, E. R. Grau, and A. Rodriguez, “Reactive planar manipulation with convex hybrid mpc,” in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 247–253

[2] J.-P. Sleiman, F. Farshidian, M. V. Minniti, and M. Hutter, “A unified mpc framework for whole-body dynamic locomotion and manipulation,” IEEE Robotics and Automation Letters, vol. 6, no. 3, pp. 4688–4695, 2021

[3] M. Tranzatto, T. Miki, M. Dharmadhikari, L. Bernreiter, M. Kulkarni, F. Mascarich, O. Andersson, S. Khattak, M. Hutter, R. Siegwart, et al., “Cerberus in the darpa subterranean challenge,” Science Robotics, vol. 7, no. 66, p. eabp9742, 2022

[4] P. M. Wensing, M. Posa, Y. Hu, A. Escande, N. Mansard, and A. Del Prete, “Optimization-based control for dynamic legged robots,” IEEE Transactions on Robotics, 2023

[5] S. Kuindersma, “Taskable agility: Making useful dynamic behavior easier to create,” Princeton Robotics Seminar, April 2023

[6] J. T. Betts, Practical methods for optimal control and estimation using nonlinear programming. SIAM, 2010

[7] S. Kuindersma, R. Deits, M. Fallon, A. Valenzuela, H. Dai, F. Permenter, T. Koolen, P. Marion, and R. Tedrake, “Optimization-based locomotion planning, estimation, and control design for the atlas humanoid robot,” Autonomous robots, vol. 40, pp. 429–455, 2016

[8] H. Li and P. M. Wensing, “Cafe-mpc: A cascaded-fidelity model predictive control framework with tuning-free whole-body control,” arXiv preprint arXiv:2403.03995, 2024

[9] K. Nguyen, S. Schoedel, A. Alavilli, B. Plancher, and Z. Manchester, “Tinympc: Model-predictive control on resource-constrained microcontrollers,” in 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 1–7

[10] G. Williams, A. Aldrich, and E. A. Theodorou, “Model predictive path integral control: From theory to parallel computation,” Journal of Guidance, Control, and Dynamics, vol. 40, no. 2, pp. 344–357, 2017

[11] B. Vlahov, J. Gibson, M. Gandhi, and E. A. Theodorou, “Mppi-generic: A cuda library for stochastic trajectory optimization,” arXiv preprint arXiv:2409.07563, 2024

[12] H. Xue, C. Pan, Z. Yi, G. Qu, and G. Shi, “Full-order sampling-based mpc for torque-level locomotion control via diffusion-style annealing,” arXiv preprint arXiv:2409.15610, 2024

[13] J. Alvarez-Padilla, J. Z. Zhang, S. Kwok, J. M. Dolan, and Z. Manchester, “Real-time whole-body control of legged robots with model-predictive path integral control,” arXiv preprint arXiv:2409.10469, 2024

[14] R. Enrico, M. Mancini, and E. Capello, “Comparison of nmpc and gpu-parallelized mppi for real-time uav control on embedded hardware,” Applied Sciences, vol. 15, no. 16, p. 9114, 2025

[15] B. Plancher and S. Kuindersma, “A performance analysis of parallel differential dynamic programming on a gpu,” in Proceedings of the 13th Workshop on the Algorithmic Foundations of Robotics. Springer, 2018, pp. 656–672

[16] Z. Pan, B. Ren, and D. Manocha, “Gpu-based contact-aware trajectory optimization using a smooth force model,” in Proceedings of the 18th annual ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 2019, pp. 1–12

[17] Y. Lee, M. Cho, and K.-S. Kim, “Gpu-parallelized iterative lqr with input constraints for fast collision avoidance of autonomous vehicles,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022, pp. 4797–4804

[18] D. Cole, S. Shin, F. Pacaud, V. M. Zavala, and M. Anitescu, “Exploiting gpu/simd architectures for solving linear-quadratic mpc problems,” in 2023 American Control Conference (ACC). IEEE, 2023, pp. 3995–4000

[19] S. Shin, F. Pacaud, and M. Anitescu, “Accelerating optimal power flow with gpus: Simd abstraction of nonlinear programs and condensed-space interior-point methods,” arXiv preprint arXiv:2307.16830, 2023

[20] B. Sundaralingam, S. K. S. Hari, A. Fishman, C. Garrett, K. Van Wyk, V. Blukis, A. Millane, H. Oleynikova, A. Handa, F. Ramos, et al., “Curobo: Parallelized collision-free robot motion generation,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 8112–8119

[21] E. Adabag, M. Atal, W. Gerard, and B. Plancher, “Mpcgpu: Real-time nonlinear model predictive control through preconditioned conjugate gradient on the gpu,” in 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 9787–9794

[22] Y. Lee, K. H. Choi, and K.-S. Kim, “Gpu-enabled parallel trajectory optimization framework for safe motion planning of autonomous vehicles,” IEEE Robotics and Automation Letters, 2024

[23] S. H. Jeon, S. Hong, H. J. Lee, C. Khazoom, and S. Kim, “Cusadi: A gpu parallelization framework for symbolic expressions and optimal control,” IEEE Robotics and Automation Letters, 2024

[24] A. L. Bishop, J. Z. Zhang, S. Gurumurthy, K. Tracy, and Z. Manchester, “Relu-qp: A gpu-accelerated quadratic programming solver for model-predictive control,” in 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 13285–13292

[25] K. Tracy and Z. Manchester, “On the differentiability of the primal-dual interior-point method,” arXiv preprint arXiv:2406.11749, 2024

[26] L. Amatucci, J. Sousa-Pinto, G. Turrisi, D. Orban, V. Barasuol, and C. Semini, “Primal-dual ilqr for gpu-accelerated learning and control in legged robots,” arXiv preprint arXiv:2506.07823, 2025

[27] M. Naumov, “Incomplete-lu and cholesky preconditioned iterative methods using cusparse and cublas,” Nvidia white paper, vol. 3, 2011

[28] M. Schubiger, G. Banjac, and J. Lygeros, “Gpu acceleration of admm for large-scale quadratic programming,” Journal of Parallel and Distributed Computing, vol. 144, pp. 55–67, 2020

[29] B. Plancher, S. M. Neuman, T. Bourgeat, S. Kuindersma, S. Devadas, and V. J. Reddi, “Accelerating robot dynamics gradients on a cpu, gpu, and fpga,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 2335–2342, 2021

[30] B. Plancher, S. M. Neuman, R. Ghosal, S. Kuindersma, and V. J. Reddi, “Grid: Gpu-accelerated rigid body dynamics with analytical gradients,” in 2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 6253–6260

[31] F. Pacaud, S. Shin, M. Schanen, D. A. Maldonado, and M. Anitescu, “Accelerating condensed interior-point methods on simd/gpu architectures,” Journal of Optimization Theory and Applications, pp. 1–20, 2023

[32] M. Hamer, L. Widmer, and R. D’Andrea, “Fast generation of collision-free trajectories for robot swarms using gpu acceleration,” IEEE Access, vol. 7, pp. 6679–6690, 2018

[33] D. Guhathakurta, F. Rastgar, M. A. Sharma, K. M. Krishna, and A. K. Singh, “Fast joint multi-robot trajectory optimization by gpu accelerated batch solution of distributed sub-problems,” Frontiers in Robotics and AI, vol. 9, p. 890385, 2022

[34] F. Rastgar, H. Masnavi, K. Kruusamäe, A. Aabloo, and A. K. Singh, “Gpu accelerated batch trajectory optimization for autonomous navigation,” in 2023 American Control Conference (ACC). IEEE, 2023, pp. 718–725

[35] I. Tsikelis and K. Chatzilygeroudis, “Gait optimization for legged systems through mixed distribution cross-entropy optimization,” in 2024 IEEE-RAS 23rd International Conference on Humanoid Robots (Humanoids). IEEE, 2024, pp. 1011–1018

[36] T. Lew, M. Greiff, F. Djeumou, M. Suminaka, M. Thompson, and J. Subosits, “Risk-averse model predictive control for racing in adverse conditions,” arXiv preprint arXiv:2410.17183, 2024

[37] J. Nocedal and S. J. Wright, Numerical optimization. Springer, 1999

[38] A. Wächter and L. T. Biegler, “On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming,” Mathematical programming, vol. 106, pp. 25–57, 2006

[39] P. E. Gill, W. Murray, and M. A. Saunders, “Snopt: An sqp algorithm for large-scale constrained optimization,” SIAM review, vol. 47, no. 1, pp. 99–131, 2005

[40] X. Bu and B. Plancher, “Symmetric stair preconditioning of linear systems for parallel trajectory optimization,” in 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 9779–9786

[41] B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd, “Osqp: An operator splitting solver for quadratic programs,” Mathematical Programming Computation, vol. 12, no. 4, pp. 637–672, 2020

[42] J. Carpentier, G. Saurel, G. Buondonno, J. Mirabel, F. Lamiraux, O. Stasse, and N. Mansard, “The pinocchio c++ library: A fast and flexible implementation of rigid body dynamics algorithms and their analytical derivatives,” in 2019 IEEE/SICE International Symposium on System Integration (SII). IEEE, 2019, pp. 614–619

[43] S. Kleff, A. Meduri, R. Budhiraja, N. Mansard, and L. Righetti, “High-frequency nonlinear model predictive control of a manipulator,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 7330–7336

[44] M. K. Transtrum and J. P. Sethna, “Improvements to the levenberg-marquardt algorithm for nonlinear least-squares minimization,” 2012. [Online]. Available: https://arxiv.org/abs/1201.5885

[45] T. Howell, N. Gileadi, S. Tunyasuvunakool, K. Zakka, T. Erez, and Y. Tassa, “Predictive sampling: Real-time behaviour synthesis with mujoco,” arXiv preprint arXiv:2212.00541, 2022

[46] H. J. T. Suh, T. Pang, and R. Tedrake, “Bundled gradients through contact via randomized smoothing,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 4000–4007, 2022

[47] E. Alboni, G. Grandesso, G. P. R. Papini, J. Carpentier, and A. Del Prete, “Cacto-sl: Using sobolev learning to improve continuous actor-critic with trajectory optimization,” in 6th Annual Learning for Dynamics & Control Conference. PMLR, 2024, pp. 1452–1463

[48] T. Marcucci and R. Tedrake, “Warm start of mixed-integer programs for model predictive control of hybrid systems,” IEEE Transactions on Automatic Control, vol. 66, no. 6, pp. 2433–2448, 2020

[49] M. Ditty, “Nvidia orin system-on-chip,” in 2022 IEEE Hot Chips 34 Symposium (HCS). IEEE Computer Society, 2022, pp. 1–17


J. Alvarez-Padilla, J. Z. Zhang, S. Kwok, J. M. Dolan, and Z. Manch- ester, “Real-time whole-body control of legged robots with model- predictive path integral control,”arXiv preprint arXiv:2409.10469, 2024

work page arXiv 2024

[14] [14]

Comparison of nmpc and gpu- parallelized mppi for real-time uav control on embedded hardware,

R. Enrico, M. Mancini, and E. Capello, “Comparison of nmpc and gpu- parallelized mppi for real-time uav control on embedded hardware,” Applied Sciences, vol. 15, no. 16, p. 9114, 2025

work page 2025

[15] [15]

A performance analysis of parallel differential dynamic programming on a gpu,

B. Plancher and S. Kuindersma, “A performance analysis of parallel differential dynamic programming on a gpu,” inProceedings of the 13th Workshop on the Algorithmic F oundations of Robotics. Springer, 2018, pp. 656–672

work page 2018

[16] [16]

Gpu-based contact-aware trajectory optimization using a smooth force model,

Z. Pan, B. Ren, and D. Manocha, “Gpu-based contact-aware trajectory optimization using a smooth force model,” inProceedings of the 18th annual ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 2019, pp. 1–12

work page 2019

[17] [17]

Gpu-parallelized iterative lqr with input constraints for fast collision avoidance of autonomous vehicles,

Y . Lee, M. Cho, and K.-S. Kim, “Gpu-parallelized iterative lqr with input constraints for fast collision avoidance of autonomous vehicles,” in2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022, pp. 4797–4804

work page 2022

[18] [18]

Exploit- ing gpu/simd architectures for solving linear-quadratic mpc problems,

D. Cole, S. Shin, F. Pacaud, V . M. Zavala, and M. Anitescu, “Exploit- ing gpu/simd architectures for solving linear-quadratic mpc problems,” in2023 American Control Conference (ACC). IEEE, 2023, pp. 3995– 4000

work page 2023

[19] [19]

Accelerating Optimal Power Flow with GPUs: SIMD Abstraction of Nonlinear Programs and Condensed-Space Interior-Point Methods

S. Shin, F. Pacaud, and M. Anitescu, “Accelerating optimal power flow with gpus: Simd abstraction of nonlinear programs and condensed- space interior-point methods,”arXiv preprint arXiv:2307.16830, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[20] [20]

Curobo: Parallelized collision-free robot motion generation,

B. Sundaralingam, S. K. S. Hari, A. Fishman, C. Garrett, K. Van Wyk, V . Blukis, A. Millane, H. Oleynikova, A. Handa, F. Ramos,et al., “Curobo: Parallelized collision-free robot motion generation,” in2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 8112–8119

work page 2023

[21] [21]

Mpcgpu: Real-time nonlinear model predictive control through preconditioned conjugate gradient on the gpu,

E. Adabag, M. Atal, W. Gerard, and B. Plancher, “Mpcgpu: Real-time nonlinear model predictive control through preconditioned conjugate gradient on the gpu,” in2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 9787–9794

work page 2024

[22] [22]

Gpu-enabled parallel trajectory optimization framework for safe motion planning of autonomous vehicles,

Y . Lee, K. H. Choi, and K.-S. Kim, “Gpu-enabled parallel trajectory optimization framework for safe motion planning of autonomous vehicles,”IEEE Robotics and Automation Letters, 2024

work page 2024

[23] [23]

Cusadi: A gpu parallelization framework for symbolic expressions and optimal control,

S. H. Jeon, S. Hong, H. J. Lee, C. Khazoom, and S. Kim, “Cusadi: A gpu parallelization framework for symbolic expressions and optimal control,”IEEE Robotics and Automation Letters, 2024

work page 2024

[24] [24]

Relu-qp: A gpu-accelerated quadratic programming solver for model-predictive control,

A. L. Bishop, J. Z. Zhang, S. Gurumurthy, K. Tracy, and Z. Manch- ester, “Relu-qp: A gpu-accelerated quadratic programming solver for model-predictive control,” in2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 13 285–13 292

work page 2024

[25] [25]

On the differentiability of the primal- dual interior-point method,

K. Tracy and Z. Manchester, “On the differentiability of the primal- dual interior-point method,”arXiv preprint arXiv:2406.11749, 2024

work page arXiv 2024

[26] [26]

Primal-dual ilqr for gpu-accelerated learning and control in legged robots,

L. Amatucci, J. Sousa-Pinto, G. Turrisi, D. Orban, V . Barasuol, and C. Semini, “Primal-dual ilqr for gpu-accelerated learning and control in legged robots,”arXiv preprint arXiv:2506.07823, 2025

work page arXiv 2025

[27] [27]

Incomplete-lu and cholesky preconditioned iterative methods using cusparse and cublas,

M. Naumov, “Incomplete-lu and cholesky preconditioned iterative methods using cusparse and cublas,”Nvidia white paper, vol. 3, 2011

work page 2011

[28] [28]

Gpu acceleration of admm for large-scale quadratic programming,

M. Schubiger, G. Banjac, and J. Lygeros, “Gpu acceleration of admm for large-scale quadratic programming,”Journal of Parallel and Distributed Computing, vol. 144, pp. 55–67, 2020

work page 2020

[29] [29]

Accelerating robot dynamics gradients on a cpu, gpu, and fpga,

B. Plancher, S. M. Neuman, T. Bourgeat, S. Kuindersma, S. Devadas, and V . J. Reddi, “Accelerating robot dynamics gradients on a cpu, gpu, and fpga,”IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 2335–2342, 2021

work page 2021

[30] [30]

Grid: Gpu-accelerated rigid body dynamics with analytical gradients,

B. Plancher, S. M. Neuman, R. Ghosal, S. Kuindersma, and V . J. Reddi, “Grid: Gpu-accelerated rigid body dynamics with analytical gradients,” in2022 International Conference on Robotics and Automa- tion (ICRA). IEEE, 2022, pp. 6253–6260

work page 2022

[31] [31]

Accelerating condensed interior-point methods on simd/gpu architec- tures,

F. Pacaud, S. Shin, M. Schanen, D. A. Maldonado, and M. Anitescu, “Accelerating condensed interior-point methods on simd/gpu architec- tures,”Journal of Optimization Theory and Applications, pp. 1–20, 2023

work page 2023

[32] [32]

Fast generation of collision- free trajectories for robot swarms using gpu acceleration,

M. Hamer, L. Widmer, and R. D’andrea, “Fast generation of collision- free trajectories for robot swarms using gpu acceleration,”IEEE Access, vol. 7, pp. 6679–6690, 2018

work page 2018

[33] [33]

Fast joint multi-robot trajectory optimization by gpu accelerated batch solution of distributed sub-problems,

D. Guhathakurta, F. Rastgar, M. A. Sharma, K. M. Krishna, and A. K. Singh, “Fast joint multi-robot trajectory optimization by gpu accelerated batch solution of distributed sub-problems,”Frontiers in Robotics and AI, vol. 9, p. 890385, 2022

work page 2022

[34] [34]

Gpu accelerated batch trajectory optimization for autonomous navi- gation,

F. Rastgar, H. Masnavi, K. Kruusam ¨ae, A. Aabloo, and A. K. Singh, “Gpu accelerated batch trajectory optimization for autonomous navi- gation,” in2023 American Control Conference (ACC). IEEE, 2023, pp. 718–725

work page 2023

[35] [35]

Gait optimization for legged systems through mixed distribution cross-entropy optimization,

I. Tsikelis and K. Chatzilygeroudis, “Gait optimization for legged systems through mixed distribution cross-entropy optimization,” in 2024 IEEE-RAS 23rd International Conference on Humanoid Robots (Humanoids). IEEE, 2024, pp. 1011–1018

work page 2024

[36] [36]

Risk-averse model predictive control for racing in adverse conditions,

T. Lew, M. Greiff, F. Djeumou, M. Suminaka, M. Thompson, and J. Subosits, “Risk-averse model predictive control for racing in adverse conditions,”arXiv preprint arXiv:2410.17183, 2024

work page arXiv 2024

[37] [37]

Nocedal and S

J. Nocedal and S. J. Wright,Numerical optimization. Springer, 1999

work page 1999

[38] [38]

On the implementation of an interior- point filter line-search algorithm for large-scale nonlinear program- ming,

A. W ¨achter and L. T. Biegler, “On the implementation of an interior- point filter line-search algorithm for large-scale nonlinear program- ming,”Mathematical programming, vol. 106, pp. 25–57, 2006

work page 2006

[39] [39]

Snopt: An sqp algorithm for large-scale constrained optimization,

P. E. Gill, W. Murray, and M. A. Saunders, “Snopt: An sqp algorithm for large-scale constrained optimization,”SIAM review, vol. 47, no. 1, pp. 99–131, 2005

work page 2005

[40] [40]

Symmetric stair preconditioning of linear sys- tems for parallel trajectory optimization,

X. Bu and B. Plancher, “Symmetric stair preconditioning of linear sys- tems for parallel trajectory optimization,” in2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 9779–9786

work page 2024

[41] [41]

Osqp: An operator splitting solver for quadratic programs,

B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd, “Osqp: An operator splitting solver for quadratic programs,”Mathematical Programming Computation, vol. 12, no. 4, pp. 637–672, 2020

work page 2020

[42] [42]

The pinocchio c++ library: A fast and flexible implementation of rigid body dynamics algorithms and their analytical derivatives,

J. Carpentier, G. Saurel, G. Buondonno, J. Mirabel, F. Lamiraux, O. Stasse, and N. Mansard, “The pinocchio c++ library: A fast and flexible implementation of rigid body dynamics algorithms and their analytical derivatives,” in2019 IEEE/SICE International Symposium on System Integration (SII). IEEE, 2019, pp. 614–619

work page 2019

[43] [43]

High- frequency nonlinear model predictive control of a manipulator,

S. Kleff, A. Meduri, R. Budhiraja, N. Mansard, and L. Righetti, “High- frequency nonlinear model predictive control of a manipulator,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 7330–7336

work page 2021

[44] [44]

Improvements to the Levenberg-Marquardt algorithm for nonlinear least-squares minimization

M. K. Transtrum and J. P. Sethna, “Improvements to the levenberg- marquardt algorithm for nonlinear least-squares minimization,” 2012. [Online]. Available: https://arxiv.org/abs/1201.5885

work page internal anchor Pith review Pith/arXiv arXiv 2012

[45] [45]

Predictive sampling: Real-time behaviour synthesis with mujoco,

T. Howell, N. Gileadi, S. Tunyasuvunakool, K. Zakka, T. Erez, and Y . Tassa, “Predictive sampling: Real-time behaviour synthesis with mujoco,”arXiv preprint arXiv:2212.00541, 2022

work page arXiv 2022

[46] [46]

Bundled gradients through contact via randomized smoothing,

H. J. T. Suh, T. Pang, and R. Tedrake, “Bundled gradients through contact via randomized smoothing,”IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 4000–4007, 2022

work page 2022

[47] [47]

Cacto-sl: Using sobolev learning to improve continuous actor-critic with trajectory optimization,

E. Alboni, G. Grandesso, G. P. R. Papini, J. Carpentier, and A. Del Prete, “Cacto-sl: Using sobolev learning to improve continuous actor-critic with trajectory optimization,” in6th Annual Learning for Dynamics & Control Conference. PMLR, 2024, pp. 1452–1463

work page 2024

[48] [48]

Warm start of mixed-integer programs for model predictive control of hybrid systems,

T. Marcucci and R. Tedrake, “Warm start of mixed-integer programs for model predictive control of hybrid systems,”IEEE Transactions on Automatic Control, vol. 66, no. 6, pp. 2433–2448, 2020

work page 2020

[49] [49]

Nvidia orin system-on-chip,

M. Ditty, “Nvidia orin system-on-chip,” in2022 IEEE Hot Chips 34 Symposium (HCS). IEEE Computer Society, 2022, pp. 1–17

work page 2022