pith. sign in

arxiv: 2606.26341 · v1 · pith:QQ2SOPT4new · submitted 2026-06-24 · 💻 cs.RO

Scaling Nonlinear Optimization: Many Problems One GPU

Pith reviewed 2026-06-26 01:42 UTC · model grok-4.3

classification 💻 cs.RO
keywords jaxipmGPU-batched NLP solverIPOPTnonlinear model predictive controltrajectory optimizationJAX implementationrobotics optimization
0
0 comments X

The pith

A JAX-based redesign of IPOPT solves batches of nonlinear programs on one GPU at up to 32.85 times the throughput of the CPU version.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces jaxipm, the first GPU-batched nonlinear program solver derived from IPOPT. It achieves this by redesigning the algorithm with heterogeneous iteration fusion to remove control flow and iteration-level batching to keep the GPU occupied across many problems at once. This addresses the mismatch in robotics where mature constrained solvers like IPOPT handle one problem at a time on CPUs while sampling-based methods scale to thousands of rollouts on GPUs. A reader would care because the approach would let gradient-based planners produce far more locally optimal, constraint-satisfying trajectories per second inside modern batched learning pipelines.

Core claim

jaxipm is the first GPU-batched NLP solver based on IPOPT and implemented in JAX. The redesign eliminates control flow via heterogeneous iteration fusion and minimizes idle time via iteration-level batching. On quadrotor nonlinear model predictive control benchmarks that include reference tracking with obstacles, multi-agent collision-free navigation, and cluttered environments, it delivers up to 32.85 times higher throughput than standard IPOPT.

What carries the argument

Heterogeneous iteration fusion combined with iteration-level batching, which adapts IPOPT's interior-point iterations to execute without branching and with synchronized processing across problem batches on GPU.

If this is right

  • NLP solvers can move from single-problem CPU execution to concurrent solution of many problems on one GPU.
  • Constrained optimization becomes compatible with GPU-batched simulators and learning frameworks in robotics.
  • The rate at which locally optimal trajectories are generated for tasks such as model predictive control increases by more than an order of magnitude.
  • Hard constraint satisfaction and optimality properties carry over from the original IPOPT algorithm to the batched GPU setting.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same fusion and batching techniques could be applied to other sequential optimization codes to achieve similar GPU scaling.
  • Hybrid pipelines that interleave batched optimal control with large numbers of sampling-based rollouts become practical on a single accelerator.
  • The method invites direct tests on higher-dimensional state spaces or contact-rich problems to measure where the batching overhead begins to dominate.

Load-bearing premise

The algorithmic changes for batching on GPU preserve IPOPT's convergence behavior, constraint satisfaction, and local optimality guarantees when the solver runs in batched form.

What would settle it

A set of benchmark NLPs on which standard IPOPT returns feasible locally optimal solutions but the batched jaxipm version returns infeasible points or fails to converge.

Figures

Figures reproduced from arXiv: 2606.26341 by Johanna Haffner, John Viljoen, Masayoshi Tomizuka, Negar Mehr.

Figure 1
Figure 1. Figure 1: We summarize jaxipm: each color represents a different optimization problem being executed in batch. At each iteration, every problem can take different paths, including but not limited to: backtracking line search, second order correction, and feasibility restoration. Each of these heterogeneous paths shares some common quantities in their calculation, which we compute in parallel, followed by the remaini… view at source ↗
Figure 2
Figure 2. Figure 2: We visualize the throughput advantages of iteration [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 5
Figure 5. Figure 5: We visualize the multi-quadrotor consensus task [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: We visualize the ECDF of the step-wise discrepancies [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗
read the original abstract

Many robotics problems, including trajectory optimization, inverse kinematics, and contact-rich motion planning, reduce to nonlinear programs (NLPs). Mature NLP solvers such as IPOPT can solve these problems, offering hard constraint satisfaction, optimality guarantees, and favorable scaling with problem dimension. These solvers underpin gradient-based methods in robotics, yet remain CPU-bound and solve only one problem at a time, preventing their integration into GPU-batched learning pipelines. On the other hand, sampling-based approaches such as reinforcement learning, model predictive path integral, and imitation learning have become the core of modern robotics research due to their ability to leverage GPU-batched simulators. These simulators can generate orders of magnitude more dynamics rollouts per second than was previously possible. If a GPU-batched NLP solver existed, it would unlock similar speedups in the number of constrained, locally optimal solutions generated per second. This regime of solving many problems concurrently versus solving a single problem at a time is a key requirement for integrating NLP solvers in modern GPU-batched robotics frameworks. To this end, we introduce \texttt{jaxipm}, the first GPU-batched NLP solver, based on IPOPT, and implemented in JAX. We accomplish this by redesigning IPOPT's algorithm to eliminate control flow with \textit{heterogeneous iteration fusion}, and by minimizing GPU idle time with \textit{iteration level batching}. We evaluate \texttt{jaxipm} on a variety of quadrotor nonlinear model predictive control benchmarks, including reference tracking in the presence of obstacles, multi-quadrotor navigation without collision, and navigation in a cluttered environment. We demonstrate up to a $32.85\times$ increase in throughput over IPOPT. Our complete open-source codebase is available at https://github.com/johnviljoen/jaxipm.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces jaxipm, the first GPU-batched NLP solver based on IPOPT and implemented in JAX. It redesigns the algorithm using heterogeneous iteration fusion to eliminate control flow and iteration-level batching to minimize GPU idle time. The method is evaluated on quadrotor NMPC benchmarks (reference tracking with obstacles, multi-quadrotor navigation, cluttered environments), reporting up to 32.85× throughput gains over sequential IPOPT, with an open-source codebase provided.

Significance. If the batched solver is shown to preserve IPOPT convergence, KKT satisfaction, and local optimality, the work would bridge CPU-based constrained optimization with GPU-batched robotics pipelines, enabling orders-of-magnitude more locally optimal solutions per second in learning and planning frameworks. The open-source release at https://github.com/johnviljoen/jaxipm is a clear strength for reproducibility.

major comments (1)
  1. [Algorithmic redesign (heterogeneous iteration fusion and iteration-level batching sections)] The central empirical claim (up to 32.85× throughput) depends on heterogeneous iteration fusion and iteration-level batching producing mathematically equivalent behavior to sequential IPOPT, including identical adaptive barrier updates, filter line search decisions, inertia correction, and restoration-phase logic per problem. No explicit verification (e.g., per-instance KKT residual matching, iteration-count equivalence, or constraint violation statistics) is reported on the quadrotor benchmarks to confirm that the fused/batched execution solves the same NLPs.
minor comments (1)
  1. [Abstract and evaluation section] The abstract and evaluation description would benefit from stating the exact problem dimensions, number of concurrent NLPs, and hardware used for the throughput measurements to allow direct comparison with future work.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their constructive feedback. We address the major comment regarding verification of algorithmic equivalence below.

read point-by-point responses
  1. Referee: [Algorithmic redesign (heterogeneous iteration fusion and iteration-level batching sections)] The central empirical claim (up to 32.85× throughput) depends on heterogeneous iteration fusion and iteration-level batching producing mathematically equivalent behavior to sequential IPOPT, including identical adaptive barrier updates, filter line search decisions, inertia correction, and restoration-phase logic per problem. No explicit verification (e.g., per-instance KKT residual matching, iteration-count equivalence, or constraint violation statistics) is reported on the quadrotor benchmarks to confirm that the fused/batched execution solves the same NLPs.

    Authors: We concur that confirming mathematical equivalence is critical for validating the throughput claims. Our redesign using heterogeneous iteration fusion and iteration-level batching aims to preserve the exact per-problem behavior of IPOPT by replacing control flow with data-dependent masking and ensuring identical barrier updates, line search decisions, inertia corrections, and restoration logic. While the manuscript focuses on the throughput results assuming this equivalence by construction, we did not include explicit per-instance verification metrics. We will revise the manuscript to include such verification, for example by reporting average and maximum KKT residual differences, iteration count matches, and constraint violation statistics across the benchmark instances. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical implementation and throughput claims are self-contained

full rationale

The paper introduces jaxipm as a JAX-based GPU-batched reimplementation of IPOPT, achieved via heterogeneous iteration fusion and iteration-level batching. Its central claims consist of empirical throughput measurements (up to 32.85×) on quadrotor NMPC benchmarks, with no mathematical derivation chain, fitted parameters renamed as predictions, or self-referential equations. The algorithmic redesigns are presented as engineering modifications whose equivalence to sequential IPOPT is asserted but not derived from prior results within the paper; verification is external via open-source code and direct benchmarking. No self-citation load-bearing steps, uniqueness theorems, or ansatzes appear in the provided text. The work is therefore an independent implementation contribution whose performance numbers do not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on the established convergence properties of IPOPT's interior-point method and on JAX's ability to execute batched array operations on GPU; no new free parameters or invented entities are introduced.

axioms (1)
  • domain assumption IPOPT's interior-point algorithm converges to locally optimal solutions satisfying constraints under standard NLP assumptions
    The paper builds directly on IPOPT without re-deriving its theoretical properties.

pith-pipeline@v0.9.1-grok · 5866 in / 1193 out tokens · 23947 ms · 2026-06-26T01:42:50.736009+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

40 extracted references · 2 linked inside Pith

  1. [1]

    An introduction to trajectory optimization: How to do your own direct collocation,

    M. Kelly, “An introduction to trajectory optimization: How to do your own direct collocation,”SIAM review, vol. 59, no. 4, pp. 849–904, 2017

  2. [2]

    Nonlinear model pre- dictive control: From theory to application,

    F. Allgower, R. Findeisen, Z. K. Nagyet al., “Nonlinear model pre- dictive control: From theory to application,”Journal-Chinese Institute Of Chemical Engineers, vol. 35, no. 3, pp. 299–316, 2004

  3. [3]

    An experimental evaluation of a novel minimum-jerk cartesian controller for humanoid robots,

    U. Pattacini, F. Nori, L. Natale, G. Metta, and G. Sandini, “An experimental evaluation of a novel minimum-jerk cartesian controller for humanoid robots,” in2010 IEEE/RSJ international conference on intelligent robots and systems. IEEE, 2010, pp. 1668–1674

  4. [4]

    A direct method for trajectory op- timization of rigid bodies through contact,

    M. Posa, C. Cantu, and R. Tedrake, “A direct method for trajectory op- timization of rigid bodies through contact,”The International Journal of Robotics Research, vol. 33, no. 1, pp. 69–81, 2014

  5. [5]

    Model predictive path integral control: From theory to parallel computation,

    G. Williams, A. Aldrich, and E. A. Theodorou, “Model predictive path integral control: From theory to parallel computation,”Journal of Guidance, Control, and Dynamics, vol. 40, no. 2, pp. 344–357, 2017

  6. [6]

    A tutorial on the cross-entropy method,

    P.-T. De Boer, D. P. Kroese, S. Mannor, and R. Y . Rubinstein, “A tutorial on the cross-entropy method,”Annals of operations research, vol. 134, no. 1, pp. 19–67, 2005

  7. [7]

    On the implementation of an interior- point filter line-search algorithm for large-scale nonlinear program- ming,

    A. W ¨achter and L. T. Biegler, “On the implementation of an interior- point filter line-search algorithm for large-scale nonlinear program- ming,”Mathematical programming, vol. 106, pp. 25–57, 2006

  8. [8]

    Knitro: An integrated pack- age for nonlinear optimization,

    R. H. Byrd, J. Nocedal, and R. A. Waltz, “Knitro: An integrated pack- age for nonlinear optimization,” inLarge-scale nonlinear optimization. Springer, 2006, pp. 35–59

  9. [9]

    Snopt: An sqp algorithm for large-scale constrained optimization,

    P. E. Gill, W. Murray, and M. A. Saunders, “Snopt: An sqp algorithm for large-scale constrained optimization,”SIAM review, vol. 47, no. 1, pp. 99–131, 2005

  10. [10]

    MuJoCo XLA (MJX),

    MuJoCo XLA Authors, “MuJoCo XLA (MJX),” https://mujoco. readthedocs.io/en/stable/mjx.html, 2024, accessed: March 25, 2026

  11. [11]

    Isaac lab: A gpu- accelerated simulation framework for multi-modal robot learning,

    M. Mittal, P. Roth, J. Tigue, A. Richard, O. Zhang, P. Du, A. Serrano- Munoz, X. Yao, R. Zurbr ¨ugg, N. Rudinet al., “Isaac lab: A gpu- accelerated simulation framework for multi-modal robot learning,” arXiv preprint arXiv:2511.04831, 2025

  12. [12]

    Brax–a differentiable physics engine for large scale rigid body simulation,

    C. D. Freeman, E. Frey, A. Raichuk, S. Girgin, I. Mordatch, and O. Bachem, “Brax–a differentiable physics engine for large scale rigid body simulation,”arXiv preprint arXiv:2106.13281, 2021

  13. [13]

    Synthesis of model predictive control and reinforcement learning: Survey and classifica- tion,

    R. Reiter, J. Hoffmann, D. Reinhardt, F. Messerer, K. Baumg ¨artner, S. Sawant, J. Boedecker, M. Diehl, and S. Gros, “Synthesis of model predictive control and reinforcement learning: Survey and classifica- tion,”Annual Reviews in Control, vol. 61, p. 101045, 2026

  14. [14]

    Towards out-of-distribution generalization: A survey,

    J. Liu, Z. Shen, Y . He, X. Zhang, R. Xu, H. Yu, and P. Cui, “Towards out-of-distribution generalization: A survey,”arXiv preprint arXiv:2108.13624, 2021

  15. [15]

    Complementarity-free multi-contact modeling and optimiza- tion for dexterous manipulation,

    W. Jin, “Complementarity-free multi-contact modeling and optimiza- tion for dexterous manipulation,”arXiv preprint arXiv:2408.07855, 2024

  16. [16]

    3d dynamic walking with underactuated humanoid robots: A direct collocation framework for optimizing hybrid zero dynamics,

    A. Hereid, E. A. Cousineau, C. M. Hubicki, and A. D. Ames, “3d dynamic walking with underactuated humanoid robots: A direct collocation framework for optimizing hybrid zero dynamics,” in2016 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2016, pp. 1447–1454

  17. [17]

    Convexification and real-time optimization for mpc with aerospace applications,

    Y . Mao, D. Dueri, M. Szmuk, and B. Ac ¸ıkmes ¸e, “Convexification and real-time optimization for mpc with aerospace applications,” in Handbook of model predictive control. Springer, 2018, pp. 335–358

  18. [18]

    Casadi—a software framework for nonlinear optimization and optimal control,

    J. Andersson, J. Gillis, G. Horn, J. Rawlings, and M. Diehl, “Casadi—a software framework for nonlinear optimization and optimal control,” Mathematical Programming Computation, vol. 11, no. 1, pp. 1–36, 2018

  19. [19]

    Drake: Model-based design and verification for robotics,

    R. Tedrake and the Drake Development Team, “Drake: Model-based design and verification for robotics,” 2019. [Online]. Available: https://drake.mit.edu

  20. [20]

    Pyomo: modeling and solving mathematical programs in python,

    W. E. Hart, J.-P. Watson, and D. L. Woodruff, “Pyomo: modeling and solving mathematical programs in python,”Mathematical Program- ming Computation, vol. 3, no. 3, pp. 219–260, 2011

  21. [21]

    JuMP 1.0: Recent improvements to a modeling language for mathematical optimization,

    M. Lubin, O. Dowson, J. Dias Garcia, J. Huchette, B. Legat, and J. P. Vielma, “JuMP 1.0: Recent improvements to a modeling language for mathematical optimization,”Mathematical Programming Compu- tation, 2023

  22. [22]

    JAX: composable transformations of Python+NumPy programs,

    J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. VanderPlas, S. Wanderman- Milne, and Q. Zhang, “JAX: composable transformations of Python+NumPy programs,” http://github.com/jax-ml/jax, 2018

  23. [23]

    Strategic decision-making in multi- agent domains: A weighted constrained potential dynamic game approach,

    M. Bhatt, Y . Jia, and N. Mehr, “Strategic decision-making in multi- agent domains: A weighted constrained potential dynamic game approach,”IEEE Transactions on Robotics, 2025

  24. [24]

    Hykkt: a hybrid direct-iterative method for solving kkt linear systems,

    S. Regev, N.-Y . Chiang, E. Darve, C. G. Petra, M. A. Saunders, K. ´Swirydowicz, and S. Peleˇs, “Hykkt: a hybrid direct-iterative method for solving kkt linear systems,”Optimization Methods and Software, vol. 38, no. 2, pp. 332–355, 2023

  25. [25]

    Gpu-accelerated dynamic nonlinear optimiza- tion with examodels and madnlp,

    F. Pacaud and S. Shin, “Gpu-accelerated dynamic nonlinear optimiza- tion with examodels and madnlp,” in2024 IEEE 63rd Conference on Decision and Control (CDC). IEEE, 2024, pp. 5963–5968

  26. [26]

    Accelerating optimal power flow with gpus: Simd abstraction of nonlinear programs and condensed- space interior-point methods,

    S. Shin, M. Anitescu, and F. Pacaud, “Accelerating optimal power flow with gpus: Simd abstraction of nonlinear programs and condensed- space interior-point methods,”Electric Power Systems Research, vol. 236, p. 110651, 2024

  27. [27]

    NVIDIA cuDSS (Preview): A High-Performance CUDA Library for Direct Sparse Solvers,

    NVIDIA, “NVIDIA cuDSS (Preview): A High-Performance CUDA Library for Direct Sparse Solvers,” https://docs.nvidia.com/cuda/cudss/ index.html, accessed: 2025

  28. [28]

    Optnet: Differentiable optimization as a layer in neural networks,

    B. Amos and J. Z. Kolter, “Optnet: Differentiable optimization as a layer in neural networks,”arXiv preprint arXiv:1703.00443, 2017

  29. [29]

    A differentiable interior-point method in single precision,

    J. Arrizabalaga, K. Tracy, and Z. Manchester, “A differentiable interior-point method in single precision,” 2026. [Online]. Available: https://arxiv.org/abs/2605.17913

  30. [30]

    Moreau: GPU-native differen- tiable optimization,

    S. Barratt, P. Nobel, and S. Diamond, “Moreau: GPU-native differen- tiable optimization,” https://moreau.so, 2026

  31. [31]

    Mpax: Mathematical programming in jax,

    H. Lu, Z. Peng, and J. Yang, “Mpax: Mathematical programming in jax,”arXiv preprint arXiv:2412.09734, 2024

  32. [32]

    Efficient and modular implicit differentiation,

    M. Blondel, Q. Berthet, M. Cuturi, R. Frostig, S. Hoyer, F. Llinares- L´opez, F. Pedregosa, and J.-P. Vert, “Efficient and modular implicit differentiation,”arXiv preprint arXiv:2105.15183, 2021

  33. [33]

    Opt2skill: Imitating dynamically-feasible whole-body trajectories for versatile humanoid loco-manipulation,

    F. Liu, Z. Gu, Y . Cai, Z. Zhou, H. Jung, J. Jang, S. Zhao, S. Ha, Y . Chen, D. Xuet al., “Opt2skill: Imitating dynamically-feasible whole-body trajectories for versatile humanoid loco-manipulation,” IEEE Robotics and Automation Letters, 2025

  34. [34]

    Imitation learning from mpc for quadrupedal multi-gait control,

    A. Reske, J. Carius, Y . Ma, F. Farshidian, and M. Hutter, “Imitation learning from mpc for quadrupedal multi-gait control,” in2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 5014–5020

  35. [35]

    Good old-fashioned engineering can close the 100,000- year “data gap

    K. Goldberg, “Good old-fashioned engineering can close the 100,000- year “data gap” in robotics,” p. eaea7390, 2025

  36. [36]

    A trust region method based on interior point techniques for nonlinear programming,

    R. H. Byrd, J. C. Gilbert, and J. Nocedal, “A trust region method based on interior point techniques for nonlinear programming,”Math- ematical programming, vol. 89, no. 1, pp. 149–185, 2000

  37. [37]

    A primal- dual trust-region algorithm for non-convex nonlinear programming,

    A. R. Conn, N. I. Gould, D. Orban, and P. L. Toint, “A primal- dual trust-region algorithm for non-convex nonlinear programming,” Mathematical programming, vol. 87, no. 2, pp. 215–249, 2000

  38. [38]

    A. V . Fiacco and G. P. McCormick,Nonlinear programming: sequen- tial unconstrained minimization techniques. SIAM, 1990

  39. [39]

    Line search filter methods for nonlinear programming: Motivation and global convergence,

    A. W ¨achter and L. T. Biegler, “Line search filter methods for nonlinear programming: Motivation and global convergence,”SIAM Journal on Optimization, vol. 16, no. 1, pp. 1–31, 2005

  40. [40]

    Quadrotor kinematics and dynamics,

    C. Powers, D. Mellinger, and V . Kumar, “Quadrotor kinematics and dynamics,”Handbook of unmanned aerial vehicles, pp. 307–328, 2015