pith. sign in

arxiv: 2407.11107 · v3 · submitted 2024-07-15 · 💻 cs.RO · cs.LG

Latent Linear Quadratic Regulator for Robotic Control Tasks

Pith reviewed 2026-05-23 22:44 UTC · model grok-4.3

classification 💻 cs.RO cs.LG
keywords latent linear quadratic regulatormodel predictive controlrobotic controlimitation learningefficient controllatent dynamicsLQR
0
0 comments X

The pith

LaLQR learns a latent mapping that turns nonlinear robot dynamics linear and costs quadratic so standard LQR can replace slow MPC.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to learn a hidden state representation in which the robot's motion follows linear equations and the task cost is quadratic. Once that representation is in place, the classic LQR feedback law produces near-optimal actions at low cost. The mapping is trained by making the latent LQR imitate the trajectories that the original nonlinear MPC would have produced. A reader who accepts the claim sees a route to real-time, high-performance control on hardware that cannot run full MPC at every step.

Core claim

A latent space exists in which the original nonlinear dynamics become linear and the original cost becomes quadratic; the parameters of this space are learned jointly so that the resulting LQR controller reproduces the closed-loop behavior of the full MPC while running orders of magnitude faster.

What carries the argument

A learned latent-space transformation that converts the original nonlinear dynamics into linear form and the original cost into quadratic form, allowing direct use of the LQR Riccati solution.

If this is right

  • Real-time control becomes feasible on embedded processors that cannot solve nonlinear programs at control frequency.
  • The same learned latent model can be reused across similar tasks without re-solving the original optimization each time.
  • Policy execution cost scales linearly with state dimension rather than with the complexity of the nonlinear optimizer.
  • Generalization to new initial conditions or mild environment changes improves because the latent LQR inherits the stability properties of the underlying linear system.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same imitation objective could be applied to other optimal controllers besides MPC, turning any slow planner into a fast linear feedback law.
  • If the latent space is low-dimensional, the method might also serve as a model-reduction technique for long-horizon planning.
  • The learned latent coordinates could be inspected to discover which physical quantities the controller actually cares about.

Load-bearing premise

A latent mapping exists such that the transformed dynamics are exactly linear and the cost exactly quadratic, and imitation of MPC trajectories is enough to recover a useful version of that mapping.

What would settle it

On a held-out robotic task, collect trajectories from the original MPC and from LaLQR; if the latent dynamics deviate measurably from linearity or the closed-loop cost exceeds the MPC cost by more than a small margin while computation time remains lower, the claim is falsified.

Figures

Figures reproduced from arXiv: 2407.11107 by Colin Jones, Joschka Boedecker, Shaohui Yang, Toshiyuki Ohtsuka, Yuan Zhang.

Figure 1
Figure 1. Figure 1: Visualization of two types of dynamical models. [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Training curve of eigen loss on [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Visualization of robots used in the experiments, with increased complexity. [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Control process of methods learned from imperfect experts. The x-axis is the real testing [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Control process of methods starting from unseen initial states in training. The x-axis is the [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Control process of LaLQR with different design choices. The x-axis is the real testing [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Training curves of LaLQR on all tasks. The a-xis is the training time steps, and the y-axis [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗
read the original abstract

Model predictive control (MPC) has played a more crucial role in various robotic control tasks, but its high computational requirements are concerning, especially for nonlinear dynamical models. This paper presents a $\textbf{la}$tent $\textbf{l}$inear $\textbf{q}$uadratic $\textbf{r}$egulator (LaLQR) that maps the state space into a latent space, on which the dynamical model is linear and the cost function is quadratic, allowing the efficient application of LQR. We jointly learn this alternative system by imitating the original MPC. Experiments show LaLQR's superior efficiency and generalization compared to other baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes LaLQR, a method that learns a mapping from the original state space to a latent space in which the system dynamics are linear and the cost is quadratic, enabling direct application of the LQR controller. The latent mapping and associated linear/quadratic model are learned jointly by imitating trajectories from an external nonlinear MPC solver. Experiments on robotic control tasks are reported to show improved computational efficiency and generalization relative to baselines.

Significance. If the learned latent model satisfies the exact LQR structural assumptions on held-out data and new initial conditions, the approach could offer a computationally lighter alternative to repeated nonlinear MPC solves while retaining model-based guarantees, which would be relevant for real-time robotic applications.

major comments (2)
  1. [Method (imitation procedure)] The central claim requires that a latent mapping exists such that dynamics are exactly linear (x_{t+1}=A x_t + B u_t) and cost exactly quadratic. The described procedure learns the mapping solely by imitating MPC trajectories; no explicit loss term, regularization, or post-hoc verification enforces or confirms linearity/quadraticity in the latent space on held-out trajectories or new initial conditions. This directly affects whether the subsequent LQR solve is exact or merely approximate.
  2. [Abstract and Experiments] The abstract and experimental claims assert superior efficiency and generalization, yet the provided description supplies no equations for the latent dynamics, no training protocol details, no error bars, and no explicit comparison of closed-loop performance under the LQR assumptions versus the original MPC. Without these, the data cannot be assessed for support of the claim.
minor comments (2)
  1. [Method] Notation for the latent state, matrices A/B/Q/R, and the imitation objective should be introduced with explicit equations rather than prose description only.
  2. [Abstract] The abstract would benefit from a one-sentence statement of the imitation loss and the LQR application step.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful comments on our manuscript. We address each of the major comments below, providing clarifications and indicating revisions where appropriate.

read point-by-point responses
  1. Referee: [Method (imitation procedure)] The central claim requires that a latent mapping exists such that dynamics are exactly linear (x_{t+1}=A x_t + B u_t) and cost exactly quadratic. The described procedure learns the mapping solely by imitating MPC trajectories; no explicit loss term, regularization, or post-hoc verification enforces or confirms linearity/quadraticity in the latent space on held-out trajectories or new initial conditions. This directly affects whether the subsequent LQR solve is exact or merely approximate.

    Authors: The LaLQR method parameterizes the latent dynamics as a linear system by design, with the mapping from original to latent space learned jointly via imitation of MPC trajectories. This ensures that the learned model approximates the LQR structure. However, we agree that explicit verification of the linearity assumption on held-out data would strengthen the claims. We will add post-hoc analysis and error metrics to confirm the degree of linearity and quadraticity on new trajectories and initial conditions. revision: yes

  2. Referee: [Abstract and Experiments] The abstract and experimental claims assert superior efficiency and generalization, yet the provided description supplies no equations for the latent dynamics, no training protocol details, no error bars, and no explicit comparison of closed-loop performance under the LQR assumptions versus the original MPC. Without these, the data cannot be assessed for support of the claim.

    Authors: The full manuscript provides the equations for the latent dynamics in Section 3, details the training protocol in Section 4, and includes experimental comparisons. That said, we acknowledge the abstract is brief and that error bars and more explicit closed-loop comparisons could be better emphasized. We will revise the experiments section to include error bars, add a direct comparison of closed-loop performance, and consider expanding the abstract if permitted. revision: partial

Circularity Check

0 steps flagged

No circularity: parameterized imitation learning with explicit LQ structure

full rationale

The paper defines LaLQR by jointly learning an encoder to a latent space together with linear dynamics and quadratic costs, using imitation of MPC trajectories as the training objective. The linear-quadratic structure is imposed directly by the model parameterization (not discovered or fitted post-hoc), and the resulting LQR controller is applied exactly on that structure. Performance claims rest on experimental comparisons of efficiency and generalization rather than any reduction of outputs to training inputs by definition. No self-citations, uniqueness theorems, or ansatzes are load-bearing; the derivation chain is self-contained as a standard behavioral cloning setup with architectural constraints.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities with independent evidence are stated in the provided text.

invented entities (1)
  • latent space no independent evidence
    purpose: Transform state space so dynamics are linear and cost quadratic
    The latent space is introduced as the key modeling choice that enables LQR; no independent evidence outside the learned imitation is described.

pith-pipeline@v0.9.0 · 5637 in / 1095 out tokens · 23313 ms · 2026-05-23T22:44:35.780340+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Parametric Nonconvex Optimization via Convex Surrogates

    math.OC 2026-04 unverdicted novelty 6.0

    A surrogate for parametric nonconvex optimization is constructed as the minimum of convex-monotonic function compositions and solved via parallel convex optimization, with a proof-of-concept on path tracking.

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · cited by 1 Pith paper · 2 internal anchors

  1. [1]

    Di Carlo, P

    J. Di Carlo, P. M. Wensing, B. Katz, G. Bledt, and S. Kim. Dynamic locomotion in the mit cheetah 3 through convex model-predictive control. In 2018 IEEE/RSJ International Confer- ence on Intelligent Robots and Systems (IROS) , pages 1–9, Madrid, Oct. 2018. IEEE. ISBN 978-1-5386-8094-0. doi:10.1109/IROS.2018.8594448

  2. [2]

    Grandia, F

    R. Grandia, F. Jenelten, S. Yang, F. Farshidian, and M. Hutter. Perceptive locomotion through nonlinear model-predictive control. IEEE Trans. Robotics , 39(5):3402–3421, 2023. doi:10. 1109/TRO.2023.3275384

  3. [3]

    TartanAir: A dataset to push the limits of visual SLAM,

    Y . Song and D. Scaramuzza. Learning high-level policies for model predictive control. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , pages 7629–7636, Oct. 2020. doi:10.1109/IROS45743.2020.9340823

  4. [4]

    Diehl, H

    M. Diehl, H. J. Ferreau, and N. Haverbeke. Efficient numerical methods for nonlinear mpc and moving horizon estimation. In M. Morari, M. Thoma, L. Magni, D. M. Raimondo, and F. Allg ¨ower, editors, Nonlinear Model Predictive Control , volume 384, pages 391–417. Springer Berlin Heidelberg, Berlin, Heidelberg, 2009. ISBN 978-3-642-01093-4 978-3-642- 01094-1

  5. [5]

    Nocedal and S

    J. Nocedal and S. J. Wright. Sequential quadratic programming. In Numerical Optimization, pages 526–573. Springer-Verlag, New York, 1999. ISBN 978-0-387-98793-4

  6. [6]

    Y .-S. Wang, N. Matni, and J. C. Doyle. Localized lqr optimal control. In53rd IEEE Conference on Decision and Control, pages 1661–1668. IEEE, 2014

  7. [7]

    B. D. Anderson and J. B. Moore. Optimal Control: Linear Quadratic Methods . Courier Corporation, 2007

  8. [8]

    Brohan, N

    A. Brohan, N. Brown, J. Carbajal, Y . Chebotar, J. Dabis, C. Finn, K. Gopalakrishnan, K. Haus- man, A. Herzog, J. Hsu, J. Ibarz, B. Ichter, A. Irpan, T. Jackson, S. Jesmonth, N. J. Joshi, R. Julian, D. Kalashnikov, Y . Kuang, I. Leal, K.-H. Lee, S. Levine, Y . Lu, U. Malla, D. Manju- nath, I. Mordatch, O. Nachum, C. Parada, J. Peralta, E. Perez, K. Pertsc...

  9. [9]

    Korda and I

    M. Korda and I. Mezi ´c. Linear predictors for nonlinear dynamical systems: Koopman operator meets model predictive control. Automatica, 93:149–160, 2018

  10. [10]

    P. J. Antsaklis and A. N. Michel. A Linear Systems Primer . Springer Science & Business Media, 2007

  11. [11]

    Lusch, J

    B. Lusch, J. N. Kutz, and S. L. Brunton. Deep learning for universal linear embeddings of nonlinear dynamics. Nature Communications, 9(1):4950, Nov. 2018. ISSN 2041-1723. doi: 10.1038/s41467-018-07210-0

  12. [12]

    A. K. Mondal, S. S. Panigrahi, S. Rajeswar, K. Siddiqi, and S. Ravanbakhsh. Efficient dynam- ics modeling in interactive environments with koopman theory. In The Twelfth International Conference on Learning Representations, 2024. 10

  13. [13]

    Nolte, O

    N. Nolte, O. Kitouni, and M. Williams. Expressive monotonic neural networks. In The Eleventh International Conference on Learning Representations , Sept. 2022

  14. [14]

    Y . Tang, Z. D. Guo, P. H. Richemond, B. A. Pires, Y . Chandak, R. Munos, M. Rowland, M. G. Azar, C. L. Lan, C. Lyle, A. Gy ¨orgy, S. Thakoor, W. Dabney, B. Piot, D. Calandriello, and M. Valko. Understanding self-predictive learning for reinforcement learning. In Proceedings of the 40th International Conference on Machine Learning , pages 33632–33656. PML...

  15. [15]

    Mujoco: A physics en- gine for model-based control, in: 2012 IEEE/RSJ International Con- ference on Intelligent Robots and Systems, IEEE

    E. Todorov, T. Erez, and Y . Tassa. Mujoco: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems , pages 5026– 5033, Vilamoura-Algarve, Portugal, Oct. 2012. IEEE. ISBN 978-1-4673-1736-8 978-1-4673- 1737-5 978-1-4673-1735-1. doi:10.1109/IROS.2012.6386109

  16. [16]

    Tassa, T

    Y . Tassa, T. Erez, and E. Todorov. Synthesis and stabilization of complex behaviors through on- line trajectory optimization. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 4906–4913, Oct. 2012. doi:10.1109/IROS.2012.6386025

  17. [17]

    M. O. Williams, I. G. Kevrekidis, and C. W. Rowley. A data–driven approximation of the koopman operator: Extending dynamic mode decomposition. Journal of Nonlinear Science , 25(6):1307–1346, Dec. 2015. ISSN 0938-8974, 1432-1467. doi:10.1007/s00332-015-9258-5

  18. [18]

    Dickerson

    D. Gadginmath, V . Krishnan, and F. Pasqualetti. Data-driven feedback linearization using the koopman generator. CoRR, abs/2210.05046, 2022. doi:10.48550/ARXIV .2210.05046

  19. [19]

    Watter, J

    M. Watter, J. Springenberg, J. Boedecker, and M. Riedmiller. Embed to control: A locally linear latent dynamics model for control from raw images. In Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc., 2015

  20. [20]

    Dream to Control: Learning Behaviors by Latent Imagination

    D. Hafner, T. Lillicrap, J. Ba, and M. Norouzi. Dream to control: Learning behaviors by latent imagination. arXiv:1912.01603 [cs], Mar. 2020

  21. [21]

    N. A. Hansen, H. Su, and X. Wang. Temporal difference learning for model predictive control. In Proceedings of the 39th International Conference on Machine Learning , pages 8387–8406. PMLR, June 2022

  22. [22]

    H. Yin, M. C. Welle, and D. Kragic. Embedding koopman optimal control in robot pol- icy learning. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 13392–13399, Kyoto, Japan, Oct. 2022. IEEE. ISBN 978-1-66547-927-1. doi: 10.1109/IROS47612.2022.9981540

  23. [23]

    Retchin, B

    M. Retchin, B. Amos, S. Brunton, and S. Song. Koopman constrained policy optimization: A koopman operator theoretic method for differentiable optimal control in robotics. In ICML 2023 Workshop on Differentiable Almost Everything: Differentiable Relaxations, Algorithms, Operators, and Simulators, Sept. 2023

  24. [24]

    T. Ni, B. Eysenbach, E. SeyedSalehi, M. Ma, C. Gehring, A. Mahajan, and P.-L. Bacon. Bridg- ing state and history representations: Understanding self-predictive rl. In The Twelfth Interna- tional Conference on Learning Representations , Oct. 2023

  25. [25]

    X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel. Sim-to-real transfer of robotic control with dynamics randomization. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 3803–3810, May 2018. doi:10.1109/ICRA.2018.8460528

  26. [26]

    Nguyen, S

    K. Nguyen, S. Schoedel, A. Alavilli, B. Plancher, and Z. Manchester. Tinympc: Model- predictive control on resource-constrained microcontrollers. InIEEE International Conference on Robotics and Automation (ICRA) , 2024. 11

  27. [27]

    Howell, N

    T. Howell, N. Gileadi, S. Tunyasuvunakool, K. Zakka, T. Erez, and Y . Tassa. Predictive sam- pling: Real-time behaviour synthesis with mujoco, Dec. 2022

  28. [28]

    Loshchilov and F

    I. Loshchilov and F. Hutter. Decoupled weight decay regularization. In International Confer- ence on Learning Representations, Sept. 2018. 12 A Additional Algorithm Details Algorithm 1 MPC Imitation Learning 1: Input: Nonlinear MPC’s dynamical modelf(xh, uh), cost function c(xh, uh) and optimization algorithm to generate optimal control uh = MPC(xh), init...