arXiv: 2601.02738 · v1 · submitted 2026-01-06 · 💻 cs.RO · cs.SY · eess.SY

Optimizing Control-Friendly Trajectories with Self-Supervised Residual Learning

Pith reviewed 2026-05-16 17:41 UTC · model grok-4.3

classification 💻 cs.RO · cs.SY · eess.SY
keywords residual learning, self-supervised learning, trajectory optimization, hybrid dynamics, quadrotor control, agile flight, closed-loop effects

The pith

Self-supervised residual learning from trajectories enables optimizers to generate aggressive motions that controllers track precisely.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework that learns unknown closed-loop dynamic effects as additive residuals on nominal models, using only trajectory-level data in a self-supervised way that supplies analytic gradients. This hybrid model supports accurate long-horizon predictions at any integration step size. A trajectory optimizer then minimizes the size of these residuals along candidate paths, producing reference trajectories that are friendly to the downstream controller. The approach is shown on quadrotor agile flight where the resulting motions remain aggressive yet accurately executable. A reader would care because real robotic physics is always imperfect, and this method bridges the gap without requiring complete analytical models.
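The learning step described above can be illustrated on a toy problem. The sketch below is not the paper's implementation: it assumes a 1-D point mass whose nominal model ignores linear drag, a one-parameter residual `theta * v`, forward-Euler integration, and a Gauss-Newton update. All names (`C_TRUE`, `u_fn`, `rollout`, `fit`) are hypothetical. The loss compares a whole rolled-out trajectory with a recorded one, and its gradient is obtained analytically by propagating the sensitivity dv/dtheta alongside the state.

```python
import numpy as np

C_TRUE = 0.5  # hidden drag coefficient (assumption); the nominal model does not know it


def u_fn(t):
    """Known input signal applied to the system (assumed for this toy)."""
    return 1.0 + 0.5 * np.sin(2.0 * t)


def simulate_truth(T, dt, substeps=20):
    """Stand-in for recorded data: true dynamics dv/dt = u - C_TRUE * v,
    integrated with a fine step and sampled every dt."""
    n = int(round(T / dt))
    h = dt / substeps
    v, out = 0.0, [0.0]
    for k in range(n):
        for j in range(substeps):
            v += h * (u_fn(k * dt + j * h) - C_TRUE * v)
        out.append(v)
    return np.array(out)


def rollout(theta, T, dt):
    """Euler rollout of the hybrid model dv/dt = u + theta * v
    (nominal term u plus residual theta * v), together with the
    analytic sensitivity s = dv/dtheta."""
    n = int(round(T / dt))
    v, s = 0.0, 0.0
    vs, ss = [v], [s]
    for k in range(n):
        u = u_fn(k * dt)
        v_next = v + dt * (u + theta * v)
        s_next = s + dt * (v + theta * s)  # derivative of the update w.r.t. theta
        v, s = v_next, s_next
        vs.append(v)
        ss.append(s)
    return np.array(vs), np.array(ss)


def fit(v_obs, T, dt, iters=20):
    """Fit theta from one trajectory using Gauss-Newton steps
    built from the analytic sensitivities."""
    theta = 0.0
    for _ in range(iters):
        v, s = rollout(theta, T, dt)
        err = v - v_obs
        theta -= (err @ s) / (s @ s)
    return theta


T_END, DT_FIT = 4.0, 0.02
v_obs = simulate_truth(T_END, DT_FIT)
theta_hat = fit(v_obs, T_END, DT_FIT)

# Roll the fitted hybrid model out again with a coarser step than the one used for fitting.
v_coarse, _ = rollout(theta_hat, T_END, 0.05)
print("fitted theta:", theta_hat)
print("final-velocity gap at coarse step:", abs(v_coarse[-1] - v_obs[-1]))
```

Because the residual is attached to the continuous-time dynamics rather than to a fixed one-step map, the fitted model can be rolled out at a step size different from the one used for fitting, which is what the last rollout does.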

Core claim

By treating learned residuals as part of a hybrid dynamics model and optimizing trajectories to minimize residual effects along the path, the method produces aggressive reference trajectories that the closed-loop controller can track precisely, as illustrated by quadrotor experiments.
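The planning idea in this claim can be sketched on a toy problem. The code below assumes a 1-D rest-to-rest quintic reference, a residual of the form `THETA * v` that is already learned, and a cost that trades total duration against the integrated squared residual. The constants, the cost weights, and the grid search are illustrative assumptions, not the paper's optimizer.

```python
import numpy as np

THETA = -0.5    # assumed learned residual coefficient: r(v) = THETA * v
DISTANCE = 2.0  # rest-to-rest displacement in metres (hypothetical)
W_TIME = 1.0    # weight on total duration (hypothetical)


def quintic_velocity(tau, distance, duration):
    """Velocity of p(tau) = distance * (10 tau^3 - 15 tau^4 + 6 tau^5), tau = t / duration."""
    return distance / duration * 30.0 * tau**2 * (1.0 - tau) ** 2


def residual_cost(duration, n=1001):
    """Integral of the squared residual along the reference (trapezoidal rule)."""
    tau = np.linspace(0.0, 1.0, n)
    t = tau * duration
    y = (THETA * quintic_velocity(tau, DISTANCE, duration)) ** 2
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))


def total_cost(duration):
    """Aggressiveness (short duration) traded against residual magnitude."""
    return W_TIME * duration + residual_cost(duration)


durations = np.linspace(0.5, 5.0, 451)
costs = np.array([total_cost(T) for T in durations])
best_duration = float(durations[np.argmin(costs)])
print("duration chosen by the residual-aware cost:", best_duration)
```

For this toy the optimum can be checked by hand: the residual integral equals THETA² · DISTANCE² · (10/7) / T, so the cost is minimized at T = |THETA| · DISTANCE · sqrt(10 / (7 · W_TIME)), roughly 1.2 s. Shorter durations are faster but excite more of the unmodeled effect.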

What carries the argument

Hybrid dynamics: the nominal equations plus residual terms learned in a self-supervised way. The trajectory optimizer uses this model to minimize the residual physics along the planned path.
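One way to write this structure down is given below. The notation is an assumption made for illustration (the summary above does not give the paper's symbols): f_nom is the nominal model, r_θ the learned residual, x̂_k(θ) the state predicted by integrating the hybrid model from the recorded initial state, ξ the reference trajectory, J a generic planning cost, and λ a weight on the residual term.

```latex
% Hybrid model: nominal dynamics plus learned residual
\dot{x} = f_{\mathrm{nom}}(x, u) + r_{\theta}(x, u)

% Trajectory-level training loss over recorded states x_1, ..., x_N
\mathcal{L}(\theta) = \sum_{k=1}^{N} \bigl\lVert \hat{x}_k(\theta) - x_k \bigr\rVert^2,
\qquad
\hat{x}_k(\theta) = x_0 + \int_{t_0}^{t_k}
  \bigl[ f_{\mathrm{nom}}(\hat{x}, u) + r_{\theta}(\hat{x}, u) \bigr] \, dt

% Residual-aware trajectory optimization
\min_{\xi} \; J(\xi) + \lambda \int_{0}^{T}
  \bigl\lVert r_{\theta}\bigl(x_{\xi}(t), u_{\xi}(t)\bigr) \bigr\rVert^2 \, dt
```

Whether the paper penalizes the residual alone or combines it with other terms such as duration or smoothness is not stated in the summary; the third expression is a generic form covering both cases.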

If this is right

  • Aggressive trajectories can be planned while respecting real closed-loop behavior.
  • Long-horizon predictions stay accurate with arbitrary integration step sizes.
  • Unknown dynamic effects are captured from trajectory data without full analytical modeling.
  • Direct minimization of residuals in planning improves downstream tracking performance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same residual-minimization step could be applied to other underactuated platforms where model mismatch limits speed.
  • Periodic re-learning of residuals would allow the planner to adapt when conditions such as payload or wind change.
  • The method may reduce reliance on conservative safety margins in high-speed robotic motion planning.

Load-bearing premise

Residuals learned self-supervised from trajectory data accurately capture unknown closed-loop effects, and minimizing them in optimization improves tracking without creating new instabilities.

What would settle it

If trajectories optimized under the hybrid model are tracked with larger error than trajectories optimized under the nominal model alone, the claimed benefit would be falsified.
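The comparison behind this test can be sketched as follows. The function names and all numbers are placeholders chosen for illustration; they are not results from the paper. Each flight is scored by the root-mean-square positional error between its reference and the flown path, and the claim survives only if the hybrid-model trajectories score lower.

```python
import numpy as np


def tracking_rmse(reference, actual):
    """Root-mean-square positional tracking error for one flight.
    Both arrays have shape (N, 3): N samples of x, y, z position."""
    reference = np.asarray(reference, dtype=float)
    actual = np.asarray(actual, dtype=float)
    err = np.linalg.norm(actual - reference, axis=1)
    return float(np.sqrt(np.mean(err**2)))


def claim_survives(rmse_hybrid, rmse_nominal):
    """The claimed benefit holds only if hybrid-model planning tracks better."""
    return rmse_hybrid < rmse_nominal


# Placeholder data only: constant offsets stand in for logged flights.
ref = np.zeros((4, 3))
flown_hybrid = ref + np.array([0.03, 0.04, 0.0])   # 0.05 m offset (made up)
flown_nominal = ref + np.array([0.3, 0.4, 0.0])    # 0.5 m offset (made up)

rmse_h = tracking_rmse(ref, flown_hybrid)
rmse_n = tracking_rmse(ref, flown_nominal)
print("claim survives on placeholder data:", claim_survives(rmse_h, rmse_n))
```

In a real comparison each trajectory would be scored against its own reference, with the same controller and the same test conditions for both planners.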

Figures

Figures reproduced from arXiv: 2601.02738 by Jindou Jia, Kexin Guo, Xiang Yu, Yuhang Liu, Zihan Yang.

Figure 1: The schematic of the quadrotor system with the definitions of the …
Figure 2: The proposed hybrid model. Given an open-loop dynamics and a …
Figure 3: Trajectory tracking results of the proposed trajectory optimization and the compared methods. Using a nominal MPC for trajectory tracking, the …
Figure 4: Trajectories generated with random waypoints for aerodynamic drag …
Figure 5: Learning curves of 5 trials with random initial guesses around zero …
Figure 6: Positional tracking error of the MPC and DFBC controllers with different trajectories.
Figure 7: Comparison on the time cost of different trajectory optimization …
Figure 8: Trajectories optimized using minimum-snap and minimum-residual objectives together with the camera view of the real-world flight. Tracking control …
Figure 9: The real-world positional tracking error of the DFBC controllers with …

Original abstract

Real-world physics can only be analytically modeled with a certain level of precision for modern intricate robotic systems. As a result, tracking aggressive trajectories accurately could be challenging due to the existence of residual physics during controller synthesis. This paper presents a self-supervised residual learning and trajectory optimization framework to address the aforementioned challenges. At first, unknown dynamic effects on the closed-loop model are learned and treated as residuals of the nominal dynamics, jointly forming a hybrid model. We show that learning with analytic gradients can be achieved using only trajectory-level data while enjoying accurate long-horizon prediction with an arbitrary integration step size. Subsequently, a trajectory optimizer is developed to compute the optimal reference trajectory with the residual physics along it minimized. It ends up with trajectories that are friendly to the following control level. The agile flight of quadrotors illustrates that by utilizing the hybrid dynamics, the proposed optimizer outputs aggressive motions that can be precisely tracked.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes a self-supervised residual learning framework that augments a nominal dynamics model with learned residuals to form a hybrid model. Residuals are trained from trajectory-level data using analytic gradients to enable accurate long-horizon prediction at arbitrary step sizes. A trajectory optimizer then minimizes the residual effects along candidate paths, producing reference trajectories that are claimed to be more control-friendly. The approach is illustrated on quadrotor agile flight, where the hybrid model is said to enable aggressive motions that can be precisely tracked.

Significance. If the generalization and closed-loop claims hold, the method offers a practical route to trajectory optimization under unmodeled dynamics without requiring full analytic models or extensive system identification. The self-supervised training from trajectory data and analytic-gradient learning are strengths that could transfer to other robotic platforms.

major comments (2)
  1. [Abstract] Abstract: the central claim that 'by utilizing the hybrid dynamics, the proposed optimizer outputs aggressive motions that can be precisely tracked' is unsupported by any quantitative tracking-error metrics, baseline comparisons, or closed-loop validation results in the provided text; without these, the improvement over nominal-model optimization cannot be assessed.
  2. [Method (hybrid model and optimizer)] The framework assumes residuals learned self-supervised from collected trajectories generalize to the new, more aggressive optimizer-generated paths; no out-of-distribution testing, cross-validation on held-out aggressive maneuvers, or stability analysis under the hybrid model is described, leaving the weakest assumption unaddressed.
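The out-of-distribution check requested in the second comment can be sketched as below. The sketch assumes a hypothetical ground-truth residual with a quadratic drag term and a linear residual model fitted on low-speed samples only; none of this comes from the paper. It shows how a held-out set of faster motions can expose a generalization gap that in-distribution error hides.

```python
import numpy as np


def true_residual(v):
    """Assumed ground truth for illustration: linear plus quadratic drag."""
    return -0.3 * v - 0.05 * v * np.abs(v)


def fit_linear_residual(v, r):
    """Least-squares fit of r ~ theta * v (closed form for one parameter)."""
    return float(v @ r / (v @ v))


def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))


v_train = np.linspace(-2.0, 2.0, 201)  # speeds seen during data collection
v_test = np.linspace(4.0, 6.0, 101)    # held-out, more aggressive speeds

theta = fit_linear_residual(v_train, true_residual(v_train))
rmse_in = rmse(theta * v_train, true_residual(v_train))
rmse_out = rmse(theta * v_test, true_residual(v_test))
print("in-distribution RMSE:", rmse_in)
print("held-out RMSE:", rmse_out)
```

The same split applies to real flight logs: hold out the fastest recorded maneuvers, fit on the rest, and report prediction error on both sets.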

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to incorporate additional quantitative support and validation.

  1. Referee: [Abstract] Abstract: the central claim that 'by utilizing the hybrid dynamics, the proposed optimizer outputs aggressive motions that can be precisely tracked' is unsupported by any quantitative tracking-error metrics, baseline comparisons, or closed-loop validation results in the provided text; without these, the improvement over nominal-model optimization cannot be assessed.

    Authors: We agree that the abstract claim would be stronger with explicit quantitative backing. The manuscript presents closed-loop simulation and hardware results on quadrotors demonstrating reduced tracking errors for hybrid-model trajectories versus nominal-model baselines. We will revise the abstract to reference these metrics (e.g., average tracking error reductions) and clarify the baseline comparisons, ensuring the central claim is directly supported.
    Revision: yes

  2. Referee: [Method (hybrid model and optimizer)] The framework assumes residuals learned self-supervised from collected trajectories generalize to the new, more aggressive optimizer-generated paths; no out-of-distribution testing, cross-validation on held-out aggressive maneuvers, or stability analysis under the hybrid model is described, leaving the weakest assumption unaddressed.

    Authors: The training data includes a range of aggressive maneuvers to support generalization of the residuals. We acknowledge that dedicated out-of-distribution testing and stability analysis are not explicitly presented. In the revision we will add cross-validation results on held-out aggressive trajectories and a short discussion of stability properties of the hybrid dynamics to address this directly.
    Revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

The paper describes learning residuals self-supervised from trajectory-level data to form a hybrid dynamics model, followed by separate trajectory optimization that minimizes the residual term along candidate paths. This does not reduce any prediction or result to its inputs by construction: the residual function is fitted to observed data, and the optimizer searches over new trajectories using that fixed learned function. No equations, self-citations, or uniqueness claims are provided that would create a definitional loop or fitted-input-as-prediction pattern. The central claim therefore rests on empirical generalization rather than tautological equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the nominal dynamics serving as a usable base and the learned residuals being sufficient to improve optimization and tracking; no free parameters or invented entities are explicitly quantified in the abstract.

axioms (1)
  • domain assumption: A nominal analytic dynamics model exists and is accurate enough to serve as the base for learning residuals.
    The paper states that unknown effects are treated as residuals of the nominal dynamics.
invented entities (1)
  • Hybrid model of nominal plus residual dynamics (no independent evidence)
    purpose: To enable accurate long-horizon prediction and residual-minimizing trajectory optimization.
    Formed jointly from learned residuals and nominal model as described in the abstract.

pith-pipeline@v0.9.0 · 5466 in / 1250 out tokens · 52092 ms · 2026-05-16T17:41:44.981546+00:00 · methodology


