
arxiv: 2603.19966 · v2 · submitted 2026-03-20 · 💻 cs.RO

GustPilot: A Hierarchical DRL-INDI Framework for Wind-Resilient Quadrotor Navigation

Pith reviewed 2026-05-15 08:31 UTC · model grok-4.3

classification 💻 cs.RO
keywords: wind-resilient navigation, quadrotor control, deep reinforcement learning, incremental nonlinear dynamic inversion, domain randomization, gate traversal, disturbance rejection, autonomous flight

The pith

A DRL velocity planner paired with an INDI acceleration-feedback controller lets lightweight quadrotors traverse gates reliably under strong wind gusts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a deep reinforcement learning policy trained to generate inertial-frame velocity references in a minimal simulated windy environment can guide a small quadrotor through gate traversal when its outputs are tracked by a geometric incremental nonlinear dynamic inversion controller. The INDI layer uses onboard sensor measurements of linear and angular accelerations to reject wind disturbances at execution time. A sympathetic reader would care because lightweight drones are easily destabilized by rapidly varying airflow that breaks both planning and tracking, and the demonstrated transfer from simple simulation to complex real scenes offers a concrete route to more robust outdoor autonomous flight without requiring retraining for each new environment.

Core claim

GustPilot uses a DRL policy to produce velocity commands for gate traversal while a geometric INDI controller tracks them by providing incremental feedback on specific linear acceleration and angular acceleration rate, achieving rapid residual disturbance rejection from wind. Trained solely in a single-gate single-fan simulation via domain randomization, the policy generalizes without retraining to real flights involving up to six gates and four dynamic disturbance sources on a 50 g platform, delivering 94.7 percent overall success rate, up to 50 percent lower tracking RMSE, and sustained speeds of 1.34 m/s under winds up to 3.5 m/s across 80 experiments.

What carries the argument

The hierarchical DRL-INDI stack, where the DRL policy supplies high-level inertial velocity references and the INDI controller supplies low-level incremental acceleration feedback for disturbance rejection.
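
To make the incremental-feedback idea concrete, here is a minimal one-dimensional sketch in Python. It is not the paper's geometric INDI controller: the point-mass plant, the gain `kp`, the time step, the constant `wind_accel`, and the function name `simulate` are all illustrative assumptions, and the acceleration measurement is taken to be noise-free and undelayed. The sketch only shows why adding the difference between desired and measured acceleration to the previous command cancels a disturbance the controller never observes directly.

```python
def simulate(use_indi, v_ref=1.0, wind_accel=-1.0, kp=2.0, dt=0.01, steps=2000):
    """Track v_ref on a 1-D point mass whose true acceleration is u + wind_accel.

    All parameters are hypothetical; wind_accel is hidden from the controller.
    """
    v = 0.0        # velocity (m/s)
    u_prev = 0.0   # previously commanded acceleration (m/s^2)
    a_meas = 0.0   # last measured acceleration (m/s^2), e.g. from an IMU
    for _ in range(steps):
        a_des = kp * (v_ref - v)            # desired acceleration from velocity error
        if use_indi:
            # Incremental law: correct the previous command by the
            # mismatch between desired and measured acceleration.
            u = u_prev + (a_des - a_meas)
        else:
            # Plain proportional tracking with no acceleration feedback.
            u = a_des
        a = u + wind_accel                  # true plant response, wind included
        v += a * dt                         # forward-Euler integration
        a_meas, u_prev = a, u
    return v


v_indi = simulate(use_indi=True)    # with incremental acceleration feedback
v_plain = simulate(use_indi=False)  # without it
```

Under these assumptions the plain proportional loop settles with a steady offset of `wind_accel / kp` from the reference, while the incremental loop reaches the reference because the measured acceleration already contains the wind's effect. A real implementation would have to handle sensor noise, filtering delay, and actuator dynamics, none of which this sketch models.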

If this is right

  • The policy trained in a minimal single-gate single-fan setup generalizes to environments with up to six gates and four dynamic disturbance sources without retraining.
  • The combined system reaches 94.7 percent overall success rate versus 55 percent for a DRL-PID baseline across 80 real flights.
  • Tracking RMSE drops by up to 50 percent while speeds up to 1.34 m/s are sustained under 3.5 m/s winds on a 50 g platform.
  • Wind-aware planning via fan-jet randomization plus execution-time INDI rejection forms a practical navigation stack for wind-resilient flight.
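
The "up to 50 percent" figure in the list above is a relative reduction between two root-mean-square errors. The abstract does not say which signal or axis the RMSE is computed over, so the sketch below assumes a scalar reference and a scalar measurement sampled at the same instants; the function names `rmse` and `relative_reduction` are hypothetical.

```python
import math


def rmse(reference, actual):
    """Root-mean-square error between two equally long sequences of samples."""
    if len(reference) != len(actual) or len(reference) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(r - a) ** 2 for r, a in zip(reference, actual)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline_rmse, improved_rmse):
    """Fractional drop in RMSE relative to the baseline (0.5 means 50 percent)."""
    if baseline_rmse <= 0:
        raise ValueError("baseline RMSE must be positive")
    return (baseline_rmse - improved_rmse) / baseline_rmse
```

A reduction of 0.5 therefore means the improved controller's RMSE is half the baseline's, not that the error falls by 0.5 in absolute units.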

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same separation of learned planning and structured acceleration feedback could be applied to other small UAV platforms facing gusts or turbulence.
  • Reducing the simulation-to-reality gap through targeted randomization may shorten the development cycle for learned controllers in physical systems.
  • Adding predictive wind estimation to the DRL layer could further raise speeds and success rates in highly dynamic outdoor settings.

Load-bearing premise

That domain randomization with fan jets in a minimal single-gate simulation produces a policy that transfers without retraining to real-world scenes with up to six gates and four dynamic disturbance sources.
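
As a rough illustration of what per-episode fan-jet randomization might look like, the Python sketch below samples one fan and evaluates a toy cone-shaped wind field. Everything here is an assumption made for illustration: the `FanJet` fields, the sampling ranges, the cone geometry, and the `1 / (1 + distance)` decay are hypothetical and are not taken from the paper. The 3.5 m/s speed cap simply borrows the maximum wind speed reported for the real flights.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class FanJet:
    x: float           # fan position (m)
    y: float
    heading: float     # jet direction (rad)
    speed: float       # exit wind speed (m/s)
    half_angle: float  # half-width of the jet cone (rad)


def sample_fan(rng, max_speed=3.5):
    """Draw one randomized fan per training episode (ranges are hypothetical)."""
    return FanJet(
        x=rng.uniform(-2.0, 2.0),
        y=rng.uniform(-2.0, 2.0),
        heading=rng.uniform(-math.pi, math.pi),
        speed=rng.uniform(0.0, max_speed),
        half_angle=rng.uniform(math.radians(10), math.radians(30)),
    )


def wind_at(fan, px, py):
    """Toy wind vector at (px, py): nonzero only inside the fan's cone."""
    dx, dy = px - fan.x, py - fan.y
    dist = math.hypot(dx, dy)
    if dist > 0.0:
        bearing = math.atan2(dy, dx)
        # Wrap the angular offset into [-pi, pi) before comparing.
        offset = (bearing - fan.heading + math.pi) % (2 * math.pi) - math.pi
        if abs(offset) > fan.half_angle:
            return (0.0, 0.0)
    strength = fan.speed / (1.0 + dist)  # assumed decay with distance
    return (strength * math.cos(fan.heading), strength * math.sin(fan.heading))


rng = random.Random(0)   # seeded so episodes are reproducible
episode_fan = sample_fan(rng)
```

The premise above amounts to the claim that a policy exposed to many such single-fan draws learns behavior that still holds when several fans and gates are present at once, which a single-cone model like this one does not by itself demonstrate.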

What would settle it

A real-flight experiment in which the policy achieves an overall success rate below 70 percent when tested with four simultaneous moving wind sources and multiple gates would indicate that the claimed generalization does not hold.

Figures

Figures reproduced from arXiv: 2603.19966 by Amir Atef Habel, Clement Fortin, Dzmitry Tsetserukou, Fawad Mehboob, Roohan Ahmed Khan.

Figure 1. GustPilot navigating in a fully windy environment under both …
Figure 2. Overall of our Hierarchical Guidance Control System Architecture of GustPilot
Figure 4. Training metrics for policies learned with INDI and PID low-level …
Figure 5. 2-D trajectory comparisons for all four scenarios (DRL-INDI vs. DRL-PID).
Original abstract

Wind disturbances remain a key barrier to reliable autonomous navigation for lightweight quadrotors, where the rapidly varying airflow can destabilize both planning and tracking. This paper introduces GustPilot, a hierarchical wind-resilient navigation stack in which a deep reinforcement learning (DRL) policy generates inertial-frame velocity reference for gate traversal. At the same time, a geometric Incremental Nonlinear Dynamic Inversion (INDI) controller provides low-level tracking with fast residual disturbance rejection. The INDI layer achieves this by providing incremental feedback on both specific linear acceleration and angular acceleration rate, using onboard sensor measurements to reject wind disturbances rapidly. Robustness is obtained through a two-level strategy, wind-aware planning learned via fan-jet domain randomization during training, and rapid execution-time disturbance rejection by the INDI tracking controller. We evaluate GustPilot in real flights on a 50g quad-copter platform against a DRL-PID baseline across four scenarios ranging from no-wind to fully dynamic conditions with a moving gate and a moving disturbance source. Despite being trained only in a minimal single-gate and single-fan setup, the policy generalizes to significantly more complex environments (up to six gates and four fans) without retraining. Across 80 experiments, DRL-INDI achieves a 94.7% versus 55.0% for DRL-PID as average Overall Success Rate (OSR), reduces tracking RMSE up to 50%, and sustains speeds up to 1.34 m/s under wind disturbances up to 3.5 m/s. These results demonstrate that combining DRL-based velocity planning with structured INDI disturbance rejection provides a practical and generalizable approach to wind-resilient autonomous flight navigation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces GustPilot, a hierarchical framework for wind-resilient quadrotor navigation that combines a DRL policy for generating inertial-frame velocity references during gate traversal with a geometric INDI controller for low-level tracking and rapid residual disturbance rejection via incremental feedback on linear and angular accelerations. Robustness is achieved through wind-aware DRL planning trained with fan-jet domain randomization in a minimal single-gate simulation and fast execution-time rejection by the INDI layer. The approach is evaluated in 80 real flights on a 50g platform across four scenarios (no-wind to dynamic multi-source wind with moving gates), claiming 94.7% average OSR (vs. 55% for DRL-PID baseline), up to 50% RMSE reduction, and sustained speeds of 1.34 m/s under 3.5 m/s winds, with zero-shot generalization to up to six gates and four fans.

Significance. If the sim-to-real transfer and performance gains are substantiated, the work offers a practical demonstration of combining learning-based velocity planning with structured model-based disturbance rejection for lightweight UAVs in wind. The real-flight evaluation on a 50g platform and the reported quantitative improvements over a DRL-PID baseline constitute the primary strengths; the hierarchical separation of concerns is a clear contribution if the DRL planner's role beyond the INDI layer can be isolated.

major comments (3)
  1. [Abstract] Abstract: The reported 94.7% OSR and up to 50% RMSE reduction are presented without error bars, standard deviations, or statistical significance tests (e.g., paired t-tests or Wilcoxon tests) across the 80 flights; this omission is load-bearing because the central claim of superiority over DRL-PID rests on these aggregate numbers, and post-hoc scenario selection cannot be ruled out without the full protocol and per-scenario breakdowns.
  2. [Experiments] Experiments (assumed §V or equivalent): The domain-randomization procedure is limited to a single-gate, single-fan simulation, yet the real-world tests include up to six gates and four dynamic fans; no ablation, sensitivity analysis, or disturbance-spectrum comparison is provided to confirm that the randomization captures gate-induced flow interactions or multi-source turbulence spectra, which directly undermines the zero-shot generalization claim that is central to the hierarchical contribution.
  3. [Method] Method (assumed §III or IV): The INDI layer is described as providing incremental feedback on specific linear acceleration and angular acceleration rate, but the manuscript does not quantify the relative contribution of the DRL planner versus the INDI rejection (e.g., via an INDI-only baseline); without this isolation, the reported gains could be driven primarily by the low-level controller rather than the learned velocity planning.
minor comments (2)
  1. [Abstract] The abstract expands OSR only after quoting the 94.7% and 55.0% figures; introduce 'Overall Success Rate (OSR)' before the numbers it describes.
  2. [Figures] Figure captions (assumed in §V) should include the number of trials per scenario and any filtering criteria applied to the 80 flights to improve reproducibility.
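
Major comment 1 asks for uncertainty estimates on the success rates. One simple option is a Wilson score interval for each controller's success proportion, sketched below in Python. The per-controller flight counts are not given in the abstract, so the numbers in the example (38 of 40 and 22 of 40) are hypothetical placeholders and not the paper's data; the function name `wilson_interval` is likewise an assumption.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts for illustration only; replace with per-flight records.
indi_low, indi_high = wilson_interval(38, 40)
pid_low, pid_high = wilson_interval(22, 40)
intervals_overlap = indi_low <= pid_high
```

Non-overlapping intervals are a conservative indication that the two success rates differ. A paired test across matched scenarios, as the referee suggests, would be a stronger check if per-flight outcomes are available.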

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below with honest responses and indicate planned revisions to improve clarity and rigor without misrepresenting the work.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The reported 94.7% OSR and up to 50% RMSE reduction are presented without error bars, standard deviations, or statistical significance tests (e.g., paired t-tests or Wilcoxon tests) across the 80 flights; this omission is load-bearing because the central claim of superiority over DRL-PID rests on these aggregate numbers, and post-hoc scenario selection cannot be ruled out without the full protocol and per-scenario breakdowns.

    Authors: We agree that statistical details strengthen the claims. The full manuscript reports per-scenario OSR and RMSE values across the four scenarios in the experiments section, but the abstract aggregates them without deviations. We will revise the abstract to include standard deviations for the reported metrics and add a sentence describing the fixed experimental protocol (predefined scenario sequence with no post-hoc selection). If raw per-flight data permits, we will also include a note on statistical comparison in the revision. revision: yes

  2. Referee: [Experiments] Experiments (assumed §V or equivalent): The domain-randomization procedure is limited to a single-gate, single-fan simulation, yet the real-world tests include up to six gates and four dynamic fans; no ablation, sensitivity analysis, or disturbance-spectrum comparison is provided to confirm that the randomization captures gate-induced flow interactions or multi-source turbulence spectra, which directly undermines the zero-shot generalization claim that is central to the hierarchical contribution.

    Authors: The minimal single-gate single-fan training was chosen precisely to test zero-shot generalization as a core contribution of the hierarchical design. The 80 real flights provide empirical support by succeeding in far more complex conditions (up to six gates, four fans, moving elements). We will add a sensitivity analysis on randomization parameters and a short discussion of how fan-jet randomization approximates multi-source effects in the revised experiments section, though a full spectral comparison would require new simulation work. revision: partial

  3. Referee: [Method] Method (assumed §III or IV): The INDI layer is described as providing incremental feedback on specific linear acceleration and angular acceleration rate, but the manuscript does not quantify the relative contribution of the DRL planner versus the INDI rejection (e.g., via an INDI-only baseline); without this isolation, the reported gains could be driven primarily by the low-level controller rather than the learned velocity planning.

    Authors: The DRL-PID baseline uses the exact same DRL velocity planner, so performance differences can be attributed to the INDI layer. An INDI-only baseline would require designing a separate non-learning velocity generator, which lies outside the paper's focus on the integrated hierarchical framework. We will revise the method section to more explicitly discuss the complementary roles and quantify the incremental benefit of INDI over PID given the shared planner. revision: partial

Circularity Check

0 steps flagged

No significant circularity; empirical results independent of inputs

Full rationale

The paper's core contribution is an empirical demonstration of a DRL-INDI hierarchy on a 50g platform, with success rates and RMSE reductions obtained from 80 real flights against a DRL-PID baseline. No equations, fitted parameters, or derivations are shown that reduce the reported OSR or tracking metrics to simulation inputs by construction. The sim-to-real transfer claim is presented as an experimental outcome rather than a mathematical prediction forced by the training setup. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps in any derivation chain. The framework is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review reveals no explicit free parameters, axioms, or invented entities; the approach relies on standard DRL training and INDI formulation without additional postulated quantities.


