AcroRL: Learning Aggressive Quadrotor Inversion using Bidirectional Thrust
Pith reviewed 2026-06-30 13:53 UTC · model grok-4.3
The pith
Reinforcement learning policies enable compact quadrotor inversions with bidirectional thrust that outperform optimization baselines.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that two reinforcement learning policies, trained separately for nominal-to-inverted and inverted-to-nominal transitions, modulate a constant reference trajectory to achieve the lowest position root mean square error (RMSE) and shortest settling time among the evaluated baselines in JAX simulation. Relative to the strongest optimization baseline, position RMSE falls by 32 percent and settling time by 57 percent. Hardware tests confirm successful inversions across yaw configurations with position RMSE below 0.35 meters, and show compatibility with circular flight in both regimes.
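To make the two headline metrics concrete, here is a minimal Python sketch of how position RMSE, settling time, and a relative reduction might be computed from a logged trajectory. The function names, the tolerance-band definition of settling time, and the toy trajectory are assumptions for illustration; the paper's exact definitions are not given on this page.

```python
import math


def position_rmse(positions, reference):
    """Root mean square of the Euclidean position error over a trajectory."""
    squared_errors = [math.dist(p, reference) ** 2 for p in positions]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def settling_time(times, positions, reference, tol):
    """First time after which the error stays within tol until the log ends.

    Returns None if the trajectory never settles. This tolerance-band
    definition is an assumption, not necessarily the paper's.
    """
    settled_at = None
    for t, p in zip(times, positions):
        if math.dist(p, reference) > tol:
            settled_at = None          # left the band, so reset
        elif settled_at is None:
            settled_at = t             # entered the band
    return settled_at


def relative_reduction(baseline, proposed):
    """Fractional improvement of proposed over baseline (0.32 means 32%)."""
    return (baseline - proposed) / baseline


# Toy trajectory (made-up numbers, not data from the paper).
times = [0.0, 0.5, 1.0, 1.5]
positions = [(0.3, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
reference = (0.0, 0.0, 0.0)

rmse = position_rmse(positions, reference)
t_settle = settling_time(times, positions, reference, tol=0.05)
```

With these definitions, a baseline RMSE of 1.0 and a proposed RMSE of 0.68 would correspond to the kind of 32 percent reduction quoted above.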
What carries the argument
The central mechanism is the pair of reinforcement learning policies for the two transition directions that modulate a constant reference trajectory to handle actuator saturation and motor reversal.
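One way to picture "modulating a constant reference" is as a bounded offset added to a fixed setpoint, with a separate policy selected for each transition direction. The sketch below is only that reading: the additive form, the tanh bound, `max_offset`, the observation dictionary, and both stand-in policies are hypothetical and are not taken from the paper.

```python
import math


def modulate_reference(constant_ref, observation, policy, max_offset):
    """Shift a constant reference by a bounded, policy-chosen offset.

    Assumes additive modulation; tanh keeps each offset component
    strictly inside (-max_offset, max_offset).
    """
    raw = policy(observation)
    return [c + max_offset * math.tanh(r) for c, r in zip(constant_ref, raw)]


# Hypothetical stand-ins for the two separately trained policies.
def nominal_to_inverted_policy(obs):
    return [0.0, 0.0, 2.0 * obs["tilt"]]


def inverted_to_nominal_policy(obs):
    return [0.0, 0.0, -2.0 * obs["tilt"]]


POLICIES = {
    "nominal_to_inverted": nominal_to_inverted_policy,
    "inverted_to_nominal": inverted_to_nominal_policy,
}

constant_ref = [0.0, 0.0, 1.5]   # fixed position setpoint (placeholder values)
obs = {"tilt": 0.5}              # placeholder observation

ref_up = modulate_reference(constant_ref, obs, POLICIES["nominal_to_inverted"], 0.2)
ref_down = modulate_reference(constant_ref, obs, POLICIES["inverted_to_nominal"], 0.2)
```

Because the output is still an ordinary reference, a downstream tracking controller would not need to know that a policy produced it, which is consistent with the compatibility claim above.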
If this is right
- The policies integrate directly with traditional trajectory generation and tracking controllers across both flight regimes.
- Hardware demonstrations succeed for multiple yaw configurations without additional tuning.
- The method supports downstream tasks such as circular flight after the inversion is complete.
- An open-source implementation is provided to allow reproduction and extension.
Where Pith is reading between the lines
- The same policy structure could be applied to other aggressive maneuvers that require bidirectional thrust, such as perching, if similar reference modulation is used.
- Improved simulation fidelity for motor dynamics could further reduce the gap between simulation and hardware performance.
- The approach suggests that learning-based modulation of reference trajectories may generalize to other vehicles that switch between multiple equilibrium conditions.
Load-bearing premise
The JAX simulation captures real-world effects such as motor reversal delays and actuator saturation accurately enough for the trained policies to transfer to hardware without further tuning.
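To show what this premise asks of the simulator, here is a minimal Python sketch of a motor model with saturation and a reversal delay. It is an illustrative stand-in, not the paper's JAX model: the first-order lag, the fixed hold at zero speed, and every parameter value (`tau`, `omega_max`, `reversal_hold_steps`) are assumptions.

```python
def step_motor(omega, hold_steps, omega_cmd, dt=0.01, tau=0.05,
               omega_max=2000.0, reversal_hold_steps=10):
    """Advance one motor speed by dt; returns (omega, hold_steps).

    All defaults are illustrative placeholders, not identified hardware values.
    """
    # Actuator saturation: clip the command to the achievable speed range.
    omega_cmd = max(-omega_max, min(omega_max, omega_cmd))
    # Reversal delay: while holding, the rotor stays stopped.
    if hold_steps > 0:
        return 0.0, hold_steps - 1
    # First-order lag toward the command (explicit Euler, needs dt < tau).
    new_omega = omega + (dt / tau) * (omega_cmd - omega)
    # A sign change means the rotor crossed zero: stop it and start the hold.
    if omega * new_omega < 0.0:
        return 0.0, reversal_hold_steps
    return new_omega, 0


def simulate(omega0, omega_cmd, n_steps):
    """Roll the motor model forward under a constant command."""
    omega, hold = omega0, 0
    history = []
    for _ in range(n_steps):
        omega, hold = step_motor(omega, hold, omega_cmd)
        history.append(omega)
    return history


# Command a full reversal from +1000 to -1000 (arbitrary speed units).
history = simulate(omega0=1000.0, omega_cmd=-1000.0, n_steps=30)
```

In this toy model the rotor decelerates, sits at zero for the hold period, and only then spins up in the opposite direction. A policy trained without that dead time could be expected to command thrust the real motors cannot deliver on schedule, which is why the premise matters for transfer.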
What would settle it
A hardware experiment in which the quadrotor either fails to complete the inversion or records position RMSE above 0.35 meters would falsify the transfer claim.
Original abstract
Bidirectional thrust grants quadrotors a second equilibrium condition and increased control authority, expanding the envelope of possible aggressive maneuvers and enabling inverted flight, perching, and sensing. Prior geometric control approaches extend differential flatness through Hopf fibration-based attitude representations to support bidirectional thrust, but struggle with actuator saturation and motor reversal delay during inversions, requiring heuristic thrust posture scheduling and waypoint tuning. We propose a learning-based framework that modulates a constant reference trajectory to perform compact, position-constrained quadrotor inversions while remaining compatible with traditional trajectory generation and tracking across flight regimes. Separate policies are trained via reinforcement learning for nominal-to-inverted and inverted-to-nominal transitions. In JAX-based simulation, the proposed method achieves the lowest position deviation and settling time across all evaluated baselines, reducing position root mean square error (RMSE) by 32% and settling time by 57% relative to the strongest optimization-based baseline. Hardware experiments demonstrate successful inversion across multiple yaw configurations with position RMSE below 0.35m, and compatibility with downstream trajectory generation and control through circular flight in both regimes. Additionally, we provide an open-source implementation of the proposed framework.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces AcroRL, a reinforcement learning framework for aggressive quadrotor inversions under bidirectional thrust. Separate policies are trained to modulate a constant reference trajectory for nominal-to-inverted and inverted-to-nominal transitions. In JAX simulation the method reports the lowest position deviation and settling time, with 32% RMSE reduction and 57% settling-time reduction versus the strongest optimization baseline; hardware trials show successful inversions across yaw angles with position RMSE below 0.35 m and continued compatibility with standard trajectory tracking and circular flight.
Significance. If the simulation-to-hardware transfer is reliable, the work provides a practical learning-based route to compact inversions that avoids heuristic scheduling required by prior geometric controllers. The open-source implementation is a clear strength that supports reproducibility and follow-on research in aggressive quadrotor flight.
major comments (3)
- [Simulation Experiments] Simulation Experiments: the headline 32% RMSE and 57% settling-time improvements are stated without reporting the number of random seeds, training trials, variance across runs, or statistical tests, making it impossible to assess whether the gains are robust or merely point estimates.
- [Hardware Experiments] Hardware Experiments: quantitative baseline comparisons are supplied only in simulation; the hardware section reports only qualitative success (RMSE < 0.35 m) with no side-by-side metrics against the optimization-based controllers, so the practical advantage on the physical platform remains unquantified.
- [Modeling and Simulation] Actuator Modeling: the JAX simulation incorporates motor-reversal delays and saturation, yet no step-response, reversal-timing, or frequency-response plots comparing the model to measured hardware data are provided; without this validation the sim-to-real transfer of the learned modulation policies rests on an unverified modeling assumption.
minor comments (2)
- [Method] The reward-function weights and policy-network architecture are listed as free parameters but their specific values and sensitivity analysis are not tabulated, which would aid readers attempting to reproduce the training.
- Figure captions for the hardware trajectories could explicitly state the yaw angles tested and whether the plotted reference is the constant or the modulated trajectory.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below, indicating where revisions will be made to improve clarity and rigor.
- Referee: [Simulation Experiments] Simulation Experiments: the headline 32% RMSE and 57% settling-time improvements are stated without reporting the number of random seeds, training trials, variance across runs, or statistical tests, making it impossible to assess whether the gains are robust or merely point estimates.
  Authors: We agree that statistical reporting is required to substantiate the claims. In the revised manuscript we will rerun all simulation experiments across multiple random seeds, report mean and standard deviation for position RMSE and settling time, and include statistical significance tests (e.g., paired t-tests) comparing against the optimization baseline.
  Revision: yes
- Referee: [Hardware Experiments] Hardware Experiments: quantitative baseline comparisons are supplied only in simulation; the hardware section reports only qualitative success (RMSE < 0.35 m) with no side-by-side metrics against the optimization-based controllers, so the practical advantage on the physical platform remains unquantified.
  Authors: Direct quantitative hardware comparisons with the optimization baseline were not performed because safely executing those controllers on the physical platform requires extensive additional tuning and safety protocols beyond the scope of the current validation. The hardware results demonstrate successful sim-to-real transfer and compatibility with standard tracking. In the revision we will expand the discussion to explicitly state these experimental constraints and the practical advantages of the learned policies.
  Revision: partial
- Referee: [Modeling and Simulation] Actuator Modeling: the JAX simulation incorporates motor-reversal delays and saturation, yet no step-response, reversal-timing, or frequency-response plots comparing the model to measured hardware data are provided; without this validation the sim-to-real transfer of the learned modulation policies rests on an unverified modeling assumption.
  Authors: The actuator model parameters were identified from hardware data. We will add step-response and motor-reversal timing plots in the revised manuscript that directly compare the JAX simulation outputs against the corresponding hardware measurements to validate the modeling assumptions.
  Revision: yes
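The statistical reporting promised in the first response (mean, standard deviation, and a paired test across seeds) could look roughly like the sketch below. The per-seed RMSE values are made-up placeholders, not results from the paper, and the helper name `paired_t_statistic` is hypothetical. Only the t statistic and degrees of freedom are computed; turning them into a p-value needs a t-distribution table or a statistics library.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Paired t statistic and degrees of freedom for matched samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)        # sample standard deviation (n - 1)
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# Placeholder per-seed position RMSE in meters (illustrative only).
baseline_rmse = [0.50, 0.52, 0.48, 0.51, 0.49]
proposed_rmse = [0.34, 0.35, 0.33, 0.36, 0.32]

summary = {
    "baseline_mean": statistics.mean(baseline_rmse),
    "baseline_std": statistics.stdev(baseline_rmse),
    "proposed_mean": statistics.mean(proposed_rmse),
    "proposed_std": statistics.stdev(proposed_rmse),
}
t_stat, dof = paired_t_statistic(baseline_rmse, proposed_rmse)
```

Pairing by seed only makes sense if both methods are evaluated on the same initial conditions for each seed; otherwise an unpaired test would be the more defensible choice.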
Circularity Check
No circularity: empirical RL results independent of self-referential derivations
The paper proposes an RL-based policy modulation framework for bidirectional-thrust quadrotor inversions, trained separately for nominal-to-inverted and inverted-to-nominal transitions. All performance claims (32% RMSE reduction, 57% settling-time reduction in JAX sim; hardware RMSE <0.35 m) are obtained from direct simulation rollouts and physical experiments rather than any first-principles derivation, fitted-parameter prediction, or self-citation chain. No equations or uniqueness theorems are invoked that reduce to the method's own inputs; the work is self-contained against external simulation and hardware benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- Policy network parameters
- Reward function weights
axioms (2)
- domain assumption: The bidirectional thrust quadrotor dynamics are accurately modeled in simulation
- domain assumption: Separate policies for each transition direction are sufficient without a unified policy