arxiv: 2501.14377 · v2 · submitted 2025-01-24 · 💻 cs.RO

Dream to Fly: Model-Based Reinforcement Learning for Vision-Based Drone Flight

Pith reviewed 2026-05-23 04:51 UTC · model grok-4.3

classification 💻 cs.RO
keywords: drone racing · model-based reinforcement learning · vision-based control · visuomotor policies · autonomous flight · sim-to-real transfer · pixel observations

The pith

Model-based reinforcement learning trains drone policies that fly race tracks from camera pixels alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that DreamerV3 learns policies mapping raw camera images directly to control commands for flying a quadrotor through a sequence of gates. This works without the simplified observations or large amounts of imitation learning data that earlier pixel-based methods needed. Model-free algorithms like PPO and SAC prove too sample-inefficient in the same setting. A behavior of actively pointing the camera at textured gates appears even though no reward encourages it. The resulting policies transfer to real hardware at speeds up to 9 m/s when tested in a hardware-in-the-loop setup that supplies rendered images.

Core claim

DreamerV3 trains visuomotor policies capable of agile flight through a racetrack using only pixels as observations. In contrast to model-free methods, this approach acquires drone racing skills from pixels. A perception-aware behaviour of actively steering the camera toward texture-rich gate regions emerges without handcrafted reward terms. Experiments in simulation and real-world flight with a hardware-in-the-loop setup demonstrate deployment on real quadrotors at speeds of up to 9 m/s.

What carries the argument

DreamerV3, the model-based reinforcement learning method that learns an internal world model from pixel observations and trains the control policy on trajectories imagined inside that model.
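
As a rough illustration of learning from a world model, the sketch below rolls a toy latent model forward without touching a simulator and computes λ-return targets of the kind a critic could be trained on. Everything in it is an assumption made for illustration: the linear dynamics, tanh policy, reward, value function, and the names `imagine_rollout` and `lambda_returns` are hypothetical stand-ins, not the authors' code or the DreamerV3 implementation.

```python
import numpy as np

def imagine_rollout(z0, policy, dynamics, reward_fn, value_fn, horizon):
    """Roll a latent state forward using only the learned model (no simulator calls)."""
    z = z0
    rewards, values = [], []
    for _ in range(horizon):
        a = policy(z)                    # action chosen from the latent state
        rewards.append(reward_fn(z, a))  # reward predicted by the model
        values.append(value_fn(z))       # critic estimate at this state
        z = dynamics(z, a)               # predicted next latent state
    values.append(value_fn(z))           # bootstrap value at the final imagined state
    return np.array(rewards), np.array(values)

def lambda_returns(rewards, values, gamma=0.99, lam=0.95):
    """R_t = r_t + gamma * ((1 - lam) * v_{t+1} + lam * R_{t+1}), with R_H = v_H."""
    horizon = len(rewards)
    returns = np.zeros(horizon)
    next_return = values[-1]
    for t in reversed(range(horizon)):
        next_return = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * next_return)
        returns[t] = next_return
    return returns

# Toy stand-ins for the learned components (hypothetical, for illustration only).
rng = np.random.default_rng(0)
A = 0.9 * np.eye(4)                        # latent transition matrix
B = rng.normal(scale=0.1, size=(4, 2))     # effect of the action on the latent state
W_pi = rng.normal(scale=0.5, size=(2, 4))  # policy weights
w_v = rng.normal(size=4)                   # value weights

policy = lambda z: np.tanh(W_pi @ z)
dynamics = lambda z, a: A @ z + B @ a
reward_fn = lambda z, a: -float(z @ z)
value_fn = lambda z: float(w_v @ z)

rewards, values = imagine_rollout(np.ones(4), policy, dynamics, reward_fn, value_fn, horizon=15)
targets = lambda_returns(rewards, values)
print(targets.shape)
```

In a real agent the dynamics, reward, and value functions would be neural networks trained on replayed pixel sequences. The point of the sketch is only that the return targets come from model predictions rather than from new environment steps, which is where the sample-efficiency argument comes from.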

If this is right

  • Visuomotor policies for drone racing can be learned without intermediate representations or heavy imitation learning bootstrapping.
  • Perception-aware camera steering arises automatically from the model-based training process.
  • Real-world deployment reaches speeds of 9 m/s on physical quadrotors.
  • Model-based methods provide a sample-efficient route for pixel-to-command control where model-free methods fail.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same model-learning approach could extend to other vision-only robotic tasks that currently require extensive real-world data collection.
  • Improving the visual realism of the simulator might allow fully zero-shot transfer without the hardware-in-the-loop step.
  • Emergent behaviors without explicit rewards point to possible discovery of useful strategies in related control problems such as navigation through cluttered spaces.

Load-bearing premise

The rendered images supplied during hardware-in-the-loop testing match real camera images, and the training simulator matches real quadrotor dynamics, closely enough that policies transfer without further adjustment.
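
One way to start probing this premise is to compare simple pixel statistics of rendered frames against frames from a real camera. The sketch below is a minimal example under stated assumptions: the frames are synthetic random arrays (no real or rendered drone imagery is used), and `channel_stats` and `histogram_distance` are hypothetical helper names. A small gap on such statistics would not prove that a policy transfers; a large gap would be a warning sign.

```python
import numpy as np

def channel_stats(frames):
    """Per-channel mean and std over a batch of frames shaped (N, H, W, C) in [0, 1]."""
    return frames.mean(axis=(0, 1, 2)), frames.std(axis=(0, 1, 2))

def histogram_distance(a, b, bins=32):
    """Total-variation distance between intensity histograms, in [0, 1]."""
    ha, _ = np.histogram(a, bins=bins, range=(0.0, 1.0))
    hb, _ = np.histogram(b, bins=bins, range=(0.0, 1.0))
    pa = ha / ha.sum()
    pb = hb / hb.sum()
    return 0.5 * np.abs(pa - pb).sum()

rng = np.random.default_rng(0)

# Synthetic stand-in for rendered frames.
rendered = rng.uniform(0.2, 0.8, size=(8, 64, 64, 3))

# Synthetic stand-in for real frames: the same content with a brightness
# shift and sensor-like noise added, then clipped to the valid range.
real_like = np.clip(rendered + rng.normal(0.05, 0.03, size=rendered.shape), 0.0, 1.0)

gap_self = histogram_distance(rendered, rendered)
gap_cross = histogram_distance(rendered, real_like)
mean_r, std_r = channel_stats(rendered)
mean_x, std_x = channel_stats(real_like)

print("histogram gap (rendered vs itself):", gap_self)
print("histogram gap (rendered vs real-like):", gap_cross)
print("per-channel mean shift:", mean_x - mean_r)
```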

What would settle it

Running the learned policy on the physical quadrotor while feeding it live camera images instead of rendered ones and checking whether it still completes the track at speeds near 9 m/s.

Figures

Figures reproduced from arXiv: 2501.14377 by Angel Romero, Ashwin Shenai, Davide Scaramuzza, Elie Aljalbout, Ismail Geles.

Figure 1: Real-world deployment of our DreamerV3 policy in the Figure …
Figure 2: The process begins with data collection in the real environment using the current policy, storing experiences in a replay buffer. This buffer is used …
Figure 3: Reward evolution by number of steps for three different tracks: Circle track, Kidney Track and Figure 8 track. The training performance of DreamerV3 …
Figure 4: Comparison of real observations and imagined observations for the …
Figure 5: Ablation study on the perception aware behaviour of our policies. Top: DreamerV3 policy trained on pixel observations in an environment where the only rendered gates are the actual gates. As indicated by the black arrows (representing camera direction), the platform predominantly focuses its attention on the next gate. Bottom: We introduce two additional gates to the rendering engine (marked in red color). …
Figure 6: Real-world experimental setup. Our drone is equipped with a RF …
read the original abstract

Autonomous drone racing has risen as a challenging robotic benchmark for testing the limits of learning, perception, planning, and control. Expert human pilots are able to fly a drone through a race track by mapping pixels from a single camera directly to control commands. Recent works in autonomous drone racing attempting direct pixel-to-commands control policies have relied on either intermediate representations that simplify the observation space or performed extensive bootstrapping using Imitation Learning (IL). This paper leverages DreamerV3 to train visuomotor policies capable of agile flight through a racetrack using only pixels as observations. In contrast to model-free methods like PPO or SAC, which are sample-inefficient and struggle in this setting, our approach acquires drone racing skills from pixels. Notably, a perception-aware behaviour of actively steering the camera toward texture-rich gate regions emerges without the need of handcrafted reward terms for the viewing direction. Our experiments show in both, simulation and real-world flight using a hardware-in-the-loop setup with rendered image observations, how the proposed approach can be deployed on real quadrotors at speeds of up to 9 m/s. These results advance the state of pixel-based autonomous flight and demonstrate that MBRL offers a promising path for real-world robotics research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that DreamerV3 enables training of visuomotor policies for autonomous drone racing directly from pixel observations, achieving agile flight through a racetrack with emergent perception-aware camera steering (without handcrafted rewards for viewing direction). It reports that the approach outperforms model-free methods like PPO and SAC, and can be deployed on real quadrotors at speeds up to 9 m/s via hardware-in-the-loop (HIL) experiments using rendered image observations in both simulation and real-world flight.

Significance. If the central claims hold after addressing the transfer gap, the work would show that model-based RL can handle high-speed pixel-to-command control in robotics without imitation learning or intermediate representations, advancing pixel-based autonomous flight. The reported emergence of perception-aware behavior is a strength worth highlighting, as it arises without explicit reward engineering.

major comments (2)
  1. [Abstract] Abstract: The headline claim of real-world deployment on quadrotors at up to 9 m/s rests on HIL experiments that feed rendered image observations to the policy. No ablation, direct comparison, or quantitative evidence is provided showing that these rendered images capture the statistics of real camera noise, lens distortion, lighting, or texture under flight conditions, leaving the sim-to-real perceptual transfer untested. This assumption is load-bearing for the real-world result.
  2. [Abstract] Abstract / Experiments: The abstract asserts successful deployment and superiority over PPO/SAC but supplies no quantitative metrics (e.g., success rates, lap times, or failure modes), ablation studies, or baseline comparisons for the real-world HIL flights. Without these, the soundness of the 9 m/s claim and the MBRL advantage cannot be evaluated.
minor comments (1)
  1. [Abstract] Abstract: The phrasing 'in both, simulation and real-world flight' contains a misplaced comma after 'both' and should be revised for clarity.
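
The metrics requested in the major comments (success rate, lap time, failure counts) could be reported with a summary as small as the one sketched below. This is a hypothetical helper, not something from the paper: the `Trial` record, the `summarize` function, and every number in the example are placeholders chosen for illustration and are not results from the authors' flights.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional

@dataclass
class Trial:
    completed: bool              # did the drone pass every gate?
    lap_time_s: Optional[float]  # None when the lap was not completed
    peak_speed_mps: float        # highest speed reached during the trial

def summarize(trials: List[Trial]) -> dict:
    """Aggregate per-trial records into the headline metrics."""
    laps = [t.lap_time_s for t in trials if t.completed and t.lap_time_s is not None]
    return {
        "n_trials": len(trials),
        "success_rate": sum(t.completed for t in trials) / len(trials),
        "mean_lap_time_s": mean(laps) if laps else None,
        "peak_speed_mps": max(t.peak_speed_mps for t in trials),
    }

# Placeholder numbers for illustration only; not results from the paper.
trials = [
    Trial(True, 6.0, 5.0),
    Trial(True, 6.5, 6.0),
    Trial(False, None, 4.0),
    Trial(True, 7.0, 5.5),
]
report = summarize(trials)
print(report)
```

Running the same summary separately for each condition (simulation, hardware-in-the-loop with rendered images, and any live-camera runs) and for each baseline would make the comparison the referee asks for directly readable.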

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments correctly identify that our hardware-in-the-loop (HIL) results use rendered observations and that the abstract would benefit from additional quantitative detail. We respond point-by-point below and indicate planned revisions.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The headline claim of real-world deployment on quadrotors at up to 9 m/s rests on HIL experiments that feed rendered image observations to the policy. No ablation, direct comparison, or quantitative evidence is provided showing that these rendered images capture the statistics of real camera noise, lens distortion, lighting, or texture under flight conditions, leaving the sim-to-real perceptual transfer untested. This assumption is load-bearing for the real-world result.

    Authors: We agree that the HIL experiments employ rendered image observations and that no ablations or direct comparisons are provided to demonstrate equivalence with real camera statistics (noise, distortion, lighting, texture). This leaves the perceptual component of sim-to-real transfer untested, which is a substantive limitation. In the revision we will explicitly state in the abstract and method sections that the real-world results are HIL with rendered observations, and we will add a dedicated limitations paragraph discussing the untested perceptual transfer. We maintain that the HIL setup still offers meaningful validation by exercising the policy on physical quadrotor dynamics at 9 m/s, but we do not claim full perceptual realism. revision: partial

  2. Referee: [Abstract] Abstract / Experiments: The abstract asserts successful deployment and superiority over PPO/SAC but supplies no quantitative metrics (e.g., success rates, lap times, or failure modes), ablation studies, or baseline comparisons for the real-world HIL flights. Without these, the soundness of the 9 m/s claim and the MBRL advantage cannot be evaluated.

    Authors: The full manuscript reports simulation metrics and PPO/SAC comparisons; the abstract summarizes the 9 m/s HIL speed but omits per-experiment numbers. We will revise the abstract to include key HIL metrics (success rate, lap time, failure modes) and ensure the experiments section supplies the corresponding quantitative tables and any available baseline comparisons for the HIL condition, allowing direct evaluation of the claims. revision: yes

Circularity Check

0 steps flagged

No circularity; results rest on reported experiments

full rationale

The paper applies an existing MBRL algorithm (DreamerV3) to a visuomotor drone-racing task and supports its claims with simulation and hardware-in-the-loop experiments. No derivation, prediction, or uniqueness claim reduces by construction to fitted parameters, self-citations, or definitional equivalence. The central result (pixel-to-command policies achieving 9 m/s in HIL) is presented as an empirical outcome rather than an input presupposed by the method.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract; no explicit free parameters, axioms, or invented entities are stated. The approach inherits the standard modeling assumptions of DreamerV3 and the sim-to-real transfer validity of the hardware-in-the-loop setup.

pith-pipeline@v0.9.0 · 5760 in / 1039 out tokens · 24761 ms · 2026-05-23T04:51:20.047279+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
