Dream to Fly: Model-Based Reinforcement Learning for Vision-Based Drone Flight

Angel Romero; Ashwin Shenai; Davide Scaramuzza; Elie Aljalbout; Ismail Geles

arxiv: 2501.14377 · v2 · submitted 2025-01-24 · 💻 cs.RO

Angel Romero, Ashwin Shenai, Ismail Geles, Elie Aljalbout, Davide Scaramuzza

Pith reviewed 2026-05-23 04:51 UTC · model grok-4.3

classification 💻 cs.RO

keywords drone racing · model-based reinforcement learning · vision-based control · visuomotor policies · autonomous flight · sim-to-real transfer · pixel observations

The pith

Model-based reinforcement learning trains drone policies that fly race tracks from camera pixels alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that DreamerV3 learns policies mapping raw camera images directly to control commands for flying a quadrotor through a sequence of gates. This works without the simplified observations or large amounts of imitation learning data that earlier pixel-based methods needed. Model-free algorithms such as PPO and SAC prove too sample-inefficient in the same setting. The policies also learn to actively point the camera at texture-rich gate regions, even though no reward term encourages it. The resulting policies transfer to real hardware at speeds up to 9 m/s when tested in a hardware-in-the-loop setup that supplies rendered images.

Core claim

DreamerV3 trains visuomotor policies capable of agile flight through a racetrack using only pixels as observations. In contrast to model-free methods, this approach acquires drone racing skills from pixels. A perception-aware behaviour of actively steering the camera toward texture-rich gate regions emerges without handcrafted reward terms. Experiments in simulation and real-world flight with a hardware-in-the-loop setup demonstrate deployment on real quadrotors at speeds of up to 9 m/s.

What carries the argument

DreamerV3, the model-based reinforcement learning method that learns a world model from pixel observations and trains an actor-critic policy on trajectories imagined by that model; the resulting actor maps pixels to control commands.
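For readers new to DreamerV3: its actor and critic are trained on short trajectories imagined by the learned world model, and the critic regresses toward λ-returns computed along those trajectories. The sketch below shows only that return computation. It assumes the world model has already produced predicted rewards, continuation flags, and critic values for one imagined rollout; the input arrays are hypothetical stand-ins, the `gamma` and `lam` defaults are illustrative, and DreamerV3's return normalization and symlog/two-hot critic parameterization are omitted.

```python
import numpy as np


def lambda_returns(rewards, values, continues, gamma=0.997, lam=0.95):
    """TD(lambda) targets along one imagined trajectory of horizon H.

    rewards[t]   -- predicted reward for the transition from state t to t+1
    continues[t] -- predicted probability that the episode continues after t+1
    values[t]    -- critic value of imagined state t, for t = 0..H
                    (so len(values) == len(rewards) + 1)
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    continues = np.asarray(continues, dtype=float)
    horizon = len(rewards)
    assert len(values) == horizon + 1, "need one bootstrap value past the horizon"

    returns = np.zeros(horizon)
    next_return = values[horizon]  # bootstrap: R_H = v(s_H)
    for t in reversed(range(horizon)):
        # Blend the one-step bootstrap v(s_{t+1}) with the longer return R_{t+1}
        blended = (1.0 - lam) * values[t + 1] + lam * next_return
        next_return = rewards[t] + gamma * continues[t] * blended
        returns[t] = next_return
    return returns


# Hypothetical predictions for a 3-step imagined rollout
rewards = [0.5, 0.2, 0.1]
values = [1.0, 0.8, 0.6, 0.4]
continues = [1.0, 1.0, 1.0]
print(lambda_returns(rewards, values, continues))
```

Setting `lam=1.0` gives the sum of discounted predicted rewards bootstrapped by the last value, while `lam=0.0` gives one-step TD targets; intermediate values trade bias against variance.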

If this is right

Visuomotor policies for drone racing can be learned without intermediate representations or heavy imitation learning bootstrapping.
Perception-aware camera steering arises automatically from the model-based training process.
Real-world deployment, with rendered images supplied in a hardware-in-the-loop setup, reaches speeds of up to 9 m/s on physical quadrotors.
Model-based methods provide a sample-efficient route for pixel-to-command control where model-free methods fail.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same model-learning approach could extend to other vision-only robotic tasks that currently require extensive real-world data collection.
Improving the visual realism of the simulator might allow fully zero-shot transfer without the hardware-in-the-loop step.
Emergent behaviors without explicit rewards point to possible discovery of useful strategies in related control problems such as navigation through cluttered spaces.

Load-bearing premise

Rendered images supplied during hardware-in-the-loop testing are close enough to real onboard camera images that policies already validated on the physical quadrotor's dynamics would transfer to live camera input without further adjustment.

What would settle it

Running the learned policy on the physical quadrotor while feeding it live camera images instead of rendered ones and checking whether it still completes the track at speeds near 9 m/s.

Figures

Figures reproduced from arXiv: 2501.14377 by Angel Romero, Ashwin Shenai, Davide Scaramuzza, Elie Aljalbout, Ismail Geles.

**Figure 2.** The process begins with data collection in the real environment using the current policy, storing experiences in a replay buffer. This buffer is used …

**Figure 3.** Reward evolution by number of steps for three different tracks: Circle track, Kidney track and Figure 8 track. The training performance of DreamerV3 …

**Figure 4.** Comparison of real observations and imagined observations for the …

**Figure 5.** Ablation study on the perception-aware behaviour of our policies. Top: DreamerV3 policy trained on pixel observations in an environment where the only rendered gates are the actual gates. As indicated by the black arrows (representing camera direction), the platform predominantly focuses its attention on the next gate. Bottom: We introduce two additional gates to the rendering engine (marked in red).…

**Figure 6.** Real-world experimental setup. Our drone is equipped with an RF …
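The Figure 2 caption describes a loop in which the current policy collects experience that is stored in a replay buffer for later training. As a minimal sketch, assuming a Dreamer-style setup in which the world model is trained on contiguous sequences rather than shuffled single transitions, such a buffer could look like the following. The class name, field layout, and sizes are hypothetical and are not taken from the paper's code; the world-model and actor-critic updates that consume the sampled batches are not shown.

```python
import random
from collections import deque


class SequenceReplayBuffer:
    """Hypothetical FIFO replay buffer that returns contiguous chunks of steps."""

    def __init__(self, capacity):
        # Once full, appending a new step silently drops the oldest one
        self.steps = deque(maxlen=capacity)

    def add(self, obs, action, reward, is_first):
        # is_first marks the first step of an episode
        self.steps.append((obs, action, reward, is_first))

    def sample(self, batch_size, seq_len):
        """Return batch_size lists of seq_len consecutive steps."""
        if len(self.steps) < seq_len:
            raise ValueError("buffer holds fewer steps than one sequence")
        max_start = len(self.steps) - seq_len
        batch = []
        for _ in range(batch_size):
            start = random.randint(0, max_start)  # inclusive bounds
            batch.append([self.steps[i] for i in range(start, start + seq_len)])
        return batch


# Usage with dummy integer "observations" standing in for camera images
buffer = SequenceReplayBuffer(capacity=1000)
for t in range(200):
    buffer.add(obs=t, action=0.0, reward=0.0, is_first=(t % 50 == 0))
batch = buffer.sample(batch_size=4, seq_len=16)
print(len(batch), len(batch[0]))
```

Dreamer-style implementations typically store an `is_first` flag like this so that a recurrent world model can reset its latent state when a sampled chunk crosses an episode boundary, which means sequences do not have to be aligned with episodes.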

Original abstract

Autonomous drone racing has risen as a challenging robotic benchmark for testing the limits of learning, perception, planning, and control. Expert human pilots are able to fly a drone through a race track by mapping pixels from a single camera directly to control commands. Recent works in autonomous drone racing attempting direct pixel-to-commands control policies have relied on either intermediate representations that simplify the observation space or performed extensive bootstrapping using Imitation Learning (IL). This paper leverages DreamerV3 to train visuomotor policies capable of agile flight through a racetrack using only pixels as observations. In contrast to model-free methods like PPO or SAC, which are sample-inefficient and struggle in this setting, our approach acquires drone racing skills from pixels. Notably, a perception-aware behaviour of actively steering the camera toward texture-rich gate regions emerges without the need of handcrafted reward terms for the viewing direction. Our experiments show in both, simulation and real-world flight using a hardware-in-the-loop setup with rendered image observations, how the proposed approach can be deployed on real quadrotors at speeds of up to 9 m/s. These results advance the state of pixel-based autonomous flight and demonstrate that MBRL offers a promising path for real-world robotics research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

DreamerV3 learns pixel-to-action drone racing and runs at 9 m/s in HIL, but the real-camera transfer step is untested.

The paper shows that DreamerV3 can train visuomotor policies for drone racing directly from pixels, without imitation learning or hand-engineered intermediate representations, and that these policies can be deployed on real quadrotors at speeds up to 9 m/s in a hardware-in-the-loop setup that feeds rendered images. That is the central result. It also reports an emergent perception-aware behavior in which the policy steers the camera toward texture-rich gate regions without any explicit reward term for camera direction. Model-based RL appears to avoid the sample-efficiency problems that PPO and SAC run into here. Those are the concrete advances over the prior drone-racing work described in the abstract. The work is a straightforward application of an existing algorithm to a new domain and does not overclaim the method itself.

The soft spot is exactly where the load-bearing premise above points: the real-world claim rests on HIL flights that use rendered observations rather than the drone's actual onboard camera. No ablation or side-by-side comparison against real camera images is mentioned, so the statistics of noise, lighting, lens effects, and texture under flight conditions remain unverified. The abstract also gives no quantitative metrics, success rates, or detailed baselines, which makes the strength of the result hard to judge from the abstract alone.

This paper is for researchers working on pixel-based agile control and sample-efficient RL for robotics. A reader already familiar with DreamerV3 will get the most out of the domain transfer and the emergent behavior. It is coherent on its own terms and deserves a serious referee to check the full experimental details and the HIL-to-real gap, even though the current evidence for end-to-end real-camera deployment is limited.

Referee Report

2 major / 1 minor

Summary. The paper claims that DreamerV3 enables training of visuomotor policies for autonomous drone racing directly from pixel observations, achieving agile flight through a racetrack with emergent perception-aware camera steering (without handcrafted rewards for viewing direction). It reports that the approach outperforms model-free methods such as PPO and SAC, and that, in simulation and in real-world hardware-in-the-loop (HIL) flights with rendered image observations, the policies can be deployed on real quadrotors at speeds up to 9 m/s.

Significance. If the central claims hold after addressing the transfer gap, the work would show that model-based RL can handle high-speed pixel-to-command control in robotics without imitation learning or intermediate representations, advancing pixel-based autonomous flight. The reported emergence of perception-aware behavior is a strength worth highlighting, as it arises without explicit reward engineering.

major comments (2)

[Abstract] Abstract: The headline claim of real-world deployment on quadrotors at up to 9 m/s rests on HIL experiments that feed rendered image observations to the policy. No ablation, direct comparison, or quantitative evidence is provided showing that these rendered images capture the statistics of real camera noise, lens distortion, lighting, or texture under flight conditions, leaving the sim-to-real perceptual transfer untested. This assumption is load-bearing for the real-world result.
[Abstract] Abstract / Experiments: The abstract asserts successful deployment and superiority over PPO/SAC but supplies no quantitative metrics (e.g., success rates, lap times, or failure modes), ablation studies, or baseline comparisons for the real-world HIL flights. Without these, the soundness of the 9 m/s claim and the MBRL advantage cannot be evaluated.

minor comments (1)

[Abstract] Abstract: The phrasing 'in both, simulation and real-world flight' contains a stray comma after 'both' and should read 'in both simulation and real-world flight'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments correctly identify that our hardware-in-the-loop (HIL) results use rendered observations and that the abstract would benefit from additional quantitative detail. We respond point-by-point below and indicate planned revisions.

Referee: [Abstract] Abstract: The headline claim of real-world deployment on quadrotors at up to 9 m/s rests on HIL experiments that feed rendered image observations to the policy. No ablation, direct comparison, or quantitative evidence is provided showing that these rendered images capture the statistics of real camera noise, lens distortion, lighting, or texture under flight conditions, leaving the sim-to-real perceptual transfer untested. This assumption is load-bearing for the real-world result.

Authors: We agree that the HIL experiments employ rendered image observations and that no ablations or direct comparisons are provided to demonstrate equivalence with real camera statistics (noise, distortion, lighting, texture). This leaves the perceptual component of sim-to-real transfer untested, which is a substantive limitation. In the revision we will explicitly state in the abstract and method sections that the real-world results are HIL with rendered observations, and we will add a dedicated limitations paragraph discussing the untested perceptual transfer. We maintain that the HIL setup still offers meaningful validation by exercising the policy on physical quadrotor dynamics at 9 m/s, but we do not claim full perceptual realism. revision: partial
Referee: [Abstract] Abstract / Experiments: The abstract asserts successful deployment and superiority over PPO/SAC but supplies no quantitative metrics (e.g., success rates, lap times, or failure modes), ablation studies, or baseline comparisons for the real-world HIL flights. Without these, the soundness of the 9 m/s claim and the MBRL advantage cannot be evaluated.

Authors: The full manuscript reports simulation metrics and PPO/SAC comparisons; the abstract summarizes the 9 m/s HIL speed but omits per-experiment numbers. We will revise the abstract to include key HIL metrics (success rate, lap time, failure modes) and ensure the experiments section supplies the corresponding quantitative tables and any available baseline comparisons for the HIL condition, allowing direct evaluation of the claims. revision: yes

Circularity Check

0 steps flagged

No circularity; results rest on reported experiments

The paper applies an existing MBRL algorithm (DreamerV3) to a visuomotor drone-racing task and supports its claims with simulation and hardware-in-the-loop experiments. No derivation, prediction, or uniqueness claim reduces by construction to fitted parameters, self-citations, or definitional equivalence. The central result (pixel-to-command policies achieving 9 m/s in HIL) is presented as an empirical outcome rather than an input presupposed by the method.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract; no explicit free parameters, axioms, or invented entities are stated. The approach inherits the standard modeling assumptions of DreamerV3 and the sim-to-real transfer validity of the hardware-in-the-loop setup.

pith-pipeline@v0.9.0 · 5760 in / 1039 out tokens · 24761 ms · 2026-05-23T04:51:20.047279+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

leverages DreamerV3 to train visuomotor policies... world model... RSSM... reward function... progress term $b_1(\lVert g_k - p_{k-1}\rVert - \lVert g_k - p_k\rVert) - b_2\lVert \omega_k\rVert$
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Relation between the paper passage and the cited Recognition theorem.

emergent perception-aware behaviour... no handcrafted reward terms for viewing direction

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
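The progress term quoted in the first theorem entry above rewards the decrease in distance to the target gate between consecutive steps and penalizes body rates. The excerpt does not define its symbols, so the sketch below assumes that g_k is the center of the next gate, p_k is the drone position at step k, and ω_k is its body angular velocity; the coefficient values for b1 and b2 are hypothetical, and only the quoted term is shown, not any other reward components the paper may use.

```python
import numpy as np


def progress_reward(gate_pos, prev_pos, pos, body_rates, b1=1.0, b2=0.01):
    """Progress term r_k = b1 * (||g_k - p_{k-1}|| - ||g_k - p_k||) - b2 * ||omega_k||.

    gate_pos   -- g_k, assumed to be the center of the next gate to pass
    prev_pos   -- p_{k-1}, drone position at the previous step
    pos        -- p_k, drone position at the current step
    body_rates -- omega_k, assumed to be the body angular velocity
    b1, b2     -- hypothetical weights; the paper's values are not quoted here
    """
    gate_pos, prev_pos, pos, body_rates = (
        np.asarray(x, dtype=float) for x in (gate_pos, prev_pos, pos, body_rates)
    )
    # Positive when the drone moved closer to the gate during this step
    progress = np.linalg.norm(gate_pos - prev_pos) - np.linalg.norm(gate_pos - pos)
    # Subtract a penalty proportional to the magnitude of the body rates
    return b1 * progress - b2 * np.linalg.norm(body_rates)


# Moving 0.5 m toward a gate 1 m ahead while rotating slowly
print(progress_reward([1, 0, 0], [0, 0, 0], [0.5, 0, 0], [0, 0, 0.2]))
```

For a fixed target gate, the progress part telescopes: summed over steps it equals b1 times the net reduction in distance to the gate, so the agent is rewarded for making progress rather than for merely hovering near a gate.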

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · 5 internal anchors

[1] Drew Hanover, Antonio Loquercio, Leonard Bauersfeld, Angel Romero, Robert Penicka, Yunlong Song, Giovanni Cioffi, Elia Kaufmann, and Davide Scaramuzza. Autonomous drone racing: A survey. IEEE Transactions on Robotics, 2024.

[2] Hyungpil Moon, Jose Martinez-Carranza, Titus Cieslewski, Matthias Faessler, Davide Falanga, Alessandro Simovic, Davide Scaramuzza, Shuo Li, Michael Ozo, Christophe De Wagter, et al. Challenges and implemented technologies used in autonomous drone racing. Intelligent Service Robotics, 2019.

[3] Elia Kaufmann, Leonard Bauersfeld, Antonio Loquercio, Matthias Müller, Vladlen Koltun, and Davide Scaramuzza. Champion-level drone racing using deep reinforcement learning. Nature, 620(7976):982–987, 2023.

[4] Yunlong Song, Angel Romero, Matthias Müller, Vladlen Koltun, and Davide Scaramuzza. Reaching the limit in autonomous racing: Optimal control versus reinforcement learning. Science Robotics, 8(82):eadg1462, 2023.

[5] Sunggoo Jung, Sungwook Cho, Dasol Lee, Hanseob Lee, and David Hyunchul Shim. A direct visual servoing-based framework for the 2016 IROS autonomous drone racing challenge. Journal of Field Robotics, 35(1):146–166, 2018.

[6] Elia Kaufmann, Antonio Loquercio, Rene Ranftl, Alexey Dosovitskiy, Vladlen Koltun, and Davide Scaramuzza. Deep drone racing: Learning agile flight in dynamic environments. In Aude Billard, Anca Dragan, Jan Peters, and Jun Morimoto, editors, Proceedings of The 2nd Conference on Robot Learning, volume 87 of Proceedings of Machine Learning Research, page…

[7] Christophe De Wagter, Federico Paredes-Vallés, Nilay Sheth, and Guido de Croon. The artificial intelligence behind the winning entry to the 2019 AI robotic racing competition. arXiv preprint arXiv:2109.14985, 2021.

[8] P. Foehn, D. Brescianini, E. Kaufmann, T. Cieslewski, M. Gehrig, M. Muglikar, and D. Scaramuzza. Alphapilot: Autonomous drone racing. Robotics: Science and Systems (RSS), 2020. URL https://link.springer.com/article/10.1007/s11370-018-00271-6

[9] Philipp Foehn, Angel Romero, and Davide Scaramuzza. Time-optimal planning for quadrotor waypoint flight. Science Robotics, 6(56):eabh1221, 2021.

[10] Jiaxu Xing, Ismail Geles, Yunlong Song, Elie Aljalbout, and Davide Scaramuzza. Multi-task reinforcement learning for quadrotors. IEEE Robotics and Automation Letters, 2024.

[11] Angel Romero, Elie Aljalbout, Yunlong Song, and Davide Scaramuzza. Actor-critic model predictive control: Differentiable optimization meets reinforcement learning. arXiv preprint arXiv:2306.09852, 2024. URL https://arxiv.org/abs/2306.09852

[12] Ismail Geles, Leonard Bauersfeld, Angel Romero, Jiaxu Xing, and Davide Scaramuzza. Demonstrating agile flight from pixels without state estimation. Robotics: Science and Systems, 2024.

[13] Jiaxu Xing, Angel Romero, Leonard Bauersfeld, and Davide Scaramuzza. Bootstrapping reinforcement learning with imitation for vision-based agile flight. 8th Conference on Robot Learning (CoRL), 2024.

[14] Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104, 2023.

[15] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.

[16] M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47:253–279, June 2013. ISSN 1076-9757.

[17] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcemen…

[18] David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. Mastering the…

[19] Yuval Tassa, Yotam Doron, Alistair Muldal, Tom Erez, Yazhe Li, Diego de Las Casas, David Budden, Abbas Abdolmaleki, Josh Merel, Andrew Lefrancq, Timothy Lillicrap, and Martin Riedmiller. DeepMind Control Suite, January 2018. arXiv:1801.00690 [cs]. URL http://arxiv.org/abs/1801.00690

[20] Denis Yarats, Amy Zhang, Ilya Kostrikov, Brandon Amos, Joelle Pineau, and Rob Fergus. Improving sample efficiency in model-free reinforcement learning from images. Proceedings of the AAAI Conference on Artificial Intelligence, 35(12):10674–10681, May 2021. ISSN 2374-3468, 2159-5399.

[21] Elie Aljalbout, Ji Chen, Konstantin Ritt, Maximilian Ulmer, and Sami Haddadin. Learning vision-based reactive policies for obstacle avoidance. In Conference on Robot Learning, pages 2040–2054. PMLR, 2021.

[22] Danijar Hafner, Timothy P. Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, and James Davidson. Learning latent dynamics for planning from pixels. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97 of…

[23] Michael Laskin, Aravind Srinivas, and Pieter Abbeel. CURL: Contrastive unsupervised representations for reinforcement learning. In Hal Daumé III and Aarti Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 5639–5650. PMLR, 13–18 Jul 2020.

[24] Michael Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel, and Aravind Srinivas. Reinforcement learning with augmented data. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing System…

[25] Denis Yarats, Rob Fergus, Alessandro Lazaric, and Lerrel Pinto. Mastering visual continuous control: Improved data-augmented reinforcement learning. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022.

[26] Andrei A. Rusu, Matej Večerík, Thomas Rothörl, Nicolas Heess, Razvan Pascanu, and Raia Hadsell. Sim-to-real robot learning from pixels with progressive nets. In 1st Annual Conference on Robot Learning, CoRL 2017, Mountain View, California, USA, November 13-15, 2017, Proceedings, volume 78 of Proceedings of Machine Learning Research, pages 262–270. PMLR, 2017.

[27] Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, and Pieter Abbeel. Domain randomization for transferring deep neural networks from simulation to the real world. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2017, Vancouver, BC, Canada, September 24-28, 2017, pages 23–30. IEEE, 2017.

[28] Sergey Levine, Chelsea Finn, Trevor Darrell, and Pieter Abbeel. End-to-end training of deep visuomotor policies. J. Mach. Learn. Res., 17:39:1–39:40, 2016.

[29] Zipeng Fu, Tony Z. Zhao, and Chelsea Finn. Mobile ALOHA: Learning bimanual mobile manipulation with low-cost whole-body teleoperation. In arXiv, 2024.

[30] Antonio Loquercio, Ana I Maqueda, Carlos R Del-Blanco, and Davide Scaramuzza. Dronet: Learning to fly by driving. IEEE Robotics and Automation Letters, 3(2):1088–1095, 2018.

[31] Dhruv Shah, Ajay Sridhar, Arjun Bhorkar, Noriaki Hirose, and Sergey Levine. GNM: A General Navigation Model to Drive Any Robot. In International Conference on Robotics and Automation (ICRA), 2023. URL https://arxiv.org/abs/2210.03370

[32] Dhruv Shah, Ajay Sridhar, Nitish Dashora, Kyle Stachowicz, Kevin Black, Noriaki Hirose, and Sergey Levine. ViNT: A foundation model for visual navigation. In 7th Annual Conference on Robot Learning, 2023. URL https://arxiv.org/abs/2306.14846

[33] Ajay Sridhar, Dhruv Shah, Catherine Glossop, and Sergey Levine. NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration. arXiv preprint, 2023. URL https://arxiv.org/abs/2310.07896

[34] Elia Kaufmann, Antonio Loquercio, René Ranftl, Matthias Müller, Vladlen Koltun, and Davide Scaramuzza. Deep drone acrobatics. In Proceedings of Robotics: Science and Systems, Corvallis, Oregon, USA, July 2020.

[35] William Koch, Renato Mancuso, Richard West, and Azer Bestavros. Reinforcement learning for UAV attitude control. ACM Transactions on Cyber-Physical Systems, 3(2):1–21, 2019.

[36] Nathan O Lambert, Daniel S Drew, Joseph Yaconelli, Sergey Levine, Roberto Calandra, and Kristofer SJ Pister. Low-level control of a quadrotor with deep model-based reinforcement learning. IEEE Robotics and Automation Letters, 4(4):4224–4230, 2019.

[37] Robin Ferede, Christophe De Wagter, Dario Izzo, and Guido CHE de Croon. End-to-end reinforcement learning for time-optimal quadcopter flight. arXiv preprint arXiv:2311.16948, 2023.

[38] Jonas Eschmann, Dario Albani, and Giuseppe Loianno. Learning to fly in seconds. arXiv e-prints, pages arXiv–2311, 2023.

[39] Fereshteh Sadeghi and Sergey Levine. CAD2RL: Real single-image flight without a single real image. arXiv preprint arXiv:1611.04201, 2016.

[40] Philipp Foehn, Elia Kaufmann, Angel Romero, Robert Penicka, Sihao Sun, Leonard Bauersfeld, Thomas Laengle, Giovanni Cioffi, Yunlong Song, Antonio Loquercio, et al. Agilicious: Open-source and open-hardware agile quadrotor for vision-based flight. Science Robotics, 7(67):eabl6259, 2022.

[41] Philip Becker-Ehmck, Maximilian Karl, Jan Peters, and Patrick van der Smagt. Learning to fly via deep model-based reinforcement learning. arXiv preprint arXiv:2003.08876, 2020.

[42] Niklas Wahlström, Thomas B Schön, and Marc Peter Deisenroth. From pixels to torques: Policy learning with deep dynamical models. arXiv preprint arXiv:1502.02251, 2015.

[43] Thomas Bi and Raffaello D’Andrea. Sample-efficient learning to solve a real-world labyrinth game using data-augmented model-based reinforcement learning. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 7455–7460. IEEE, 2024.

[44] Philipp Wu, Alejandro Escontrela, Danijar Hafner, Pieter Abbeel, and Ken Goldberg. Daydreamer: World models for physical robot learning. In Conference on Robot Learning (CoRL). PMLR, 2022.

[45] Christian Pfeiffer and Davide Scaramuzza. Human-piloted drone racing: Visual processing and control. IEEE Robotics and Automation Letters, 6(2):3467–3474, 2021.

[46] Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, and James Davidson. Learning latent dynamics for planning from pixels. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 2555–256…

[47] URL https://proceedings.mlr.press/v97/hafner19a.html

[48] Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, and Noah Dormann. Stable-baselines3: Reliable reinforcement learning implementations. Journal of Machine Learning Research, 22(268):1–8, 2021. URL http://jmlr.org/papers/v22/20-1364.html

[49] Yunlong Song, Selim Naji, Elia Kaufmann, Antonio Loquercio, and Davide Scaramuzza. Flightmare: A flexible quadrotor simulator. In Conference on Robot Learning, 2020.

[50] Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, et al. Habitat: A platform for embodied AI research. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9339–9347, 2019.

[51] Xavier Puig, Eric Undersander, Andrew Szot, Mikael Dallaire Cote, Tsung-Yen Yang, Ruslan Partsey, Ruta Desai, Alexander Clegg, Michal Hlavac, So Yeon Min, Vladimír Vondruš, Theophile Gervet, Vincent-Pierre Berges, John M Turner, Oleksandr Maksymets, Zsolt Kira, Mrinal Kalakrishnan, Jitendra Malik, Devendra Singh Chaplot, Unnat Jain, Dhruv Batra, Aksha… Habitat 3.0: A co-habitat for humans, avatars, and robots.

[52] Yunlong Song, Mats Steinweg, Elia Kaufmann, and Davide Scaramuzza. Autonomous drone racing with deep reinforcement learning. In IEEE/RSJ Int. Conf. Intell. Robot. Syst. (IROS), 2021.

[53] Danijar Hafner, Timothy P. Lillicrap, Jimmy Ba, and Mohammad Norouzi. Dream to control: Learning behaviors by latent imagination. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net, 2020.

[1] Drew Hanover, Antonio Loquercio, Leonard Bauersfeld, Angel Romero, Robert Penicka, Yunlong Song, Giovanni Cioffi, Elia Kaufmann, and Davide Scaramuzza. Autonomous drone racing: A survey. IEEE Transactions on Robotics, 2024

[2] Hyungpil Moon, Jose Martinez-Carranza, Titus Cieslewski, Matthias Faessler, Davide Falanga, Alessandro Simovic, Davide Scaramuzza, Shuo Li, Michael Ozo, Christophe De Wagter, et al. Challenges and implemented technologies used in autonomous drone racing. Intelligent Service Robotics, 2019

³ https://github.com/danijar/dreamerv3

[3] Elia Kaufmann, Leonard Bauersfeld, Antonio Loquercio, Matthias Müller, Vladlen Koltun, and Davide Scaramuzza. Champion-level drone racing using deep reinforcement learning. Nature, 620(7976):982–987, 2023

[4] Yunlong Song, Angel Romero, Matthias Müller, Vladlen Koltun, and Davide Scaramuzza. Reaching the limit in autonomous racing: Optimal control versus reinforcement learning. Science Robotics, 8(82):eadg1462, 2023

[5] Sunggoo Jung, Sungwook Cho, Dasol Lee, Hanseob Lee, and David Hyunchul Shim. A direct visual servoing-based framework for the 2016 IROS autonomous drone racing challenge. Journal of Field Robotics, 35(1):146–166, 2018

[6] Elia Kaufmann, Antonio Loquercio, Rene Ranftl, Alexey Dosovitskiy, Vladlen Koltun, and Davide Scaramuzza. Deep drone racing: Learning agile flight in dynamic environments. In Aude Billard, Anca Dragan, Jan Peters, and Jun Morimoto, editors, Proceedings of The 2nd Conference on Robot Learning, volume 87 of Proceedings of Machine Learning Research, page...

[7] Christophe De Wagter, Federico Paredes-Vallés, Nilay Sheth, and Guido de Croon. The artificial intelligence behind the winning entry to the 2019 AI robotic racing competition. arXiv preprint arXiv:2109.14985, 2021

[8] P. Foehn, D. Brescianini, E. Kaufmann, T. Cieslewski, M. Gehrig, M. Muglikar, and D. Scaramuzza. AlphaPilot: Autonomous drone racing. Robotics: Science and Systems (RSS), 2020. URL https://link.springer.com/article/10.1007/s11370-018-00271-6

[9] Philipp Foehn, Angel Romero, and Davide Scaramuzza. Time-optimal planning for quadrotor waypoint flight. Science Robotics, 6(56):eabh1221, 2021

[10] Jiaxu Xing, Ismail Geles, Yunlong Song, Elie Aljalbout, and Davide Scaramuzza. Multi-task reinforcement learning for quadrotors. IEEE Robotics and Automation Letters, 2024

[11] Angel Romero, Elie Aljalbout, Yunlong Song, and Davide Scaramuzza. Actor-critic model predictive control: Differentiable optimization meets reinforcement learning. arXiv preprint arXiv:2306.09852, 2024. URL https://arxiv.org/abs/2306.09852

[12] Ismail Geles, Leonard Bauersfeld, Angel Romero, Jiaxu Xing, and Davide Scaramuzza. Demonstrating agile flight from pixels without state estimation. Robotics: Science and Systems, 2024

[13] Jiaxu Xing, Angel Romero, Leonard Bauersfeld, and Davide Scaramuzza. Bootstrapping reinforcement learning with imitation for vision-based agile flight. 8th Conference on Robot Learning (CoRL), 2024

[14] Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104, 2023

[15] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017

[16] M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47:253–279, June 2013. ISSN 1076-9757

[17] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcemen...

[18] David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. Mastering the...

[19] Yuval Tassa, Yotam Doron, Alistair Muldal, Tom Erez, Yazhe Li, Diego de Las Casas, David Budden, Abbas Abdolmaleki, Josh Merel, Andrew Lefrancq, Timothy Lillicrap, and Martin Riedmiller. DeepMind Control Suite, January 2018. URL http://arxiv.org/abs/1801.00690. arXiv:1801.00690 [cs]

[20] Denis Yarats, Amy Zhang, Ilya Kostrikov, Brandon Amos, Joelle Pineau, and Rob Fergus. Improving sample efficiency in model-free reinforcement learning from images. Proceedings of the AAAI Conference on Artificial Intelligence, 35(12):10674–10681, May 2021. ISSN 2374-3468, 2159-5399

[21] Elie Aljalbout, Ji Chen, Konstantin Ritt, Maximilian Ulmer, and Sami Haddadin. Learning vision-based reactive policies for obstacle avoidance. In Conference on Robot Learning, pages 2040–2054. PMLR, 2021

[22] Danijar Hafner, Timothy P. Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, and James Davidson. Learning latent dynamics for planning from pixels. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97 of...

[23] Michael Laskin, Aravind Srinivas, and Pieter Abbeel. CURL: Contrastive unsupervised representations for reinforcement learning. In Hal Daumé III and Aarti Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 5639–5650. PMLR, 13–18 Jul 2020

[24] Michael Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel, and Aravind Srinivas. Reinforcement learning with augmented data. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing System...

[25] Denis Yarats, Rob Fergus, Alessandro Lazaric, and Lerrel Pinto. Mastering visual continuous control: Improved data-augmented reinforcement learning. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022

[26] Andrei A. Rusu, Matej Večerík, Thomas Rothörl, Nicolas Heess, Razvan Pascanu, and Raia Hadsell. Sim-to-real robot learning from pixels with progressive nets. In 1st Annual Conference on Robot Learning, CoRL 2017, Mountain View, California, USA, November 13-15, 2017, Proceedings, volume 78 of Proceedings of Machine Learning Research, pages 262–270. PMLR, 2017

[27] Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, and Pieter Abbeel. Domain randomization for transferring deep neural networks from simulation to the real world. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2017, Vancouver, BC, Canada, September 24-28, 2017, pages 23–30. IEEE, 2017

[28] Sergey Levine, Chelsea Finn, Trevor Darrell, and Pieter Abbeel. End-to-end training of deep visuomotor policies. J. Mach. Learn. Res., 17:39:1–39:40, 2016

[29] Zipeng Fu, Tony Z. Zhao, and Chelsea Finn. Mobile ALOHA: Learning bimanual mobile manipulation with low-cost whole-body teleoperation. In arXiv, 2024

[30] Antonio Loquercio, Ana I Maqueda, Carlos R Del-Blanco, and Davide Scaramuzza. DroNet: Learning to fly by driving. IEEE Robotics and Automation Letters, 3(2):1088–1095, 2018

[31] Dhruv Shah, Ajay Sridhar, Arjun Bhorkar, Noriaki Hirose, and Sergey Levine. GNM: A General Navigation Model to Drive Any Robot. In International Conference on Robotics and Automation (ICRA), 2023. URL https://arxiv.org/abs/2210.03370

[32] Dhruv Shah, Ajay Sridhar, Nitish Dashora, Kyle Stachowicz, Kevin Black, Noriaki Hirose, and Sergey Levine. ViNT: A foundation model for visual navigation. In 7th Annual Conference on Robot Learning, 2023. URL https://arxiv.org/abs/2306.14846

[33] Ajay Sridhar, Dhruv Shah, Catherine Glossop, and Sergey Levine. NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration. arXiv preprint, 2023. URL https://arxiv.org/abs/2310.07896

[34] Elia Kaufmann, Antonio Loquercio, René Ranftl, Matthias Müller, Vladlen Koltun, and Davide Scaramuzza. Deep drone acrobatics. In Proceedings of Robotics: Science and Systems, Corvallis, Oregon, USA, July 2020

[35] William Koch, Renato Mancuso, Richard West, and Azer Bestavros. Reinforcement learning for UAV attitude control. ACM Transactions on Cyber-Physical Systems, 3(2):1–21, 2019

[36] Nathan O Lambert, Daniel S Drew, Joseph Yaconelli, Sergey Levine, Roberto Calandra, and Kristofer SJ Pister. Low-level control of a quadrotor with deep model-based reinforcement learning. IEEE Robotics and Automation Letters, 4(4):4224–4230, 2019

[37] Robin Ferede, Christophe De Wagter, Dario Izzo, and Guido CHE de Croon. End-to-end reinforcement learning for time-optimal quadcopter flight. arXiv preprint arXiv:2311.16948, 2023

[38] Jonas Eschmann, Dario Albani, and Giuseppe Loianno. Learning to fly in seconds. arXiv e-prints, pages arXiv–2311, 2023

[39] Fereshteh Sadeghi and Sergey Levine. CAD2RL: Real single-image flight without a single real image. arXiv preprint arXiv:1611.04201, 2016

[40] Philipp Foehn, Elia Kaufmann, Angel Romero, Robert Penicka, Sihao Sun, Leonard Bauersfeld, Thomas Laengle, Giovanni Cioffi, Yunlong Song, Antonio Loquercio, et al. Agilicious: Open-source and open-hardware agile quadrotor for vision-based flight. Science Robotics, 7(67):eabl6259, 2022

[41] Philip Becker-Ehmck, Maximilian Karl, Jan Peters, and Patrick van der Smagt. Learning to fly via deep model-based reinforcement learning. arXiv preprint arXiv:2003.08876, 2020

[42] Niklas Wahlström, Thomas B Schön, and Marc Peter Deisenroth. From pixels to torques: Policy learning with deep dynamical models. arXiv preprint arXiv:1502.02251, 2015

[43] Thomas Bi and Raffaello D’Andrea. Sample-efficient learning to solve a real-world labyrinth game using data-augmented model-based reinforcement learning. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 7455–7460. IEEE, 2024

[44] Philipp Wu, Alejandro Escontrela, Danijar Hafner, Pieter Abbeel, and Ken Goldberg. DayDreamer: World models for physical robot learning. In Conference on Robot Learning (CoRL). PMLR, 2022

[45] Christian Pfeiffer and Davide Scaramuzza. Human-piloted drone racing: Visual processing and control. IEEE Robotics and Automation Letters, 6(2):3467–3474, 2021

[46] Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, and James Davidson. Learning latent dynamics for planning from pixels. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 2555–256...

[47] URL https://proceedings.mlr.press/v97/hafner19a.html

[48] Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, and Noah Dormann. Stable-Baselines3: Reliable reinforcement learning implementations. Journal of Machine Learning Research, 22(268):1–8, 2021. URL http://jmlr.org/papers/v22/20-1364.html

[49] Yunlong Song, Selim Naji, Elia Kaufmann, Antonio Loquercio, and Davide Scaramuzza. Flightmare: A flexible quadrotor simulator. In Conference on Robot Learning, 2020

[50] Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, et al. Habitat: A platform for embodied AI research. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9339–9347, 2019

[51] Xavier Puig, Eric Undersander, Andrew Szot, Mikael Dallaire Cote, Tsung-Yen Yang, Ruslan Partsey, Ruta Desai, Alexander Clegg, Michal Hlavac, So Yeon Min, Vladimír Vondruš, Theophile Gervet, Vincent-Pierre Berges, John M Turner, Oleksandr Maksymets, Zsolt Kira, Mrinal Kalakrishnan, Jitendra Malik, Devendra Singh Chaplot, Unnat Jain, Dhruv Batra, Aksha... Habitat 3.0: A co-habitat for humans, avatars, and robots. 2024

[52] Yunlong Song, Mats Steinweg, Elia Kaufmann, and Davide Scaramuzza. Autonomous drone racing with deep reinforcement learning. In IEEE/RSJ Int. Conf. Intell. Robot. Syst. (IROS), 2021

[53] Danijar Hafner, Timothy P. Lillicrap, Jimmy Ba, and Mohammad Norouzi. Dream to control: Learning behaviors by latent imagination. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net, 2020