Dream to Fly: Model-Based Reinforcement Learning for Vision-Based Drone Flight
Pith reviewed 2026-05-23 04:51 UTC · model grok-4.3
The pith
Model-based reinforcement learning trains drone policies that fly race tracks from camera pixels alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DreamerV3 trains visuomotor policies capable of agile flight through a racetrack using only pixels as observations. In contrast to model-free methods, this approach acquires drone racing skills from pixels. A perception-aware behaviour of actively steering the camera toward texture-rich gate regions emerges without handcrafted reward terms. Experiments in simulation and real-world flight with a hardware-in-the-loop setup demonstrate deployment on real quadrotors at speeds of up to 9 m/s.
What carries the argument
DreamerV3, a model-based reinforcement learning method that learns an internal world model from pixel observations and trains the control policy on trajectories imagined inside that model.
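A toy illustration may make "learning inside a world model" more concrete. The sketch below is not DreamerV3 and not the paper's code: the encoder, latent dynamics, and reward head are hand-set stand-ins for learned networks, and every function name is hypothetical. It only shows the general shape of an imagined rollout, in which one real observation is encoded once and all later steps are predicted by the model rather than taken in the environment.

```python
def encode(observation):
    """Toy encoder: compress a flat list of pixel values into a scalar latent."""
    return sum(observation) / len(observation)


def predict_next(latent, action):
    """Toy latent dynamics: a hand-set linear map standing in for a learned model."""
    return 0.9 * latent + 0.1 * action


def predict_reward(latent):
    """Toy reward head: reward is higher the closer the latent is to 1.0."""
    return -abs(1.0 - latent)


def imagine(observation, policy, horizon):
    """Roll the model forward from one real observation without stepping the environment."""
    latent = encode(observation)
    total_reward = 0.0
    for _ in range(horizon):
        action = policy(latent)
        latent = predict_next(latent, action)
        total_reward += predict_reward(latent)
    return latent, total_reward


# A constant policy, purely for illustration.
final_latent, imagined_return = imagine([0.0, 0.0, 0.0, 0.0], policy=lambda z: 1.0, horizon=3)
print(final_latent, imagined_return)
```

In a method of this family, the imagined return would serve as the training signal for the policy, which is where the sample-efficiency argument comes from: many imagined steps can be generated per real environment step.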
If this is right
- Visuomotor policies for drone racing can be learned without intermediate representations or heavy imitation learning bootstrapping.
- Perception-aware camera steering arises automatically from the model-based training process.
- Hardware-in-the-loop deployment on physical quadrotors reaches speeds of up to 9 m/s.
- Model-based methods provide a sample-efficient route for pixel-to-command control where model-free methods such as PPO and SAC struggle.
Where Pith is reading between the lines
- The same model-learning approach could extend to other vision-only robotic tasks that currently require extensive real-world data collection.
- Improving the visual realism of the simulator might allow fully zero-shot transfer without the hardware-in-the-loop step.
- Emergent behaviors without explicit rewards point to possible discovery of useful strategies in related control problems such as navigation through cluttered spaces.
Load-bearing premise
The rendered images supplied during hardware-in-the-loop testing are close enough to what a real onboard camera would capture in flight that the policies would transfer without further adjustment.
What would settle it
Running the learned policy on the physical quadrotor while feeding it live camera images instead of rendered ones and checking whether it still completes the track at speeds near 9 m/s.
Original abstract
Autonomous drone racing has risen as a challenging robotic benchmark for testing the limits of learning, perception, planning, and control. Expert human pilots are able to fly a drone through a race track by mapping pixels from a single camera directly to control commands. Recent works in autonomous drone racing attempting direct pixel-to-commands control policies have relied on either intermediate representations that simplify the observation space or performed extensive bootstrapping using Imitation Learning (IL). This paper leverages DreamerV3 to train visuomotor policies capable of agile flight through a racetrack using only pixels as observations. In contrast to model-free methods like PPO or SAC, which are sample-inefficient and struggle in this setting, our approach acquires drone racing skills from pixels. Notably, a perception-aware behaviour of actively steering the camera toward texture-rich gate regions emerges without the need of handcrafted reward terms for the viewing direction. Our experiments show in both, simulation and real-world flight using a hardware-in-the-loop setup with rendered image observations, how the proposed approach can be deployed on real quadrotors at speeds of up to 9 m/s. These results advance the state of pixel-based autonomous flight and demonstrate that MBRL offers a promising path for real-world robotics research.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that DreamerV3 enables training of visuomotor policies for autonomous drone racing directly from pixel observations, achieving agile flight through a racetrack with emergent perception-aware camera steering (without handcrafted rewards for viewing direction). It reports that the approach outperforms model-free methods like PPO and SAC, and can be deployed on real quadrotors at speeds up to 9 m/s via hardware-in-the-loop (HIL) experiments using rendered image observations in both simulation and real-world flight.
Significance. If the central claims hold after addressing the transfer gap, the work would show that model-based RL can handle high-speed pixel-to-command control in robotics without imitation learning or intermediate representations, advancing pixel-based autonomous flight. The reported emergence of perception-aware behavior is a strength worth highlighting, as it arises without explicit reward engineering.
major comments (2)
- [Abstract] Abstract: The headline claim of real-world deployment on quadrotors at up to 9 m/s rests on HIL experiments that feed rendered image observations to the policy. No ablation, direct comparison, or quantitative evidence is provided showing that these rendered images capture the statistics of real camera noise, lens distortion, lighting, or texture under flight conditions, leaving the sim-to-real perceptual transfer untested. This assumption is load-bearing for the real-world result.
- [Abstract] Abstract / Experiments: The abstract asserts successful deployment and superiority over PPO/SAC but supplies no quantitative metrics (e.g., success rates, lap times, or failure modes), ablation studies, or baseline comparisons for the real-world HIL flights. Without these, the soundness of the 9 m/s claim and the MBRL advantage cannot be evaluated.
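The first major comment asks for quantitative evidence that rendered frames resemble real camera frames. One simple form such evidence could take is a comparison of basic intensity statistics between the two image sources. The sketch below is an assumption about what that check might look like, not something the paper or the referee specifies: the function names are hypothetical, frames are plain nested lists of grayscale values in 0 to 255, and the example frames are tiny made-up arrays.

```python
import math


def intensity_stats(frame):
    """Mean and standard deviation of a grayscale frame given as a list of rows."""
    pixels = [p for row in frame for p in row]
    mu = sum(pixels) / len(pixels)
    var = sum((p - mu) ** 2 for p in pixels) / len(pixels)
    return mu, math.sqrt(var)


def histogram(frame, bins=8):
    """Normalized intensity histogram with equal-width bins over [0, 256)."""
    pixels = [p for row in frame for p in row]
    counts = [0] * bins
    for p in pixels:
        counts[min(int(p * bins / 256), bins - 1)] += 1
    return [c / len(pixels) for c in counts]


def histogram_l1(frame_a, frame_b, bins=8):
    """L1 distance between normalized histograms: 0 means identical, 2 means disjoint."""
    hist_a, hist_b = histogram(frame_a, bins), histogram(frame_b, bins)
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


# Made-up 2x2 frames standing in for a rendered image and a darker camera image.
rendered = [[10, 10], [200, 200]]
darker = [[10, 10], [10, 10]]
print(intensity_stats(rendered), histogram_l1(rendered, darker))
```

Histogram distance only captures global brightness and contrast. Lens distortion, motion blur, and texture differences would need their own measures, so a small distance here would be necessary evidence rather than sufficient evidence of perceptual similarity.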
minor comments (1)
- [Abstract] Abstract: The phrasing 'in both, simulation and real-world flight' contains a misplaced comma and should be revised for clarity, for example to 'in both simulation and real-world flight'.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comments correctly identify that our hardware-in-the-loop (HIL) results use rendered observations and that the abstract would benefit from additional quantitative detail. We respond point-by-point below and indicate planned revisions.
Point-by-point responses
- Referee: [Abstract] Abstract: The headline claim of real-world deployment on quadrotors at up to 9 m/s rests on HIL experiments that feed rendered image observations to the policy. No ablation, direct comparison, or quantitative evidence is provided showing that these rendered images capture the statistics of real camera noise, lens distortion, lighting, or texture under flight conditions, leaving the sim-to-real perceptual transfer untested. This assumption is load-bearing for the real-world result.
  Authors: We agree that the HIL experiments employ rendered image observations and that no ablations or direct comparisons are provided to demonstrate equivalence with real camera statistics (noise, distortion, lighting, texture). This leaves the perceptual component of sim-to-real transfer untested, which is a substantive limitation. In the revision we will explicitly state in the abstract and method sections that the real-world results are HIL with rendered observations, and we will add a dedicated limitations paragraph discussing the untested perceptual transfer. We maintain that the HIL setup still offers meaningful validation by exercising the policy on physical quadrotor dynamics at 9 m/s, but we do not claim full perceptual realism.
  Revision: partial
- Referee: [Abstract] Abstract / Experiments: The abstract asserts successful deployment and superiority over PPO/SAC but supplies no quantitative metrics (e.g., success rates, lap times, or failure modes), ablation studies, or baseline comparisons for the real-world HIL flights. Without these, the soundness of the 9 m/s claim and the MBRL advantage cannot be evaluated.
  Authors: The full manuscript reports simulation metrics and PPO/SAC comparisons; the abstract summarizes the 9 m/s HIL speed but omits per-experiment numbers. We will revise the abstract to include key HIL metrics (success rate, lap time, failure modes) and ensure the experiments section supplies the corresponding quantitative tables and any available baseline comparisons for the HIL condition, allowing direct evaluation of the claims.
  Revision: yes
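The HIL metrics named in the second response (success rate, lap time, failure modes) could be aggregated from per-trial logs along the following lines. This is a minimal sketch under stated assumptions: the record layout, the `summarize_trials` helper, and the failure label are all hypothetical, and the four trials are placeholder values chosen for illustration, not results from the paper.

```python
from collections import Counter
from statistics import mean


def summarize_trials(trials):
    """Aggregate per-trial records into success rate, mean lap time, and failure counts.

    Each trial is a dict with keys 'completed' (bool), 'lap_time_s' (float or None),
    and 'failure' (str or None). This layout is an assumption for illustration.
    """
    completed = [t for t in trials if t["completed"]]
    failed = [t for t in trials if not t["completed"]]
    return {
        "success_rate": len(completed) / len(trials) if trials else 0.0,
        "mean_lap_time_s": mean(t["lap_time_s"] for t in completed) if completed else None,
        "failure_modes": dict(Counter(t["failure"] for t in failed)),
    }


# Placeholder trials, not measurements.
trials = [
    {"completed": True, "lap_time_s": 6.0, "failure": None},
    {"completed": True, "lap_time_s": 7.0, "failure": None},
    {"completed": False, "lap_time_s": None, "failure": "gate_collision"},
    {"completed": False, "lap_time_s": None, "failure": "gate_collision"},
]
summary = summarize_trials(trials)
print(summary)
```

Reporting the same summary for the DreamerV3 policy and for each baseline under the HIL condition would give the side-by-side comparison the referee asks for.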
Circularity Check
No circularity; results rest on reported experiments
The paper applies an existing MBRL algorithm (DreamerV3) to a visuomotor drone-racing task and supports its claims with simulation and hardware-in-the-loop experiments. No derivation, prediction, or uniqueness claim reduces by construction to fitted parameters, self-citations, or definitional equivalence. The central result (pixel-to-command policies achieving 9 m/s in HIL) is presented as an empirical outcome rather than an input presupposed by the method.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Paper passage: leverages DreamerV3 to train visuomotor policies... world model... RSSM... reward function... progress term $b_1\left(\lVert g_k - p_{k-1}\rVert - \lVert g_k - p_k\rVert\right) - b_2\lVert \omega_k\rVert$
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Paper passage: emergent perception-aware behaviour... no handcrafted reward terms for viewing direction
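The progress term quoted in the first passage can be written out as a small function. The sketch below assumes a reading of the symbols that the excerpt itself does not define: g_k is taken to be the position of the next gate, p_{k-1} and p_k the drone position at consecutive steps, and ω_k the body rates. The coefficient defaults for b1 and b2 are illustrative placeholders, not values from the paper.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a sequence of floats."""
    return math.sqrt(sum(x * x for x in v))


def progress_reward(gate, prev_pos, pos, body_rates, b1=1.0, b2=0.01):
    """Compute b1 * (||g_k - p_{k-1}|| - ||g_k - p_k||) - b2 * ||omega_k||.

    Positive when the drone moved closer to the gate during the step,
    reduced by a penalty on body-rate magnitude. b1 and b2 are assumed values.
    """
    dist_prev = norm([g - p for g, p in zip(gate, prev_pos)])
    dist_now = norm([g - p for g, p in zip(gate, pos)])
    return b1 * (dist_prev - dist_now) - b2 * norm(body_rates)


# One metre of progress toward the gate with no rotation.
r_still = progress_reward((10.0, 0.0, 2.0), (0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 0.0, 0.0))
# Same progress while rotating with body rates of magnitude 5.
r_spin = progress_reward((10.0, 0.0, 2.0), (0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (3.0, 4.0, 0.0))
print(r_still, r_spin)
```

Nothing in this expression depends on where the camera points, which is consistent with the second passage's statement that the viewing behaviour is not rewarded directly.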
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.