arxiv: 1611.04201 · v4 · pith:7GNXBUFHnew · submitted 2016-11-13 · 💻 cs.LG · cs.CV · cs.RO

CAD2RL: Real Single-Image Flight without a Single Real Image

classification 💻 cs.LG cs.CV cs.RO
keywords real flight images policy without deep entirely learning

Deep reinforcement learning has emerged as a promising and powerful technique for automatically acquiring control policies that can process raw sensory inputs, such as images, and perform complex behaviors. However, extending deep RL to real-world robotic tasks has proven challenging, particularly in safety-critical domains such as autonomous flight, where a trial-and-error learning process is often impractical. In this paper, we explore the following question: can we train vision-based navigation policies entirely in simulation, and then transfer them into the real world to achieve real-world flight without a single real training image? We propose a learning method that we call CAD$^2$RL, which can be used to perform collision-free indoor flight in the real world while being trained entirely on 3D CAD models. Our method uses single RGB images from a monocular camera, without needing to explicitly reconstruct the 3D geometry of the environment or perform explicit motion planning. Our learned collision avoidance policy is represented by a deep convolutional neural network that directly processes raw monocular images and outputs velocity commands. This policy is trained entirely on simulated images, with a Monte Carlo policy evaluation algorithm that directly optimizes the network's ability to produce collision-free flight. By highly randomizing the rendering settings for our simulated training set, we show that we can train a policy that generalizes to the real world, without requiring the simulator to be particularly realistic or high-fidelity. We evaluate our method by flying a real quadrotor through indoor environments, and further evaluate the design choices in our simulator through a series of ablation studies on depth prediction. For supplementary video see: https://youtu.be/nXBWmzFrj5s
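The two ideas the abstract leans on, randomized rendering settings and Monte Carlo evaluation of collision-free flight, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not list the actual rendering parameters, their ranges, the simulator, or the network, so `RenderSettings`, `sample_render_settings`, `toy_rollout`, and `monte_carlo_collision_free_rate` are hypothetical names, and the toy rollout stands in for a real simulator and policy.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RenderSettings:
    """Hypothetical rendering parameters (not taken from the paper)."""
    wall_texture_id: int
    light_intensity: float
    light_azimuth_deg: float
    camera_fov_deg: float


def sample_render_settings(rng, num_textures=200):
    """Draw one random rendering configuration; all ranges are assumed."""
    return RenderSettings(
        wall_texture_id=rng.randrange(num_textures),
        light_intensity=rng.uniform(0.2, 2.0),
        light_azimuth_deg=rng.uniform(0.0, 360.0),
        camera_fov_deg=rng.uniform(60.0, 120.0),
    )


def toy_rollout(action, rng):
    """Stand-in for a simulated flight: returns True if no collision occurred.

    Assumption: turning harder (larger |action|) collides more often.
    """
    collision_prob = min(1.0, 0.1 + 0.8 * abs(action))
    return rng.random() >= collision_prob


def monte_carlo_collision_free_rate(rollout, action, num_rollouts, rng):
    """Estimate the fraction of rollouts taking `action` that avoid collision."""
    successes = sum(1 for _ in range(num_rollouts) if rollout(action, rng))
    return successes / num_rollouts


if __name__ == "__main__":
    rng = random.Random(0)
    # Each simulated episode would be rendered with freshly sampled settings.
    for _ in range(3):
        print(sample_render_settings(rng))
    # Monte Carlo estimates for two candidate actions in the toy simulator.
    for action in (0.0, 1.0):
        rate = monte_carlo_collision_free_rate(toy_rollout, action, 1000, rng)
        print(f"action={action:+.1f} collision-free rate={rate:.2f}")
```

In a setup like the one the abstract describes, estimates of this kind would serve as training targets for the convolutional network rather than being printed, and the randomized settings would drive a renderer over the 3D CAD models.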

This paper has not been read by Pith yet.


Forward citations

Cited by 12 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling

    cs.LG 2026-05 unverdicted novelty 7.0

    DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.

  2. Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift

    cs.LG 2026-05 unverdicted novelty 7.0

    SeqRejectron builds a stopping rule from a small set of validator policies to achieve horizon-free sample-complexity guarantees for selective imitation learning under arbitrary train-test dynamics shifts.

  3. Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift

    cs.LG 2026-05 unverdicted novelty 7.0

    SeqRejectron constructs a stopping rule with a small set of validator policies to achieve horizon-free sample complexity for selective imitation learning under arbitrary dynamics shifts.

  4. Exploring the Role of Synthetic Data Augmentation in Controllable Human-Centric Video Generation

    cs.CV 2026-04 unverdicted novelty 6.0

    Synthetic data complements real data in diffusion-based controllable human video generation, with effective sample selection improving motion realism, temporal consistency, and identity preservation.

  5. TABX: A High-Throughput Sandbox Battle Simulator for Multi-Agent Reinforcement Learning

    cs.MA 2026-02 unverdicted novelty 6.0

    Presents TABX, a modular JAX-accelerated sandbox simulator enabling customizable multi-agent tasks and high-throughput evaluation for cooperative MARL.

  6. Dream to Fly: Model-Based Reinforcement Learning for Vision-Based Drone Flight

    cs.RO 2025-01 unverdicted novelty 6.0

    DreamerV3 enables pixel-to-control policies for drone racing that reach 9 m/s in both simulation and real hardware-in-the-loop tests.

  7. RoboNet: Large-Scale Multi-Robot Learning

    cs.RO 2019-10 conditional novelty 6.0

    RoboNet is a multi-robot video dataset that enables pre-training of vision-based manipulation models which, after fine-tuning on a new robot, outperform robot-specific training that uses 4-20 times more data.

  8. Environment Probing Interaction Policies

    cs.RO 2019-07 unverdicted novelty 6.0

    EPI policies use a transition-predictability reward to probe environments and condition task policies, outperforming standard generalization methods on novel test environments.

  9. NavRL++: A System-Level Framework for Improving Sim-to-Real Transfer in Reinforcement Learning-Based Robot Navigation

    cs.RO 2026-05 unverdicted novelty 5.0

    NavRL++ improves sim-to-real transfer for RL navigation via empirical analysis of perturbations, perturbation-aware fine-tuning, and a Transformer temporal policy, with real-world validation showing outperformance ove...

  10. Agent AI: Surveying the Horizons of Multimodal Interaction

    cs.AI 2024-01 unverdicted novelty 4.0

    The paper defines Agent AI as interactive multimodal systems that perceive grounded data and generate embodied actions, arguing this approach can mitigate hallucinations in foundation models.

  11. Multi-Task Regression-based Learning for Autonomous Unmanned Aerial Vehicle Flight Control within Unstructured Outdoor Environments

    cs.RO 2019-07 unverdicted novelty 4.0

    End-to-end multi-task regression learns flight commands for UAVs to explore unstructured forest environments from vision alone, outperforming pose-estimation baselines in simulation.

  12. Improved Reinforcement Learning through Imitation Learning Pretraining Towards Image-based Autonomous Driving

    cs.LG 2019-07 unverdicted novelty 3.0

    Imitation learning pretraining of a ResNet-34 DDPG agent improves performance on image-based autonomous driving in simulation over pure IL or pure RL.