pith. machine review for the scientific record. sign in

arxiv: 1611.04201 · v4 · submitted 2016-11-13 · 💻 cs.LG · cs.CV· cs.RO

Recognition: unknown

CAD2RL: Real Single-Image Flight without a Single Real Image

Authors on Pith no claims yet
classification 💻 cs.LG cs.CVcs.RO
keywords realflightimagespolicywithoutdeepentirelylearning
0
0 comments X
read the original abstract

Deep reinforcement learning has emerged as a promising and powerful technique for automatically acquiring control policies that can process raw sensory inputs, such as images, and perform complex behaviors. However, extending deep RL to real-world robotic tasks has proven challenging, particularly in safety-critical domains such as autonomous flight, where a trial-and-error learning process is often impractical. In this paper, we explore the following question: can we train vision-based navigation policies entirely in simulation, and then transfer them into the real world to achieve real-world flight without a single real training image? We propose a learning method that we call CAD$^2$RL, which can be used to perform collision-free indoor flight in the real world while being trained entirely on 3D CAD models. Our method uses single RGB images from a monocular camera, without needing to explicitly reconstruct the 3D geometry of the environment or perform explicit motion planning. Our learned collision avoidance policy is represented by a deep convolutional neural network that directly processes raw monocular images and outputs velocity commands. This policy is trained entirely on simulated images, with a Monte Carlo policy evaluation algorithm that directly optimizes the network's ability to produce collision-free flight. By highly randomizing the rendering settings for our simulated training set, we show that we can train a policy that generalizes to the real world, without requiring the simulator to be particularly realistic or high-fidelity. We evaluate our method by flying a real quadrotor through indoor environments, and further evaluate the design choices in our simulator through a series of ablation studies on depth prediction. For supplementary video see: https://youtu.be/nXBWmzFrj5s

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling

    cs.LG 2026-05 unverdicted novelty 7.0

    DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.

  2. Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift

    cs.LG 2026-05 unverdicted novelty 7.0

    SeqRejectron builds a stopping rule from a small set of validator policies to achieve horizon-free sample-complexity guarantees for selective imitation learning under arbitrary train-test dynamics shifts.

  3. Exploring the Role of Synthetic Data Augmentation in Controllable Human-Centric Video Generation

    cs.CV 2026-04 unverdicted novelty 6.0

    Synthetic data complements real data in diffusion-based controllable human video generation, with effective sample selection improving motion realism, temporal consistency, and identity preservation.