{"work":{"id":"52f29e55-b23c-42e4-8187-a5ea46200858","openalex_id":null,"doi":null,"arxiv_id":"1611.04201","raw_key":null,"title":"CAD2RL: Real Single-Image Flight without a Single Real Image","authors":null,"authors_text":"Cad2rl: Real single-image flight without a single real image , author=","year":2016,"venue":"cs.LG","abstract":"Deep reinforcement learning has emerged as a promising and powerful technique for automatically acquiring control policies that can process raw sensory inputs, such as images, and perform complex behaviors. However, extending deep RL to real-world robotic tasks has proven challenging, particularly in safety-critical domains such as autonomous flight, where a trial-and-error learning process is often impractical. In this paper, we explore the following question: can we train vision-based navigation policies entirely in simulation, and then transfer them into the real world to achieve real-world flight without a single real training image? We propose a learning method that we call CAD$^2$RL, which can be used to perform collision-free indoor flight in the real world while being trained entirely on 3D CAD models. Our method uses single RGB images from a monocular camera, without needing to explicitly reconstruct the 3D geometry of the environment or perform explicit motion planning. Our learned collision avoidance policy is represented by a deep convolutional neural network that directly processes raw monocular images and outputs velocity commands. This policy is trained entirely on simulated images, with a Monte Carlo policy evaluation algorithm that directly optimizes the network's ability to produce collision-free flight. By highly randomizing the rendering settings for our simulated training set, we show that we can train a policy that generalizes to the real world, without requiring the simulator to be particularly realistic or high-fidelity. We evaluate our method by flying a real quadrotor through indoor environments, and further evaluate the design choices in our simulator through a series of ablation studies on depth prediction. For supplementary video see: https://youtu.be/nXBWmzFrj5s","external_url":"https://arxiv.org/abs/1611.04201","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-05-24T21:09:56.801510+00:00","pith_arxiv_id":"1611.04201","created_at":"2026-05-09T22:44:15.114795+00:00","updated_at":"2026-05-24T21:09:56.801510+00:00","title_quality_ok":true,"display_title":"CAD2RL: Real Single-Image Flight without a Single Real Image","render_title":"CAD2RL: Real Single-Image Flight without a Single Real Image"},"hub":{"state":{"work_id":"52f29e55-b23c-42e4-8187-a5ea46200858","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external citations","pith_inbound_count":10,"external_cited_by_count":null,"distinct_field_count":4,"first_pith_cited_at":"2019-07-16T04:48:52+00:00","last_pith_cited_at":"2026-05-15T02:58:25+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-05-25T03:45:30.653382+00:00","tier_text":"hub"},"tier":"hub","role_counts":[{"context_role":"background","n":1}],"polarity_counts":[{"context_polarity":"background","n":1}],"runs":{},"summary":{},"graph":{},"authors":[]}}