DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.
hub
arXiv preprint arXiv:1611.04201 , year=
10 Pith papers cite this work. Polarity classification is still indexing.
abstract
Deep reinforcement learning has emerged as a promising and powerful technique for automatically acquiring control policies that can process raw sensory inputs, such as images, and perform complex behaviors. However, extending deep RL to real-world robotic tasks has proven challenging, particularly in safety-critical domains such as autonomous flight, where a trial-and-error learning process is often impractical. In this paper, we explore the following question: can we train vision-based navigation policies entirely in simulation, and then transfer them into the real world to achieve real-world flight without a single real training image? We propose a learning method that we call CAD$^2$RL, which can be used to perform collision-free indoor flight in the real world while being trained entirely on 3D CAD models. Our method uses single RGB images from a monocular camera, without needing to explicitly reconstruct the 3D geometry of the environment or perform explicit motion planning. Our learned collision avoidance policy is represented by a deep convolutional neural network that directly processes raw monocular images and outputs velocity commands. This policy is trained entirely on simulated images, with a Monte Carlo policy evaluation algorithm that directly optimizes the network's ability to produce collision-free flight. By highly randomizing the rendering settings for our simulated training set, we show that we can train a policy that generalizes to the real world, without requiring the simulator to be particularly realistic or high-fidelity. We evaluate our method by flying a real quadrotor through indoor environments, and further evaluate the design choices in our simulator through a series of ablation studies on depth prediction. For supplementary video see: https://youtu.be/nXBWmzFrj5s
hub tools
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
SeqRejectron constructs a stopping rule with a small set of validator policies to achieve horizon-free sample complexity for selective imitation learning under arbitrary dynamics shifts.
DreamerV3 enables pixel-to-control policies for drone racing that reach 9 m/s in both simulation and real hardware-in-the-loop tests.
RoboNet is a multi-robot video dataset that enables pre-training of vision-based manipulation models which, after fine-tuning on a new robot, outperform robot-specific training that uses 4-20 times more data.
EPI policies use a transition-predictability reward to probe environments and condition task policies, outperforming standard generalization methods on novel test environments.
Synthetic data complements real data in diffusion-based controllable human video generation, with effective sample selection improving motion realism, temporal consistency, and identity preservation.
NavRL++ improves sim-to-real transfer for RL navigation via empirical analysis of perturbations, perturbation-aware fine-tuning, and a Transformer temporal policy, with real-world validation showing outperformance over learning baselines and parity with optimization planners in static cases.
The paper defines Agent AI as interactive multimodal systems that perceive grounded data and generate embodied actions, arguing this approach can mitigate hallucinations in foundation models.
End-to-end multi-task regression learns flight commands for UAVs to explore unstructured forest environments from vision alone, outperforming pose-estimation baselines in simulation.
Imitation learning pretraining of a ResNet-34 DDPG agent improves performance on image-based autonomous driving in simulation over pure IL or pure RL.
citing papers explorer
-
Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling
DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.
-
Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift
SeqRejectron constructs a stopping rule with a small set of validator policies to achieve horizon-free sample complexity for selective imitation learning under arbitrary dynamics shifts.
-
Dream to Fly: Model-Based Reinforcement Learning for Vision-Based Drone Flight
DreamerV3 enables pixel-to-control policies for drone racing that reach 9 m/s in both simulation and real hardware-in-the-loop tests.
-
RoboNet: Large-Scale Multi-Robot Learning
RoboNet is a multi-robot video dataset that enables pre-training of vision-based manipulation models which, after fine-tuning on a new robot, outperform robot-specific training that uses 4-20 times more data.
-
Environment Probing Interaction Policies
EPI policies use a transition-predictability reward to probe environments and condition task policies, outperforming standard generalization methods on novel test environments.
-
Exploring the Role of Synthetic Data Augmentation in Controllable Human-Centric Video Generation
Synthetic data complements real data in diffusion-based controllable human video generation, with effective sample selection improving motion realism, temporal consistency, and identity preservation.
-
NavRL++: A System-Level Framework for Improving Sim-to-Real Transfer in Reinforcement Learning-Based Robot Navigation
NavRL++ improves sim-to-real transfer for RL navigation via empirical analysis of perturbations, perturbation-aware fine-tuning, and a Transformer temporal policy, with real-world validation showing outperformance over learning baselines and parity with optimization planners in static cases.
-
Agent AI: Surveying the Horizons of Multimodal Interaction
The paper defines Agent AI as interactive multimodal systems that perceive grounded data and generate embodied actions, arguing this approach can mitigate hallucinations in foundation models.
-
Multi-Task Regression-based Learning for Autonomous Unmanned Aerial Vehicle Flight Control within Unstructured Outdoor Environments
End-to-end multi-task regression learns flight commands for UAVs to explore unstructured forest environments from vision alone, outperforming pose-estimation baselines in simulation.
-
Improved Reinforcement Learning through Imitation Learning Pretraining Towards Image-based Autonomous Driving
Imitation learning pretraining of a ResNet-34 DDPG agent improves performance on image-based autonomous driving in simulation over pure IL or pure RL.