arxiv: 1707.08475 · v2 · pith:K4U6FZBQnew · submitted 2017-07-26 · stat.ML · cs.AI · cs.LG

DARLA: Improving Zero-Shot Transfer in Reinforcement Learning

keywords: darla, domain, learning, adaptation, agent, data, disentangled, many

Domain adaptation is an important open problem in deep reinforcement learning (RL). In many scenarios of interest, data is hard to obtain, so agents may learn a source policy in a setting where data is readily available, with the hope that it generalises well to the target domain. We propose a new multi-stage RL agent, DARLA (DisentAngled Representation Learning Agent), which learns to see before learning to act. DARLA's vision is based on learning a disentangled representation of the observed environment. Once DARLA can see, it is able to acquire source policies that are robust to many domain shifts, even with no access to the target domain. DARLA significantly outperforms conventional baselines in zero-shot domain adaptation scenarios, an effect that holds across a variety of RL environments (Jaco arm, DeepMind Lab) and base RL algorithms (DQN, A3C and EC).

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Temporal Transfer Learning for Traffic Optimization with Coarse-grained Advisory Autonomy

    cs.RO · 2023-11 · unverdicted · novelty 7.0

    Temporal Transfer Learning selects source tasks for zero-shot transfer of RL policies to solve a range of coarse-grained advisory autonomy hold durations in traffic optimization more reliably than baselines.

  2. State-Conditional Adversarial Learning: An Off-Policy Visual Domain Transfer Method for End-to-End Imitation Learning

    cs.RO · 2025-12 · unverdicted · novelty 5.0

    SCAL derives an upper bound on target-domain imitation loss using source loss plus state-conditional latent KL divergence and aligns distributions via a discriminator-based adversarial estimator.