pith.

arxiv: 1606.03476 · v1 · pith:N4WX3NJJ · submitted 2016-06-10 · 💻 cs.LG · cs.AI

Generative Adversarial Imitation Learning

classification 💻 cs.LG cs.AI
keywords learning, reinforcement, expert, imitation, policy, adversarial, approach, cost
Abstract

Consider learning a policy from example expert behavior, without interaction with the expert or access to reinforcement signal. One approach is to recover the expert's cost function with inverse reinforcement learning, then extract a policy from that cost function with reinforcement learning. This approach is indirect and can be slow. We propose a new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning. We show that a certain instantiation of our framework draws an analogy between imitation learning and generative adversarial networks, from which we derive a model-free imitation learning algorithm that obtains significant performance gains over existing model-free methods in imitating complex behaviors in large, high-dimensional environments.
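The abstract names the GAN analogy but not its mechanics. What follows is a minimal sketch of the discriminator half of that idea under stated assumptions: a logistic-regression discriminator over concatenated state-action vectors and synthetic Gaussian "expert" and "policy" samples. Both are hypothetical choices made for illustration, and the function names `train_discriminator` and `surrogate_cost` are likewise hypothetical. The sketch follows the paper's sign convention, in which D is pushed toward 1 on the current policy's pairs and toward 0 on expert pairs, so that log D(s, a) acts as a cost that is low where behavior looks expert-like. The policy-optimization step that would consume this cost is omitted.

```python
import numpy as np


def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)) = -log(1 + exp(-z)).
    return -np.logaddexp(0.0, -z)


def sigmoid(z):
    return np.exp(log_sigmoid(z))


def train_discriminator(policy_sa, expert_sa, steps=500, lr=0.1):
    """Hypothetical logistic discriminator D(s, a) = sigmoid(w . x + b).

    Gradient ascent on mean log D(policy) + mean log(1 - D(expert)),
    i.e. D -> 1 on policy pairs and D -> 0 on expert pairs.
    """
    w = np.zeros(policy_sa.shape[1])
    b = 0.0
    for _ in range(steps):
        d_pol = sigmoid(policy_sa @ w + b)
        d_exp = sigmoid(expert_sa @ w + b)
        # d/dz log sigmoid(z) = 1 - sigmoid(z); d/dz log(1 - sigmoid(z)) = -sigmoid(z)
        grad_w = (policy_sa.T @ (1.0 - d_pol)) / len(policy_sa) \
            - (expert_sa.T @ d_exp) / len(expert_sa)
        grad_b = np.mean(1.0 - d_pol) - np.mean(d_exp)
        w += lr * grad_w
        b += lr * grad_b
    return w, b


def surrogate_cost(sa, w, b):
    """Hypothetical cost c(s, a) = log D(s, a); lower means more expert-like."""
    return log_sigmoid(sa @ w + b)


# Synthetic, hypothetical data: rows are concatenated (state, action) vectors.
rng = np.random.default_rng(0)
expert = rng.normal(loc=1.0, scale=0.5, size=(200, 2))
policy = rng.normal(loc=-1.0, scale=0.5, size=(200, 2))

w, b = train_discriminator(policy, expert)
# Expert-like pairs should receive a lower average cost than policy pairs.
print(surrogate_cost(expert, w, b).mean() < surrogate_cost(policy, w, b).mean())
```

In a full imitation loop, the policy would then be updated to reduce this cost, the discriminator retrained on fresh policy samples, and the two steps alternated; a linear discriminator is used here only to keep the sketch short.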

This paper has not been read by Pith yet.

Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. CoRMA: Contrastive RMA for Contact-Rich Meta-Adaptation

    cs.RO 2026-05 unverdicted novelty 7.0

    CoRMA enables within-episode adaptation for contact-rich robotic assembly by inferring semantic contact context with a causal Transformer and force-regime contrastive objective, retaining higher real success than FORG...

  2. Behavior-Constrained Reinforcement Learning with Receding-Horizon Credit Assignment for High-Performance Control

    cs.RO 2026-04 unverdicted novelty 6.0

    A behavior-constrained RL framework with receding-horizon credit assignment learns high-performance control policies that stay aligned with expert behavior in race car simulation.

  3. 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations

    cs.RO 2024-02 conditional novelty 6.0

    3D Diffuser Actor unifies diffusion policies with 3D scene features to set new state-of-the-art results on RLBench and CALVIN robot benchmarks.

  4. A General Language Assistant as a Laboratory for Alignment

    cs.CL 2021-12 conditional novelty 6.0

    Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

  5. Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets

    cs.RO 2021-09 accept novelty 6.0

    A large multi-task multi-domain robot dataset combined with 50 new demonstrations yields 2x higher success rates on never-before-seen tasks in new domains.

  6. When a Robot is More Capable than a Human: Learning from Constrained Demonstrators

    cs.RO 2025-10 unverdicted novelty 5.0

    Robots outperform constrained human demonstrations by inferring state-only rewards from demos and using temporal interpolation to label and explore better trajectories, achieving 10x faster task completion on a real r...

  7. Real-Time Evaluation of Autonomous Systems under Adversarial Attacks

    cs.AI 2026-05 unverdicted novelty 4.0

    A framework trains and compares MLP, transformer, and GAIL-based trajectory models on real driving data, finding that architectural differences cause large variations in robustness to PGD attacks despite similar nomin...