Generative Adversarial Imitation Learning

Jonathan Ho; Stefano Ermon

arxiv: 1606.03476 · v1 · pith:N4WX3NJJnew · submitted 2016-06-10 · 💻 cs.LG · cs.AI

classification 💻 cs.LG cs.AI

keywords learning reinforcement expert imitation policy adversarial approach cost

Consider learning a policy from example expert behavior, without interaction with the expert or access to reinforcement signal. One approach is to recover the expert's cost function with inverse reinforcement learning, then extract a policy from that cost function with reinforcement learning. This approach is indirect and can be slow. We propose a new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning. We show that a certain instantiation of our framework draws an analogy between imitation learning and generative adversarial networks, from which we derive a model-free imitation learning algorithm that obtains significant performance gains over existing model-free methods in imitating complex behaviors in large, high-dimensional environments.
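The GAN analogy in the abstract can be illustrated with a small sketch of the discriminator side only. A discriminator D(s, a) is trained to tell the learner's state-action pairs from the expert's, and log D(s, a) then acts as a surrogate cost that the policy step would try to lower. Everything below is an illustrative assumption rather than the authors' implementation: the one-dimensional toy data, the hand-picked feature `s * a`, the logistic-regression discriminator, and the helper names `make_toy_pairs`, `discriminator_step`, and `surrogate_cost` are all hypothetical, and the policy update itself is omitted.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def features(state, action):
    # Hypothetical feature map: the product s * a plus a bias term.
    return [state * action, 1.0]


def discriminator(w, state, action):
    # D(s, a): estimated probability that the pair came from the learner's policy.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(state, action))))


def discriminator_step(w, policy_pairs, expert_pairs, lr=0.5):
    # One gradient-ascent step on E_policy[log D] + E_expert[log(1 - D)].
    grad = [0.0] * len(w)
    for s, a in policy_pairs:
        d = discriminator(w, s, a)
        for i, x in enumerate(features(s, a)):
            grad[i] += (1.0 - d) * x / len(policy_pairs)  # gradient of log D
    for s, a in expert_pairs:
        d = discriminator(w, s, a)
        for i, x in enumerate(features(s, a)):
            grad[i] -= d * x / len(expert_pairs)  # gradient of log(1 - D)
    return [wi + lr * gi for wi, gi in zip(w, grad)]


def surrogate_cost(w, state, action):
    # Cost handed to the policy step: low where behavior looks expert-like.
    return math.log(discriminator(w, state, action))


def make_toy_pairs(rng, n):
    # Toy assumption: the expert plays a = s, the current learner plays a = -s.
    states = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    expert_pairs = [(s, s) for s in states]
    policy_pairs = [(s, -s) for s in states]
    return expert_pairs, policy_pairs


if __name__ == "__main__":
    rng = random.Random(0)
    expert_pairs, policy_pairs = make_toy_pairs(rng, 50)
    w = [0.0, 0.0]
    for _ in range(200):
        w = discriminator_step(w, policy_pairs, expert_pairs)
    print("cost of expert-like pair:", surrogate_cost(w, 0.5, 0.5))
    print("cost of learner-like pair:", surrogate_cost(w, 0.5, -0.5))
```

In a full method the learner's policy would be updated against this cost and the two steps alternated. Here the learner is held fixed, so the sketch only shows how a trained discriminator assigns lower cost to expert-like behavior.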

Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

CoRMA: Contrastive RMA for Contact-Rich Meta-Adaptation
cs.RO 2026-05 unverdicted novelty 7.0

CoRMA enables within-episode adaptation for contact-rich robotic assembly by inferring semantic contact context with a causal Transformer and force-regime contrastive objective, retaining higher real success than FORG...

Behavior-Constrained Reinforcement Learning with Receding-Horizon Credit Assignment for High-Performance Control
cs.RO 2026-04 unverdicted novelty 6.0

A behavior-constrained RL framework with receding-horizon credit assignment learns high-performance control policies that stay aligned with expert behavior in race car simulation.

3D Diffuser Actor: Policy Diffusion with 3D Scene Representations
cs.RO 2024-02 conditional novelty 6.0

3D Diffuser Actor unifies diffusion policies with 3D scene features to set new state-of-the-art results on RLBench and CALVIN robot benchmarks.

A General Language Assistant as a Laboratory for Alignment
cs.CL 2021-12 conditional novelty 6.0

Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets
cs.RO 2021-09 accept novelty 6.0

A large multi-task multi-domain robot dataset combined with 50 new demonstrations yields 2x higher success rates on never-before-seen tasks in new domains.

When a Robot is More Capable than a Human: Learning from Constrained Demonstrators
cs.RO 2025-10 unverdicted novelty 5.0

Robots outperform constrained human demonstrations by inferring state-only rewards from demos and using temporal interpolation to label and explore better trajectories, achieving 10x faster task completion on a real r...

Real-Time Evaluation of Autonomous Systems under Adversarial Attacks
cs.AI 2026-05 unverdicted novelty 4.0

A framework trains and compares MLP, transformer, and GAIL-based trajectory models on real driving data, finding that architectural differences cause large variations in robustness to PGD attacks despite similar nomin...