Generative Adversarial Imitation Learning

Jonathan Ho, Stefano Ermon

classification cs.LG, cs.AI

keywords learning, reinforcement, expert, imitation, policy, adversarial, approach, cost

Consider learning a policy from example expert behavior, without interaction with the expert or access to reinforcement signal. One approach is to recover the expert's cost function with inverse reinforcement learning, then extract a policy from that cost function with reinforcement learning. This approach is indirect and can be slow. We propose a new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning. We show that a certain instantiation of our framework draws an analogy between imitation learning and generative adversarial networks, from which we derive a model-free imitation learning algorithm that obtains significant performance gains over existing model-free methods in imitating complex behaviors in large, high-dimensional environments.
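
The adversarial analogy in the abstract can be illustrated with a minimal sketch. It assumes a linear-logistic discriminator D over state-action feature vectors, trained to output values near 1 on policy samples and near 0 on expert samples, with log D then used as a surrogate cost for the policy. The function names, the synthetic Gaussian data, and the learning rate are all hypothetical choices for illustration, and the policy-update step (a model-free RL step in the paper) is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator_step(w, policy_sa, expert_sa, lr=0.1):
    """One gradient-ascent step on E_policy[log D] + E_expert[log(1 - D)],
    where D(s, a) = sigmoid(features(s, a) @ w)."""
    d_pol = sigmoid(policy_sa @ w)
    d_exp = sigmoid(expert_sa @ w)
    # d/dw log D = (1 - D) * x ; d/dw log(1 - D) = -D * x
    grad = ((1.0 - d_pol)[:, None] * policy_sa).mean(axis=0) \
         - (d_exp[:, None] * expert_sa).mean(axis=0)
    return w + lr * grad

def surrogate_cost(w, sa):
    """Per-sample cost log D(s, a): low where samples look expert-like."""
    return np.log(sigmoid(sa @ w) + 1e-8)

# Synthetic stand-ins for state-action features (assumed, not real data).
rng = np.random.default_rng(0)
expert_sa = rng.normal(loc=1.0, size=(256, 3))
policy_sa = rng.normal(loc=-1.0, size=(256, 3))

w = np.zeros(3)
for _ in range(200):
    w = discriminator_step(w, policy_sa, expert_sa)

expert_cost = surrogate_cost(w, expert_sa).mean()
policy_cost = surrogate_cost(w, policy_sa).mean()
print(expert_cost < policy_cost)
```

In a full loop, the policy would then be updated to reduce its expected surrogate cost, and the two steps would alternate. This sketch covers only the discriminator half.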

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Behavior-Constrained Reinforcement Learning with Receding-Horizon Credit Assignment for High-Performance Control
cs.RO 2026-04 unverdicted novelty 6.0

A behavior-constrained RL framework with receding-horizon credit assignment learns high-performance control policies that stay aligned with expert behavior in race car simulation.

A General Language Assistant as a Laboratory for Alignment
cs.CL 2021-12 conditional novelty 6.0

Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets
cs.RO 2021-09 accept novelty 6.0

A large multi-task multi-domain robot dataset combined with 50 new demonstrations yields 2x higher success rates on never-before-seen tasks in new domains.

Real-Time Evaluation of Autonomous Systems under Adversarial Attacks
cs.AI 2026-05 unverdicted novelty 4.0

A framework trains and compares MLP, transformer, and GAIL-based trajectory models on real driving data, finding that architectural differences cause large variations in robustness to PGD attacks despite similar nomin...