pith. machine review for the scientific record.

arxiv: 1606.03476 · v1 · submitted 2016-06-10 · 💻 cs.LG · cs.AI

Recognition: unknown

Generative Adversarial Imitation Learning

keywords: learning, reinforcement, expert, imitation, policy, adversarial, approach, cost

Consider learning a policy from example expert behavior, without interaction with the expert or access to reinforcement signal. One approach is to recover the expert's cost function with inverse reinforcement learning, then extract a policy from that cost function with reinforcement learning. This approach is indirect and can be slow. We propose a new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning. We show that a certain instantiation of our framework draws an analogy between imitation learning and generative adversarial networks, from which we derive a model-free imitation learning algorithm that obtains significant performance gains over existing model-free methods in imitating complex behaviors in large, high-dimensional environments.

This paper has not been read by Pith yet.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Behavior-Constrained Reinforcement Learning with Receding-Horizon Credit Assignment for High-Performance Control

    cs.RO 2026-04 unverdicted novelty 6.0

    A behavior-constrained RL framework with receding-horizon credit assignment learns high-performance control policies that stay aligned with expert behavior in race car simulation.

  2. A General Language Assistant as a Laboratory for Alignment

    cs.CL 2021-12 conditional novelty 6.0

    Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

  3. Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets

    cs.RO 2021-09 accept novelty 6.0

    A large multi-task multi-domain robot dataset combined with 50 new demonstrations yields 2x higher success rates on never-before-seen tasks in new domains.

  4. Real-Time Evaluation of Autonomous Systems under Adversarial Attacks

    cs.AI 2026-05 unverdicted novelty 4.0

    A framework trains and compares MLP, transformer, and GAIL-based trajectory models on real driving data, finding that architectural differences cause large variations in robustness to PGD attacks despite similar nomin...