pith. machine review for the scientific record. sign in

arxiv: 1603.00448 · v3 · submitted 2016-03-01 · 💻 cs.LG · cs.AI· cs.RO

Recognition: unknown

Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization

Authors on Pith no claims yet
classification 💻 cs.LG cs.AIcs.RO
keywords costcontrollearninginverseoptimaladdressbehaviorschallenge
0
0 comments X
read the original abstract

Reinforcement learning can acquire complex behaviors from high-level specifications. However, defining a cost function that can be optimized effectively and encodes the correct task is challenging in practice. We explore how inverse optimal control (IOC) can be used to learn behaviors from demonstrations, with applications to torque control of high-dimensional robotic systems. Our method addresses two key challenges in inverse optimal control: first, the need for informative features and effective regularization to impose structure on the cost, and second, the difficulty of learning the cost function under unknown dynamics for high-dimensional continuous systems. To address the former challenge, we present an algorithm capable of learning arbitrary nonlinear cost functions, such as neural networks, without meticulous feature engineering. To address the latter challenge, we formulate an efficient sample-based approximation for MaxEnt IOC. We evaluate our method on a series of simulated tasks and real-world robotic manipulation problems, demonstrating substantial improvement over prior methods both in terms of task complexity and sample efficiency.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Concrete Problems in AI Safety

    cs.AI 2016-06 accept novelty 7.0

    The paper categorizes five concrete AI safety problems arising from flawed objectives, costly evaluation, and learning dynamics.