pith.

arxiv: 1901.02161 · v2 · pith:SGMQAHDO · submitted 2019-01-08 · 💻 cs.LG · stat.ML

Risk-Aware Active Inverse Reinforcement Learning

classification 💻 cs.LG stat.ML
keywords active, learning, inverse, reinforcement, risk-aware, robot, allows, performance
abstract

Active learning from demonstration allows a robot to query a human for specific types of input to achieve efficient learning. Existing work has explored a variety of active query strategies; however, to our knowledge, none of these strategies directly minimize the performance risk of the policy the robot is learning. Utilizing recent advances in performance bounds for inverse reinforcement learning, we propose a risk-aware active inverse reinforcement learning algorithm that focuses active queries on areas of the state space with the potential for large generalization error. We show that risk-aware active learning outperforms standard active IRL approaches on gridworld, simulated driving, and table setting tasks, while also providing a performance-based stopping criterion that allows a robot to know when it has received enough demonstrations to safely perform a task.
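The abstract describes the query rule only at a high level. The sketch that follows is one possible reading of it and is not the authors' code. It assumes a deterministic tabular MDP, rewards linear in state features, and a short list of reward-weight samples standing in for a Bayesian IRL posterior. For each candidate start state it computes the empirical alpha-quantile (value-at-risk) of the policy loss V*(s) - V^pi(s) across samples and queries the state with the highest value. The evaluation policy is taken as optimal under the posterior-mean weights (an assumption made for simplicity). The stopping rule `should_stop` is a simplified, hypothetical stand-in for the paper's performance-based criterion. All function names are hypothetical.

```python
import math

GAMMA = 0.9  # discount factor (assumed)


def value_iteration(reward, transitions, gamma=GAMMA, iters=200):
    """Optimal values and greedy policy; transitions[s][a] is the (assumed deterministic) next state."""
    n = len(reward)
    v = [0.0] * n
    for _ in range(iters):
        v = [reward[s] + gamma * max(v[s2] for s2 in transitions[s]) for s in range(n)]
    policy = [max(range(len(transitions[s])), key=lambda a, s=s: v[transitions[s][a]])
              for s in range(n)]
    return v, policy


def evaluate_policy(policy, reward, transitions, gamma=GAMMA, iters=200):
    """Iterative policy evaluation: value of following `policy` under `reward`."""
    n = len(reward)
    v = [0.0] * n
    for _ in range(iters):
        v = [reward[s] + gamma * v[transitions[s][policy[s]]] for s in range(n)]
    return v


def linear_reward(w, features):
    # Assumption: R(s) = w . phi(s)
    return [sum(wi * fi for wi, fi in zip(w, f)) for f in features]


def value_at_risk(losses, alpha=0.95):
    """Empirical alpha-quantile of sampled losses."""
    ordered = sorted(losses)
    k = max(0, math.ceil(alpha * len(ordered)) - 1)
    return ordered[k]


def risk_aware_query(weight_samples, features, transitions, eval_policy, alpha=0.95):
    """Return the state with the largest alpha-VaR of policy loss, plus all per-state VaRs."""
    per_state = [[] for _ in features]
    for w in weight_samples:  # each w stands in for one posterior sample
        reward = linear_reward(w, features)
        v_opt, _ = value_iteration(reward, transitions)
        v_eval = evaluate_policy(eval_policy, reward, transitions)
        for s in range(len(features)):
            per_state[s].append(v_opt[s] - v_eval[s])  # loss if started in s
    risks = [value_at_risk(losses, alpha) for losses in per_state]
    query = max(range(len(risks)), key=risks.__getitem__)
    return query, risks


def should_stop(risks, epsilon):
    """Simplified stopping rule (hypothetical): stop once every state's VaR is at most epsilon."""
    return max(risks) <= epsilon


# Toy 4-state chain: action 0 moves left, action 1 moves right (walls at both ends).
transitions = [[max(s - 1, 0), min(s + 1, 3)] for s in range(4)]
features = [[1, 0], [0, 0], [0, 0], [0, 1]]  # indicators for the two end states
samples = [[0.0, 1.0], [0.1, 0.9], [0.9, 0.1]]  # hypothetical posterior samples
mean_w = [sum(col) / len(samples) for col in zip(*samples)]
_, eval_policy = value_iteration(linear_reward(mean_w, features), transitions)

query, risks = risk_aware_query(samples, features, transitions, eval_policy)
print(query, [round(r, 3) for r in risks])
print(should_stop(risks, epsilon=0.5))
```

In this toy chain the mean-reward policy heads right, so the one sample that prefers the left end makes state 0 the riskiest start state. That is where a risk-aware learner would ask for the next demonstration.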

This paper has not been read by Pith yet.


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Learning Reward Functions by Integrating Human Demonstrations and Preferences

    cs.RO 2019-06 unverdicted novelty 6.0

    DemPref uses demonstrations to form a coarse reward prior and ground active preference queries, achieving higher efficiency than pure preference learning and higher user preference than IRL in experiments.

  2. DynoPlan: Combining Motion Planning and Deep Neural Network based Controllers for Safe HRL

    cs.RO 2019-06 unverdicted novelty 5.0

    DynoPlan adds dynamics models and a demonstration-derived heuristic to the options framework so that hierarchical RL can switch between motion planning and DNN controllers via short-horizon model-predictive evaluation.