Risk-Aware Active Inverse Reinforcement Learning

2019 · cs.LG · arXiv:1901.02161

2 Pith papers cite this work. Polarity classification is still indexing.


abstract

Active learning from demonstration allows a robot to query a human for specific types of input to achieve efficient learning. Existing work has explored a variety of active query strategies; however, to our knowledge, none of these strategies directly minimize the performance risk of the policy the robot is learning. Utilizing recent advances in performance bounds for inverse reinforcement learning, we propose a risk-aware active inverse reinforcement learning algorithm that focuses active queries on areas of the state space with the potential for large generalization error. We show that risk-aware active learning outperforms standard active IRL approaches on gridworld, simulated driving, and table setting tasks, while also providing a performance-based stopping criterion that allows a robot to know when it has received enough demonstrations to safely perform a task.
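The query-selection and stopping ideas in the abstract can be illustrated with a minimal sketch. It assumes a small tabular MDP, a set of sampled reward hypotheses (for example, draws from a Bayesian IRL posterior), and an upper quantile of per-state policy loss as the risk measure. The abstract confirms none of these specifics, and the names `value_iteration`, `policy_value`, `risk_aware_query`, `alpha`, and `eps` are hypothetical, so this is not the authors' implementation.

```python
import numpy as np

def value_iteration(P, r, gamma=0.9, iters=500):
    """Optimal state values for transitions P (A, S, S) and state rewards r (S,)."""
    V = np.zeros(P.shape[1])
    for _ in range(iters):
        Q = r[None, :] + gamma * (P @ V)  # shape (A, S)
        V = Q.max(axis=0)
    return V

def policy_value(P, r, policy, gamma=0.9):
    """Exact state values of a deterministic policy (S,) under rewards r."""
    S = P.shape[1]
    P_pi = P[policy, np.arange(S), :]  # row s is P[policy[s], s, :]
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r)

def risk_aware_query(P, reward_samples, policy, alpha=0.95, eps=0.05, gamma=0.9):
    """Pick the state with the highest alpha-quantile policy loss.

    Assumption: risk is the alpha-quantile, over reward hypotheses, of the gap
    between the optimal value and the current policy's value at each state.
    Returns (state to query, per-state risk, stop flag).
    """
    losses = np.stack([
        value_iteration(P, r, gamma) - policy_value(P, r, policy, gamma)
        for r in reward_samples
    ])  # shape (K, S)
    risk = np.quantile(losses, alpha, axis=0)
    stop = bool(risk.max() < eps)  # performance-based stopping test
    return int(np.argmax(risk)), risk, stop
```

In this sketch the robot would request a demonstration at the returned state, update its reward hypotheses and policy, and repeat until the stop flag is set.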

representative citing papers

Learning Reward Functions by Integrating Human Demonstrations and Preferences

cs.RO · 2019-06-21 · unverdicted · novelty 6.0

DemPref uses demonstrations to form a coarse reward prior and ground active preference queries, achieving higher efficiency than pure preference learning and higher user preference than IRL in experiments.

DynoPlan: Combining Motion Planning and Deep Neural Network based Controllers for Safe HRL

cs.RO · 2019-06-24 · unverdicted · novelty 5.0

DynoPlan adds dynamics models and a demonstration-derived heuristic to the options framework so that hierarchical RL can switch between motion planning and DNN controllers via short-horizon model-predictive evaluation.

