Guided cost learning: Deep inverse optimal control via policy optimization
2 Pith papers cite this work.
Citing papers by year: 2019 (2)
Citing papers:
- Benchmarking Model-Based Reinforcement Learning
  Introduces a benchmark suite of over 18 MBRL environments, evaluates multiple algorithms under consistent settings, and identifies three core challenges: the dynamics bottleneck, the planning horizon dilemma, and the early-termination dilemma.
- Learning Reward Functions by Integrating Human Demonstrations and Preferences
  DemPref uses demonstrations to form a coarse prior over the reward function and to ground active preference queries. In experiments, it is more efficient than learning from preferences alone, and users prefer it over IRL.