
arXiv: 1809.05214 · v1 · pith:4AW2EDTEnew · submitted 2018-09-14 · cs.LG · cs.AI · stat.ML

Model-Based Reinforcement Learning via Meta-Policy Optimization

classification cs.LG · cs.AI · stat.ML
keywords dynamics · model-based · ensemble · learning · mb-mpo · model · models · approach

Model-based reinforcement learning approaches carry the promise of being data efficient. However, due to challenges in learning dynamics models that sufficiently match the real-world dynamics, they struggle to achieve the same asymptotic performance as model-free methods. We propose Model-Based Meta-Policy-Optimization (MB-MPO), an approach that forgoes the strong reliance on accurate learned dynamics models. Using an ensemble of learned dynamics models, MB-MPO meta-learns a policy that can quickly adapt to any model in the ensemble with one policy gradient step. This steers the meta-policy towards internalizing consistent dynamics predictions among the ensemble while shifting the burden of behaving optimally w.r.t. the model discrepancies towards the adaptation step. Our experiments show that MB-MPO is more robust to model imperfections than previous model-based approaches. Finally, we demonstrate that our approach is able to match the asymptotic performance of model-free methods while requiring significantly less experience.
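
The structure the abstract describes (one adaptation step per ensemble member, with the meta-policy trained on post-adaptation performance) can be illustrated with a minimal toy sketch. This is not the paper's implementation. It assumes a scalar policy parameter `theta`, and it replaces imagined rollouts and policy-gradient estimates with a hypothetical analytic return `-(theta - a_k)**2` for each ensemble member `a_k`. All function names, the step sizes `alpha` and `beta`, and the ensemble values are illustrative assumptions.

```python
import numpy as np

def model_return(theta, a):
    """Toy stand-in (assumption) for the return of parameter theta under model a."""
    return -(theta - a) ** 2

def model_return_grad(theta, a):
    """Exact gradient of the toy return with respect to theta."""
    return -2.0 * (theta - a)

def adapt(theta, a, alpha):
    """One gradient-ascent step on a single model's return (the adaptation step)."""
    return theta + alpha * model_return_grad(theta, a)

def meta_gradient(theta, ensemble, alpha):
    """Gradient of the mean post-adaptation return with respect to theta."""
    adapted = adapt(theta, ensemble, alpha)
    # Chain rule: d(adapted)/d(theta) = 1 - 2 * alpha for this quadratic toy.
    return np.mean(model_return_grad(adapted, ensemble) * (1.0 - 2.0 * alpha))

def meta_train(ensemble, alpha=0.1, beta=0.05, steps=500, theta0=0.0):
    """Gradient ascent on the meta-objective, averaged over the ensemble."""
    theta = theta0
    for _ in range(steps):
        theta += beta * meta_gradient(theta, ensemble, alpha)
    return theta

# Hypothetical ensemble: each entry is the parameter one learned model prefers.
ensemble = np.array([0.8, 1.0, 1.3])
theta = meta_train(ensemble)

pre = model_return(theta, ensemble)                      # before adaptation
post = model_return(adapt(theta, ensemble, 0.1), ensemble)  # after one step
```

In this toy the meta-parameter settles where the ensemble members agree on average, and the single adaptation step then moves it toward whichever model it is adapted to, so `post` is no worse than `pre` for every member. A real setting would replace the analytic return with returns estimated from trajectories simulated under each learned dynamics model.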



Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Benchmarking Model-Based Reinforcement Learning

    cs.LG 2019-07 accept novelty 7.0

    Introduces a benchmark suite of over 18 MBRL environments, evaluates multiple algorithms under consistent settings, and identifies three core challenges: dynamics bottleneck, planning horizon dilemma, and early-termin...

  2. Uncertainty-aware Model-based Policy Optimization

    cs.LG 2019-06 unverdicted novelty 5.0

    Introduces a framework that learns an uncertainty-aware dynamics model and optimizes the policy via automatic differentiation through the model, reporting competitive asymptotic performance with significantly lower sa...

  3. Calibrated Model-Based Deep Reinforcement Learning

    cs.LG 2019-06 unverdicted novelty 5.0

    Augmenting model-based RL agents with calibrated predictive uncertainties improves planning, sample efficiency, and exploration on continuous control tasks.