Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control

Marc Peter Deisenroth; Sanket Kamthe

arxiv: 1706.06491 · v2 · pith:YC6MUKYXnew · submitted 2017-06-20 · 💻 cs.SY · stat.ML

keywords: model, control, interactions, long-term, number, probabilistic, constraints, large

Abstract

Trial-and-error based reinforcement learning (RL) has seen rapid advancements in recent times, especially with the advent of deep neural networks. However, the majority of autonomous RL algorithms require a large number of interactions with the environment. A large number of interactions may be impractical in many real-world applications, such as robotics, and many practical systems have to obey limitations in the form of state space or control constraints. To reduce the number of system interactions while simultaneously handling constraints, we propose a model-based RL framework based on probabilistic Model Predictive Control (MPC). In particular, we propose to learn a probabilistic transition model using Gaussian Processes (GPs) to incorporate model uncertainty into long-term predictions, thereby reducing the impact of model errors. We then use MPC to find a control sequence that minimises the expected long-term cost. We provide theoretical guarantees for first-order optimality in the GP-based transition models with deterministic approximate inference for long-term planning. We demonstrate that our approach not only achieves state-of-the-art data efficiency, but is also a principled way to do RL in constrained environments.
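
The loop described in the abstract (fit a GP transition model, then plan a constrained control sequence that minimises expected cost and apply only its first control) can be illustrated with a small sketch. This is not the authors' implementation: the 1-D toy system `true_step`, the fixed kernel hyperparameters, and the names `GPDynamics`, `expected_costs`, `mpc_action`, and `run_demo` are all assumptions made for illustration. It also does not implement the deterministic approximate inference the abstract mentions. Instead, it propagates only the GP mean, adds the one-step predictive variance to the cost, and swaps in a random-shooting optimiser with box-constrained controls.

```python
import numpy as np


def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between inputs a (n, d) and b (m, d)."""
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


class GPDynamics:
    """GP regression on state differences: (x, u) -> x_next - x (hypothetical helper)."""

    def __init__(self, X, y, noise=1e-4):
        self.X = X
        K = rbf(X, X) + noise * np.eye(len(X))
        self.L = np.linalg.cholesky(K)
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y))

    def predict(self, Xs):
        """Return predictive mean and variance of the state difference at Xs."""
        Ks = rbf(Xs, self.X)
        mean = Ks @ self.alpha
        v = np.linalg.solve(self.L, Ks.T)
        var = np.maximum(1.0 - np.sum(v**2, axis=0), 0.0)  # prior variance is 1.0
        return mean, var


def expected_costs(gp, x0, candidates, target=0.0):
    """Expected quadratic cost of each candidate control sequence.

    Simplification (assumption): only the mean is rolled forward, so
    E[(x - target)^2] = (mean - target)^2 + one-step predictive variance.
    """
    x = np.full(len(candidates), x0, dtype=float)
    total = np.zeros(len(candidates))
    for t in range(candidates.shape[1]):
        dmean, dvar = gp.predict(np.column_stack([x, candidates[:, t]]))
        x = x + dmean
        total += (x - target) ** 2 + dvar
    return total


def mpc_action(gp, x0, rng, horizon=5, n_samples=200, u_max=1.0):
    """Random-shooting MPC: sample bounded sequences, return the best first control."""
    candidates = rng.uniform(-u_max, u_max, size=(n_samples, horizon))
    costs = expected_costs(gp, x0, candidates)
    return float(candidates[int(np.argmin(costs)), 0])


def true_step(x, u):
    """Toy 1-D system, unknown to the controller (assumption for the demo)."""
    return x + 0.5 * u - 0.1 * np.sin(x)


def run_demo(steps=10, seed=0):
    """Fit the GP on random transitions, then run the receding-horizon loop."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([rng.uniform(-2, 2, 40), rng.uniform(-1, 1, 40)])
    y = true_step(X[:, 0], X[:, 1]) - X[:, 0]
    gp = GPDynamics(X, y)
    xs, us = [1.5], []
    for _ in range(steps):
        u = mpc_action(gp, xs[-1], rng)
        us.append(u)
        xs.append(float(true_step(xs[-1], u)))
    return xs, us


if __name__ == "__main__":
    states, controls = run_demo()
    print("final state:", states[-1])
```

Calling `run_demo()` returns the visited states and the applied controls. The control bound `u_max` stands in for the control constraints the abstract refers to. A fuller treatment would also propagate state uncertainty across the planning horizon and retrain the model as new transitions arrive.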

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Benchmarking Model-Based Reinforcement Learning
cs.LG 2019-07 accept novelty 7.0

Introduces a benchmark suite of over 18 MBRL environments, evaluates multiple algorithms under consistent settings, and identifies three core challenges: dynamics bottleneck, planning horizon dilemma, and early-termin...

Learning-based Model Predictive Control for Safe Exploration and Reinforcement Learning
eess.SY 2019-06 unverdicted novelty 6.0

Develops a learning-based MPC algorithm that uses confidence intervals on trajectories and terminal set constraints to guarantee safety throughout RL exploration and training.

Uncertainty-aware Model-based Policy Optimization
cs.LG 2019-06 unverdicted novelty 5.0

Introduces a framework that learns an uncertainty-aware dynamics model and optimizes the policy via automatic differentiation through the model, reporting competitive asymptotic performance with significantly lower sa...