Learning and Policy Search in Stochastic Dynamical Systems with Bayesian Neural Networks

Finale Doshi-Velez; José Miguel Hernández-Lobato; Stefan Depeweg; Steffen Udluft

arXiv: 1605.07127 · v3 · pith:D5YYPRKPnew · submitted 2016-05-23 · stat.ML · cs.LG

keywords: learning, approaches, bayesian, bnns, model-based, networks, neural, policy

We present an algorithm for model-based reinforcement learning that combines Bayesian neural networks (BNNs) with random roll-outs and stochastic optimization for policy learning. The BNNs are trained by minimizing $\alpha$-divergences, allowing us to capture complicated statistical patterns in the transition dynamics, e.g. multi-modality and heteroskedasticity, which are usually missed by other common modeling approaches. We illustrate the performance of our method by solving a challenging benchmark where model-based approaches usually fail and by obtaining promising results in a real-world scenario for controlling a gas turbine.
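
For readers unfamiliar with the training objective, the $\alpha$-divergence mentioned in the abstract is commonly written in Amari's form, shown below. This is a reference sketch only: the notation ($p$ for the exact posterior over network weights $\theta$, $q$ for its approximation) is an assumption made here and is not taken from the abstract.

$$
D_\alpha[p \,\|\, q] = \frac{1}{\alpha(1-\alpha)} \left( 1 - \int p(\theta)^{\alpha}\, q(\theta)^{1-\alpha}\, d\theta \right)
$$

In the limits, $\alpha \to 0$ recovers $\mathrm{KL}[q \,\|\, p]$, the objective of standard variational Bayes, and $\alpha \to 1$ recovers $\mathrm{KL}[p \,\|\, q]$; intermediate values interpolate between the two.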

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

On Divergence Measures for Training GFlowNets
cs.LG 2024-10 unverdicted novelty 6.0

Introduces statistically efficient estimators for Rényi-α, Tsallis-α, reverse and forward KL divergences with REINFORCE and score-matching control variates for faster GFlowNet training.

Uncertainty-aware Model-based Policy Optimization
cs.LG 2019-06 unverdicted novelty 5.0

Introduces a framework that learns an uncertainty-aware dynamics model and optimizes the policy via automatic differentiation through the model, reporting competitive asymptotic performance with significantly lower sa...

Calibrated Model-Based Deep Reinforcement Learning
cs.LG 2019-06 unverdicted novelty 5.0

Augmenting model-based RL agents with calibrated predictive uncertainties improves planning, sample efficiency, and exploration on continuous control tasks.