Learning and Policy Search in Stochastic Dynamical Systems with Bayesian Neural Networks

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

We present an algorithm for model-based reinforcement learning that combines Bayesian neural networks (BNNs) with random roll-outs and stochastic optimization for policy learning. The BNNs are trained by minimizing $\alpha$-divergences, allowing us to capture complicated statistical patterns in the transition dynamics, e.g. multi-modality and heteroskedasticity, which are usually missed by other common modeling approaches. We illustrate the performance of our method by solving a challenging benchmark where model-based approaches usually fail and by obtaining promising results in a real-world scenario for controlling a gas turbine.
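As a rough illustration of the quantity the abstract refers to, the sketch below evaluates an $\alpha$-divergence between two discrete distributions. It assumes the common Amari/Minka form $D_\alpha[p\|q] = \frac{1}{\alpha(1-\alpha)}\left(1 - \sum_i p_i^{\alpha} q_i^{1-\alpha}\right)$, which may differ from the exact parameterization used in the paper. The function name `alpha_divergence` and the example distributions are hypothetical, and this is not the paper's BNN training procedure.

```python
import numpy as np


def alpha_divergence(p, q, alpha):
    """Alpha-divergence between two discrete distributions (assumed Amari/Minka form).

    Assumes p and q are normalized and strictly positive.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if np.isclose(alpha, 1.0):
        # Limit alpha -> 1 recovers KL(p || q).
        return float(np.sum(p * np.log(p / q)))
    if np.isclose(alpha, 0.0):
        # Limit alpha -> 0 recovers KL(q || p).
        return float(np.sum(q * np.log(q / p)))
    overlap = np.sum(p**alpha * q ** (1.0 - alpha))
    return float((1.0 - overlap) / (alpha * (1.0 - alpha)))


# Hypothetical example distributions.
p = [0.5, 0.5]
q = [0.9, 0.1]
print(alpha_divergence(p, q, 0.5))
print(alpha_divergence(p, q, 1.0))
```

Varying `alpha` between 0 and 1 moves the measure between the two KL directions, which is one way to see why the choice of $\alpha$ affects how a fitted approximation trades off covering several modes against concentrating on one.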

fields

cs.LG: 3

years

2024: 1 · 2019: 2

verdicts

unverdicted: 3

representative citing papers

On Divergence Measures for Training GFlowNets

cs.LG · 2024-10-12 · unverdicted · novelty 6.0

Introduces statistically efficient estimators for Rényi-α, Tsallis-α, reverse and forward KL divergences with REINFORCE and score-matching control variates for faster GFlowNet training.

Uncertainty-aware Model-based Policy Optimization

cs.LG · 2019-06-25 · unverdicted · novelty 5.0

Introduces a framework that learns an uncertainty-aware dynamics model and optimizes the policy via automatic differentiation through the model, reporting competitive asymptotic performance with significantly lower sample complexity than baselines on continuous control benchmarks.

Calibrated Model-Based Deep Reinforcement Learning

cs.LG · 2019-06-19 · unverdicted · novelty 5.0

Augmenting model-based RL agents with calibrated predictive uncertainties improves planning, sample efficiency, and exploration on continuous control tasks.

citing papers explorer

  • On Divergence Measures for Training GFlowNets cs.LG · 2024-10-12 · unverdicted · none · ref 18 · internal anchor

  • Uncertainty-aware Model-based Policy Optimization cs.LG · 2019-06-25 · unverdicted · none · ref 7 · internal anchor

  • Calibrated Model-Based Deep Reinforcement Learning cs.LG · 2019-06-19 · unverdicted · none · ref 10 · internal anchor
