Bayesian Reinforcement Learning in Factored POMDPs

Christopher Amato; Frans Oliehoek; Sammie Katt

arxiv: 1811.05612 · v1 · pith:UDZB46TMnew · submitted 2018-11-14 · 💻 cs.AI

Sammie Katt, Frans Oliehoek, Christopher Amato

classification 💻 cs.AI

keywords able · learning · method · model · approaches · bayesian · factored · factorization

Bayesian approaches provide a principled solution to the exploration-exploitation trade-off in Reinforcement Learning. Typical approaches, however, either assume a fully observable environment or scale poorly. This work introduces the Factored Bayes-Adaptive POMDP model, a framework that exploits the underlying structure while learning the dynamics of partially observable systems. We also present a belief tracking method that approximates the joint posterior over state and model variables, and an adaptation of the Monte-Carlo Tree Search solution method; together, these are capable of solving the underlying problem near-optimally. Our method can learn efficiently given a known factorization, or learn the factorization and the model parameters simultaneously. We demonstrate that this approach outperforms current methods and tackles problems that were previously infeasible.
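
To illustrate the kind of structure a factored model can exploit, here is a minimal sketch of Dirichlet count tracking over a factored transition model. It is not the authors' implementation: the class `FactoredDirichletModel`, its method names, and the `parents` layout are hypothetical, and it assumes transitions are fully observed, which sidesteps the partially observable belief tracking the abstract describes.

```python
class FactoredDirichletModel:
    """Illustrative only: one Dirichlet count table per state feature.

    Each feature's next value is assumed to depend only on the action and on
    a small set of parent features of the current state.
    """

    def __init__(self, feature_sizes, parents, prior_count=1.0):
        self.feature_sizes = list(feature_sizes)    # number of values per feature
        self.parents = [tuple(p) for p in parents]  # parent feature indices per feature
        self.prior_count = prior_count              # symmetric Dirichlet prior
        self.counts = [dict() for _ in self.feature_sizes]

    def _row(self, i, state, action):
        # Counts for feature i given the action and the values of its parents.
        key = (action, tuple(state[p] for p in self.parents[i]))
        default = [self.prior_count] * self.feature_sizes[i]
        return self.counts[i].setdefault(key, default)

    def update(self, state, action, next_state):
        # Bayesian update: add one count per feature for the observed value.
        for i, value in enumerate(next_state):
            self._row(i, state, action)[value] += 1.0

    def expected_prob(self, state, action, next_state):
        # Posterior-mean transition probability as a product over features.
        prob = 1.0
        for i, value in enumerate(next_state):
            row = self._row(i, state, action)
            prob *= row[value] / sum(row)
        return prob


# Two binary features: feature 0 depends on itself, feature 1 on both.
model = FactoredDirichletModel([2, 2], parents=[(0,), (0, 1)])
before = model.expected_prob((0, 0), 0, (1, 1))
model.update((0, 0), 0, (1, 1))
after = model.expected_prob((0, 0), 0, (1, 1))
```

Because each feature keeps counts only for its own parent values, one observed transition updates a few small tables rather than one row of a table over the full joint state space.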

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Expected Free Energy-based Planning as Variational Inference
cs.AI 2026-06 unverdicted novelty 7.0

EFE-based planning is formulated as variational free energy minimization with epistemic priors, decomposing into expected plan costs plus a complexity term.

What Type of Inference is Active Inference?
cs.AI 2026-06 unverdicted novelty 7.0

EFE-based active inference planning is characterized as VFE on an augmented model plus entropy and planning corrections, with a derived message-passing implementation and grid-world validation.