Randomized Prior Functions for Deep Reinforcement Learning

Albin Cassirer; Ian Osband; John Aslanides

arxiv: 1806.03335 · v2 · pith:ZDRB6FZBnew · submitted 2018-06-08 · stat.ML · cs.AI · cs.LG

keywords: learning, uncertainty, approach, deep, efficient, prior, problems, randomized

Abstract

Dealing with uncertainty is essential for efficient reinforcement learning. There is a growing literature on uncertainty estimation for deep learning from fixed datasets, but many of the most popular approaches are poorly suited to sequential decision problems. Other methods, such as bootstrap sampling, have no mechanism for uncertainty that does not come from the observed data. We highlight why this can be a crucial shortcoming and propose a simple remedy through the addition of a randomized untrainable 'prior' network to each ensemble member. We prove that this approach is efficient with linear representations, provide simple illustrations of its efficacy with nonlinear representations, and show that this approach scales to large-scale problems far better than previous attempts.
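The idea in the abstract, a bootstrap ensemble in which each member adds a fixed random 'prior' function to its trainable part, can be illustrated with a small regression sketch. This is not the authors' implementation. It assumes a linear model on shared random Fourier features, a closed-form ridge fit in place of gradient training, and a prior scale `beta`. The names `make_features`, `fit_member`, `predict`, `beta` and `ridge` are hypothetical and chosen only for this example.

```python
import numpy as np

def make_features(num_features, rng):
    # Hypothetical random Fourier feature map for scalar inputs (an assumption,
    # standing in for a network's representation).
    omega = rng.normal(size=num_features)
    phase = rng.uniform(0.0, 2.0 * np.pi, size=num_features)

    def features(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        return np.sqrt(2.0 / num_features) * np.cos(x[:, None] * omega + phase)

    return features

def fit_member(features, x, y, rng, beta=1.0, ridge=1e-3):
    """Fit one ensemble member: trainable weights plus a frozen random prior."""
    phi = features(x)
    dim = phi.shape[1]
    w_prior = rng.normal(size=dim)                # random prior weights, never trained
    idx = rng.integers(0, len(x), size=len(x))    # bootstrap resample of the data
    phi_b, y_b = phi[idx], y[idx]
    # The trainable part is fit so that (trainable + beta * prior) matches the targets.
    residual = y_b - beta * (phi_b @ w_prior)
    gram = phi_b.T @ phi_b + ridge * np.eye(dim)
    w_train = np.linalg.solve(gram, phi_b.T @ residual)
    return w_train, w_prior

def predict(features, member, x, beta=1.0):
    # Member prediction = trainable function + beta * fixed prior function.
    w_train, w_prior = member
    phi = features(x)
    return phi @ w_train + beta * (phi @ w_prior)

rng = np.random.default_rng(0)
features = make_features(100, rng)
x_train = np.linspace(-1.0, 1.0, 20)
y_train = np.sin(2.0 * x_train)

members = [fit_member(features, x_train, y_train, rng) for _ in range(50)]
# Evaluate every member at one point inside the data range and one far outside it.
preds = np.stack([predict(features, m, [0.0, 5.0]) for m in members])
spread_near, spread_far = preds.std(axis=0)
print("ensemble std near data:", spread_near)
print("ensemble std far from data:", spread_far)
```

Near the training inputs the fit largely cancels each member's prior, so the members agree. Far from the data the ridge penalty leaves the trainable part unconstrained by observations, and the disagreement between members is driven mostly by their different priors. That is the kind of uncertainty "that does not come from the observed data" that the abstract says plain bootstrap sampling lacks.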

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Propagating data noise through the fit: the Monte Carlo replica distribution
hep-ph · 2026-06 · unverdicted · novelty 7.0

Derives that the MC replica method produces a distribution differing from the Bayesian Laplace approximation by a single computable matrix (residual-weighted Hessian), whose sign and magnitude determine over- or under...