arxiv: 1507.00300 · v1 · pith:WYMNCCIPnew · submitted 2015-07-01 · 📊 stat.ML · cs.LG

Bootstrapped Thompson Sampling and Deep Exploration

classification: stat.ML, cs.LG
keywords: approach, exploration, sampling, thompson, deep, distribution, learning, maintaining

This technical note presents a new approach to carrying out the kind of exploration achieved by Thompson sampling, but without explicitly maintaining or sampling from posterior distributions. The approach is based on a bootstrap technique that uses a combination of observed and artificially generated data. The latter serves to induce a prior distribution which, as we will demonstrate, is critical to effective exploration. We explain how the approach can be applied to multi-armed bandit and reinforcement learning problems and how it relates to Thompson sampling. The approach is particularly well-suited for contexts in which exploration is coupled with deep learning, since in these settings, maintaining or generating samples from a posterior distribution becomes computationally infeasible.
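As a rough illustration of the idea in the abstract, the following sketch applies a bootstrap to a Bernoulli multi-armed bandit. It is not taken from the paper: the function names (`bootstrap_estimate`, `run_bandit`), the choice of one artificial success and one artificial failure per arm, and the use of the resampled mean as the value estimate are all assumptions made for this example. At each step, every arm's observed and artificial rewards are resampled with replacement, and the arm with the highest resampled mean is played, so no posterior distribution is maintained or sampled explicitly.

```python
import random


def bootstrap_estimate(observed, artificial, rng):
    """Resample observed + artificial rewards with replacement; return the mean."""
    data = observed + artificial
    resample = [rng.choice(data) for _ in data]
    return sum(resample) / len(resample)


def run_bandit(true_means, steps, seed=0):
    """Play a Bernoulli bandit with a bootstrap-based exploration rule.

    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    # Assumption for this sketch: one artificial failure and one artificial
    # success per arm, which acts like a prior and keeps every arm's
    # resampled estimate random even before any real data is observed.
    artificial = [[0.0, 1.0] for _ in range(n_arms)]
    observed = [[] for _ in range(n_arms)]
    pulls = [0] * n_arms
    for _ in range(steps):
        estimates = [
            bootstrap_estimate(observed[a], artificial[a], rng)
            for a in range(n_arms)
        ]
        # Greedy choice with respect to the randomized estimates.
        arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        observed[arm].append(reward)
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(run_bandit([0.2, 0.8], steps=500))
```

Without the artificial data, an arm whose first few observed rewards were all zero would always resample to an estimate of zero and might never be tried again, which is one way to see why the abstract calls the induced prior critical to effective exploration.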

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Diffusion Approximations for Thompson Sampling in the Small Gap Regime

    cs.LG 2021-05 unverdicted novelty 7.0

    In the small gap regime, Thompson sampling and a broad class of sampling-based algorithms converge weakly to identical SDE limits, making regret performance insensitive to likelihood misspecification.