Bootstrapped Thompson Sampling and Deep Exploration

2015 · stat.ML · arXiv 1507.00300

1 Pith paper cites this work. Polarity classification is still indexing.


abstract

This technical note presents a new approach to carrying out the kind of exploration achieved by Thompson sampling, but without explicitly maintaining or sampling from posterior distributions. The approach is based on a bootstrap technique that uses a combination of observed and artificially generated data. The latter serves to induce a prior distribution which, as we will demonstrate, is critical to effective exploration. We explain how the approach can be applied to multi-armed bandit and reinforcement learning problems and how it relates to Thompson sampling. The approach is particularly well-suited for contexts in which exploration is coupled with deep learning, since in these settings, maintaining or generating samples from a posterior distribution becomes computationally infeasible.
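The idea in the abstract can be illustrated with a minimal sketch for a multi-armed bandit. It is not the authors' implementation. It assumes Bernoulli rewards and, as artificially generated data, one success and one failure per arm; both choices, along with the names `bootstrap_estimate` and `bootstrapped_thompson`, are hypothetical and chosen only for illustration. At each step, every arm's combined data (observed plus artificial) is resampled with replacement, and the arm with the highest resampled mean is pulled, so no posterior is maintained or sampled explicitly.

```python
import random


def bootstrap_estimate(data, rng):
    """Mean of a with-replacement resample that has the same size as `data`."""
    resample = [rng.choice(data) for _ in data]
    return sum(resample) / len(resample)


def bootstrapped_thompson(true_means, n_steps, seed=0):
    """Run a bootstrap-based exploration loop on a Bernoulli bandit.

    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    # Artificial data per arm (an assumption of this sketch): one success and
    # one failure. It plays the role of a prior, so an arm whose first real
    # rewards are all zero can still produce a nonzero bootstrap estimate.
    data = [[1.0, 0.0] for _ in true_means]
    pulls = [0] * len(true_means)
    for _ in range(n_steps):
        # One bootstrap estimate per arm stands in for a posterior sample.
        estimates = [bootstrap_estimate(d, rng) for d in data]
        arm = max(range(len(true_means)), key=lambda a: estimates[a])
        # Simulate a Bernoulli reward from the chosen arm and record it.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        data[arm].append(reward)
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(bootstrapped_thompson([0.05, 0.95], n_steps=2000))
```

Without the artificial data, an arm that returned only zeros so far would always resample to a mean of zero and could be abandoned permanently, which is one way to read the abstract's claim that the induced prior is critical to effective exploration.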

representative citing papers

Diffusion Approximations for Thompson Sampling in the Small Gap Regime

cs.LG · 2021-05-19 · unverdicted · novelty 7.0

In the small gap regime, Thompson sampling and a broad class of sampling-based algorithms converge weakly to identical SDE limits, making regret performance insensitive to likelihood misspecification.

