An Information-Theoretic Analysis of Thompson Sampling

Benjamin Van Roy; Daniel Russo

arxiv: 1403.5341 · v2 · pith:7RS33G7Lnew · submitted 2014-03-21 · 💻 cs.LG

classification 💻 cs.LG

keywords analysis · information · information-theoretic · sampling · thompson · across · applies · bounds

We provide an information-theoretic analysis of Thompson sampling that applies across a broad range of online optimization problems in which a decision-maker must learn from partial feedback. This analysis inherits the simplicity and elegance of information theory and leads to regret bounds that scale with the entropy of the optimal-action distribution. This strengthens preexisting results and yields new insight into how information improves performance.
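As a rough illustration of the algorithm the abstract refers to, the sketch below runs Thompson sampling on a Bernoulli bandit with independent Beta(1, 1) priors and computes the Shannon entropy of a distribution over actions, the quantity the abstract says the regret bounds scale with. This is a minimal sketch under stated assumptions, not the paper's setup or analysis: the Bernoulli reward model, the uniform priors, and the names `thompson_sampling` and `entropy` are all chosen here for illustration.

```python
import math
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Run Beta-Bernoulli Thompson sampling (illustrative assumption, not the paper's general setting).

    Returns the cumulative expected regret and the number of pulls per action.
    """
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1] * k  # Beta posterior parameter: 1 + observed successes
    beta = [1] * k   # Beta posterior parameter: 1 + observed failures
    best_mean = max(true_means)
    regret = 0.0
    pulls = [0] * k
    for _ in range(horizon):
        # Sample a mean for each action from its posterior, then act greedily on the samples.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        action = max(range(k), key=samples.__getitem__)
        # Observe a Bernoulli reward for the chosen action only (partial feedback).
        reward = 1 if rng.random() < true_means[action] else 0
        alpha[action] += reward
        beta[action] += 1 - reward
        pulls[action] += 1
        regret += best_mean - true_means[action]
    return regret, pulls


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


if __name__ == "__main__":
    regret, pulls = thompson_sampling([0.2, 0.5, 0.8], horizon=1000)
    print("cumulative expected regret:", regret)
    print("pulls per action:", pulls)
    # A uniform prior over which of 3 actions is optimal has entropy log(3).
    print("entropy of a uniform optimal-action prior:", entropy([1 / 3] * 3))
```

A prior that concentrates on a few candidate optimal actions has lower entropy than a uniform one, which is the sense in which prior information can tighten an entropy-dependent bound.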

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Bayesian policy gradient and actor-critic algorithms
cs.LG 2026-04 unverdicted novelty 6.0

Bayesian modeling of policy gradients as Gaussian processes, together with Bayesian actor-critic variants, reduces the number of samples required and provides uncertainty estimates, in comparison with Monte Carlo methods.