An Information-Theoretic Analysis of Thompson Sampling
classification
💻 cs.LG
keywords
analysis, information, information-theoretic, sampling, thompson, across, applies, bounds
abstract
We provide an information-theoretic analysis of Thompson sampling that applies across a broad range of online optimization problems in which a decision-maker must learn from partial feedback. This analysis inherits the simplicity and elegance of information theory and leads to regret bounds that scale with the entropy of the optimal-action distribution. This strengthens preexisting results and yields new insight into how information improves performance.
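To make the setting concrete, here is a minimal sketch of Thompson sampling on one simple instance of learning from partial feedback: a Bernoulli bandit with independent Beta(1, 1) priors. The abstract does not specify this instance; the function names, the prior, and the arm means are assumptions chosen for illustration only. The `entropy` helper computes the Shannon entropy of a discrete distribution, which is the quantity the abstract says the regret bounds scale with when applied to the prior distribution of the optimal action. Under a symmetric prior over K arms that distribution is uniform, so its entropy is log K, the largest possible value.

```python
import math
import random


def thompson_sampling_bernoulli(true_means, horizon, seed=0):
    """Run Beta-Bernoulli Thompson sampling; return (cumulative regret, pull counts).

    Illustrative sketch only: the Beta(1, 1) prior and Bernoulli rewards are
    assumptions, not details taken from the paper.
    """
    rng = random.Random(seed)
    k = len(true_means)
    # Beta(1, 1) (uniform) prior on each arm's success probability.
    successes = [1] * k
    failures = [1] * k
    pulls = [0] * k
    best_mean = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Draw one plausible mean per arm from its posterior, then play the
        # arm whose sampled mean is largest.
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(k)]
        arm = max(range(k), key=lambda a: samples[a])
        # Partial feedback: only the chosen arm's reward is observed.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
        regret += best_mean - true_means[arm]
    return regret, pulls


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


if __name__ == "__main__":
    regret, pulls = thompson_sampling_bernoulli([0.2, 0.5, 0.8], horizon=2000)
    print("cumulative regret:", regret)
    print("pulls per arm:", pulls)
    # Entropy of a uniform optimal-action distribution over 3 arms: log(3).
    print("prior entropy of optimal action:", entropy([1 / 3] * 3))
```

A more informative prior concentrates the optimal-action distribution on fewer arms and lowers its entropy, which is the sense in which the abstract says information improves performance.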
Forward citations
Cited by 1 Pith paper
- Bayesian policy gradient and actor-critic algorithms
Models policy gradients as Gaussian processes and develops actor-critic variants; compared with Monte Carlo methods, this reduces the number of samples needed and provides uncertainty estimates.