Thompson Sampling is Asymptotically Optimal in General Environments

Jan Leike; Laurent Orseau; Marcus Hutter; Tor Lattimore

arxiv: 1602.07905 · v2 · pith:P7RIVRJ3new · submitted 2016-02-25 · 💻 cs.LG · cs.AI · stat.ML



keywords: environments, sampling, thompson, asymptotically, general, optimal, value, assumption


We discuss a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markov, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges to the optimal value in mean and (2) given a recoverability assumption, regret is sublinear.
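To make the idea in the abstract concrete, here is a minimal toy sketch of Thompson sampling over a class of candidate environments. It is an illustration under strong simplifying assumptions and not the paper's algorithm: the class is finite rather than countable, each environment is a stateless Bernoulli bandit rather than a general history-dependent environment, and the names `thompson_sampling`, `env_class`, `true_env`, and `horizon` are hypothetical. The loop samples an environment from the posterior, acts optimally for that sample for `horizon` steps, and updates the posterior by Bayes' rule after each observation.

```python
import random


def thompson_sampling(env_class, true_env, steps, horizon=1, seed=0):
    """Toy Thompson sampling over a finite class of Bernoulli bandits.

    env_class: list of tuples; env_class[i][a] is the success probability
               of arm a in candidate environment i (assumed strictly
               between 0 and 1 so the Bayes update never divides by zero).
    true_env:  index of the environment that actually generates rewards.
    Returns (posterior over env_class, total reward collected).
    """
    rng = random.Random(seed)
    n = len(env_class)
    posterior = [1.0 / n] * n  # uniform prior over the class (an assumption)
    total_reward = 0
    t = 0
    while t < steps:
        # Sample one candidate environment from the current posterior.
        sampled = rng.choices(range(n), weights=posterior)[0]
        # The optimal policy for a bandit is to pull its best arm.
        probs = env_class[sampled]
        arm = max(range(len(probs)), key=lambda a: probs[a])
        # Follow that policy for up to `horizon` steps before resampling.
        for _ in range(min(horizon, steps - t)):
            reward = 1 if rng.random() < env_class[true_env][arm] else 0
            total_reward += reward
            # Bayes update: weight each candidate by the observation likelihood.
            likelihoods = [e[arm] if reward else 1.0 - e[arm] for e in env_class]
            unnorm = [p * l for p, l in zip(posterior, likelihoods)]
            z = sum(unnorm)
            posterior = [u / z for u in unnorm]
            t += 1
    return posterior, total_reward


if __name__ == "__main__":
    envs = [(0.9, 0.1), (0.1, 0.9)]
    post, reward = thompson_sampling(envs, true_env=0, steps=200)
    print(post, reward)
```

In this toy class every arm distinguishes the two candidates, so the posterior concentrates on the true environment quickly. The general setting in the abstract is harder, which is why the sublinear-regret result there depends on a recoverability assumption.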


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Thompson Sampling for Infinite-Horizon Discounted Decision Processes
stat.ML 2024-05 unverdicted novelty 7.0

Extends Thompson sampling analysis to Borel MDPs via a three-term regret decomposition and shows exponential convergence of residual regret to zero under extended assumptions.