arXiv: 1602.07905 · v2 · pith:P7RIVRJ3 · submitted 2016-02-25 · cs.LG · cs.AI · stat.ML

Thompson Sampling is Asymptotically Optimal in General Environments

classification: cs.LG · cs.AI · stat.ML
keywords: environments, sampling, thompson, asymptotically, general, optimal, value, assumption

We discuss a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markov, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) its value asymptotically converges to the optimal value in mean, and (2) given a recoverability assumption, its regret is sublinear.
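
The core loop of Thompson sampling over a class of environments can be illustrated with a toy sketch. It assumes a finite class of two-armed Bernoulli bandits (the hypothetical `ENV_CLASS`), a uniform prior, and a fixed resampling period `resample_every` chosen purely for illustration. Bandits are Markov and ergodic, so this does not exercise the non-Markov, partially observable setting the abstract covers; it only shows the pattern of sampling one environment from the posterior, acting optimally for it until the next resample, and updating the posterior by Bayes' rule.

```python
import random

# Hypothetical toy class: each environment is a two-armed Bernoulli bandit,
# given as a tuple of per-arm success probabilities.
ENV_CLASS = [
    (0.2, 0.8),
    (0.8, 0.2),
    (0.5, 0.5),
]


def likelihood(env, arm, reward):
    """Probability that `env` emits `reward` (0 or 1) when `arm` is pulled."""
    p = env[arm]
    return p if reward == 1 else 1.0 - p


def thompson_sampling(true_env, steps, resample_every, seed=0):
    """Run Thompson sampling against `true_env`; return (posterior, average reward)."""
    rng = random.Random(seed)
    # Uniform prior over the (here finite) environment class.
    posterior = [1.0 / len(ENV_CLASS)] * len(ENV_CLASS)
    total_reward = 0
    sampled = None
    for t in range(steps):
        if t % resample_every == 0:
            # Draw one environment from the current posterior.
            sampled = rng.choices(ENV_CLASS, weights=posterior, k=1)[0]
        # Act optimally with respect to the sampled environment.
        arm = max(range(len(sampled)), key=lambda a: sampled[a])
        # The true environment generates the actual reward.
        reward = 1 if rng.random() < true_env[arm] else 0
        total_reward += reward
        # Bayesian update: reweight every hypothesis by its likelihood, then normalize.
        posterior = [w * likelihood(env, arm, reward) for w, env in zip(posterior, ENV_CLASS)]
        norm = sum(posterior)
        posterior = [w / norm for w in posterior]
    return posterior, total_reward / steps


if __name__ == "__main__":
    post, avg = thompson_sampling((0.2, 0.8), steps=2000, resample_every=10)
    print("posterior:", [round(w, 4) for w in post])
    print("average reward:", round(avg, 3))
```

When the true environment belongs to the class, the posterior should concentrate on it and the average reward should approach that of the best arm (0.8 in this toy example), a small-scale analogue of the value converging to the optimal value.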

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Thompson Sampling for Infinite-Horizon Discounted Decision Processes

    stat.ML · 2024-05 · unverdicted · novelty 7.0

    Extends Thompson sampling analysis to Borel MDPs via a three-term regret decomposition and shows exponential convergence of residual regret to zero under extended assumptions.