Thompson Sampling is Asymptotically Optimal in General Environments

2016 · cs.LG · arXiv 1602.07905

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

We discuss a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markov, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges to the optimal value in mean and (2) given a recoverability assumption, regret is sublinear.

representative citing papers

Thompson Sampling for Infinite-Horizon Discounted Decision Processes

stat.ML · 2024-05-14 · unverdicted · novelty 7.0

Extends Thompson sampling analysis to Borel MDPs via a three-term regret decomposition and shows exponential convergence of residual regret to zero under extended assumptions.

citing papers explorer

Showing 1 of 1 citing paper.

Thompson Sampling for Infinite-Horizon Discounted Decision Processes

stat.ML · 2024-05-14 · unverdicted · none · ref 14 · internal anchor

Extends Thompson sampling analysis to Borel MDPs via a three-term regret decomposition and shows exponential convergence of residual regret to zero under extended assumptions.
