Optimistic posterior sampling for reinforcement learning with few samples and tight guarantees

Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Mark Rowland, Michal Valko · 2022

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

A Measure-Theoretic Finite-Sample Theory for Adaptive-Data Fitted Q-Iteration

cs.LG · 2026-05-07 · unverdicted · novelty 8.0

The authors derive a finite-sample adaptive-data performance bound for FQI by chaining measure-theoretic probability with Bellman contractions and prove the first cumulative pathwise online regret guarantee in continuous spaces using sequential Rademacher complexity.

citing papers explorer

Showing 1 of 1 citing paper.

A Measure-Theoretic Finite-Sample Theory for Adaptive-Data Fitted Q-Iteration

cs.LG · 2026-05-07 · unverdicted · none · ref 54
