Optimistic posterior sampling for reinforcement learning with few samples and tight guarantees
Cited by 1 Pith paper (field: cs.LG; year: 2026; verdict: UNVERDICTED).

Representative citing papers:
A Measure-Theoretic Finite-Sample Theory for Adaptive-Data Fitted Q-Iteration
The authors derive a finite-sample performance bound for fitted Q-iteration (FQI) on adaptively collected data by combining measure-theoretic probability with Bellman contraction arguments. Using sequential Rademacher complexity, they also prove what they describe as the first cumulative pathwise online regret guarantee in continuous spaces.