Minimax regret bounds for reinforcement learning

Mohammad Gheshlaghi Azar, Ian Osband · 2017

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

A Measure-Theoretic Finite-Sample Theory for Adaptive-Data Fitted Q-Iteration

cs.LG · 2026-05-07 · unverdicted · novelty 8.0

The authors derive a finite-sample adaptive-data performance bound for FQI by chaining measure-theoretic probability with Bellman contractions and prove the first cumulative pathwise online regret guarantee in continuous spaces using sequential Rademacher complexity.

citing papers explorer

Showing 1 of 1 citing paper.

A Measure-Theoretic Finite-Sample Theory for Adaptive-Data Fitted Q-Iteration · cs.LG · 2026-05-07 · unverdicted · none · ref 3
