Bellman calibration supplies a new reliability criterion and post-hoc recalibration method for value functions in offline RL, with finite-sample guarantees at one-dimensional nonparametric rates that avoid Bellman completeness and realizability assumptions.
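Then there exists a universal constant $C > 0$ such that, for all $u \ge 1$, with probability at least $1 - e^{-u^2}$, every $f \in \mathcal{F}$ satisfies
$$\frac{1}{n}\sum_{i=1}^{n} \bigl( f(O_i) - E[f(O_i)] \bigr) \le C \left( \delta^2 + \delta\|f\| + \frac{u\|f\|}{\sqrt{n}} + \frac{M u^2}{n} \right).$$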
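To get a feel for how the four terms of this bound trade off, the right-hand side can be evaluated numerically. The following is a minimal sketch, not part of the cited work. The function names `bound_rhs` and `failure_probability` are hypothetical. The excerpt does not give the value of the universal constant, so `C=1.0` is a placeholder assumption. The excerpt also does not define $\delta$, $M$, or the norm $\|f\|$, so they are treated here as plain non-negative numbers supplied by the caller.

```python
import math


def bound_rhs(delta, f_norm, u, n, M, C=1.0):
    """Evaluate C * (delta^2 + delta*||f|| + u*||f||/sqrt(n) + M*u^2/n).

    C=1.0 is a placeholder: the statement only asserts that some
    universal constant C > 0 exists, not what its value is.
    """
    if u < 1:
        raise ValueError("the statement is only made for u >= 1")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return C * (
        delta ** 2                      # squared-delta term
        + delta * f_norm                # delta times the norm of f
        + u * f_norm / math.sqrt(n)     # deviation term, order n^(-1/2)
        + M * u ** 2 / n                # deviation term, order 1/n
    )


def failure_probability(u):
    """Probability e^{-u^2} with which the bound is allowed to fail."""
    return math.exp(-u ** 2)


if __name__ == "__main__":
    # Illustrative inputs only; they are not taken from the paper.
    for u in (1.0, 2.0, 3.0):
        rhs = bound_rhs(delta=0.1, f_norm=1.0, u=u, n=100, M=1.0)
        print(f"u={u}: bound={rhs:.4f}, failure prob<={failure_probability(u):.2e}")
```

Raising `u` tightens the failure probability at the cost of a larger bound, while raising `n` shrinks only the last two terms; the terms involving `delta` are unaffected by `n` in this sketch because `delta` is passed in as a fixed number.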
Cited by 1 Pith paper. Polarity classification is still indexing.
Fields: stat.ML (1). Years: 2025 (1). Verdicts: UNVERDICTED (1).
Representative citing papers:
Bellman Calibration for $V$-Learning in Offline Reinforcement Learning