On risk bounds in isotonic and other shape restricted regression problems
2 Pith papers cite this work.
abstract
We consider the problem of estimating an unknown $\theta\in {\mathbb{R}}^n$ from noisy observations under the constraint that $\theta$ belongs to certain convex polyhedral cones in ${\mathbb{R}}^n$. Under this setting, we prove bounds for the risk of the least squares estimator (LSE). The obtained risk bound behaves differently depending on the true sequence $\theta$, which highlights the adaptive behavior of the LSE. As special cases of our general result, we derive risk bounds for the LSE in univariate isotonic and convex regression. We study the risk bound in isotonic regression in greater detail: we show that the isotonic LSE converges at a whole range of rates from $\log n/n$ (when $\theta$ is constant) to $n^{-2/3}$ (when $\theta$ is uniformly increasing in a certain sense). We argue that the bound presents a benchmark for the risk of any estimator in isotonic regression by proving nonasymptotic local minimax lower bounds. We prove an analogue of our bound for model misspecification where the true $\theta$ is not necessarily nondecreasing.
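For readers unfamiliar with the isotonic LSE mentioned in the abstract, the following is a minimal sketch, assuming the univariate setting with equal weights. It computes the least squares projection of an observation vector onto nondecreasing sequences using the pool-adjacent-violators algorithm, a standard method that the abstract itself does not specify. The function name `isotonic_lse` is illustrative and does not come from the paper.

```python
def isotonic_lse(y):
    """Project y onto nondecreasing sequences in the least squares sense.

    Uses the pool-adjacent-violators algorithm with equal weights.
    Illustrative sketch only; not taken from the paper.
    """
    # Each block stores [sum of values, number of values]; a block's fitted
    # value is its mean.
    blocks = []
    for value in y:
        blocks.append([float(value), 1])
        # Merge the last two blocks while the earlier mean exceeds the later
        # mean. Means are compared by cross-multiplication to avoid division.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Expand each block back to one fitted value per observation.
    fit = []
    for total, count in blocks:
        fit.extend([total / count] * count)
    return fit


if __name__ == "__main__":
    print(isotonic_lse([1, 3, 2, 4]))
```

The fit is piecewise constant: adjacent observations that violate monotonicity are pooled and replaced by their average, so a constant true sequence tends to produce few blocks while a strictly increasing one produces many.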
verdicts
UNVERDICTED 2
citing papers explorer
- Bellman Calibration for $V$-Learning in Offline Reinforcement Learning
  Bellman calibration supplies a new reliability criterion and post-hoc recalibration method for value functions in offline RL, with finite-sample guarantees at one-dimensional nonparametric rates that avoid Bellman completeness and realizability assumptions.
- Learning What Evaluators Value: A Reliable Approach to Modeling Evaluator Preferences
  Presents a robust algorithm for learning any coordinate-wise non-decreasing evaluator preference function, with theoretical guarantees that it matches linear performance when linearity holds.