Off-policy estimation with adaptively collected data: the power of online learning
Cited by 1 Pith paper (field: stat.ME; year: 2026):
Anytime-valid Optimal Policy Identification
Constructs a time-indexed set S_t retaining the true optimal policy uniformly over time with high probability, enabling early stopping with sample complexity O((log |Π| + log log(1/Δ_min))/Δ_min²) when the optimum is unique.
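To get a feel for how the stated rate scales, the sketch below evaluates the expression inside the O(·) for given values of |Π| and Δ_min. It is only an illustration: the function name `sample_complexity_rate` is hypothetical, natural logarithms are assumed, and the constant hidden by the O(·) notation is ignored, so the returned number is a scaling proxy and not an actual sample size.

```python
import math


def sample_complexity_rate(num_policies: int, delta_min: float) -> float:
    """Evaluate (log|Pi| + log log(1/delta_min)) / delta_min**2.

    Assumptions (not from the source): natural logs, no leading constant.
    """
    if num_policies < 1:
        raise ValueError("num_policies must be at least 1")
    # log log(1/delta_min) is positive only when delta_min < 1/e.
    if not 0.0 < delta_min < 1.0 / math.e:
        raise ValueError("delta_min must lie in (0, 1/e)")
    log_class_size = math.log(num_policies)
    iterated_log = math.log(math.log(1.0 / delta_min))
    return (log_class_size + iterated_log) / delta_min ** 2


if __name__ == "__main__":
    # Halving the minimum gap more than quadruples the rate,
    # because the log log term also grows slightly.
    for gap in (0.1, 0.05):
        print(gap, sample_complexity_rate(100, gap))
```

The dominant factor is 1/Δ_min²; the policy class size |Π| enters only logarithmically, and the log log(1/Δ_min) term grows very slowly as the gap shrinks.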