Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
Pith papers citing it: 2
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
citing papers explorer
- Offline Contextual Bandits in the Presence of New Actions
  PONA integrates the LCPI estimator for new action selection with the DR estimator for existing actions to optimize policies in offline contextual bandits with evolving action spaces.
- CASP: Support-Aware Offline Policy Selection for Two-Stage Recommender Systems
  CASP selects lower-burden two-stage recommender policies by combining doubly robust estimation with a penalty for weak data support and provides theoretical guarantees for conservative selection.