In Proceedings of the ACM Web Conference 2023
Cited by 1 paper (cs.LG, 2026):
Offline Contextual Bandits in the Presence of New Actions
PONA integrates the LCPI estimator for new action selection with the DR estimator for existing actions to optimize policies in offline contextual bandits with evolving action spaces.
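For readers unfamiliar with the DR (doubly robust) estimator mentioned above, the following is a minimal sketch of the standard doubly robust off-policy value estimate. It assumes a discrete action set, logged propensities for the action actually taken, and an already fitted reward model. It illustrates only the textbook DR estimator, not PONA or LCPI. The function name `dr_value` and its parameters are hypothetical.

```python
from typing import Sequence


def dr_value(
    target_probs: Sequence[Sequence[float]],  # pi(a | x_i) for every action a (hypothetical layout)
    q_hat: Sequence[Sequence[float]],         # fitted reward model q_hat(x_i, a) for every action a
    actions: Sequence[int],                   # logged action index a_i
    rewards: Sequence[float],                 # observed reward r_i for the logged action
    logging_probs: Sequence[float],           # logging propensity mu(a_i | x_i), assumed > 0
) -> float:
    """Standard doubly robust estimate of a target policy's value from logged bandit data."""
    n = len(actions)
    if n == 0:
        raise ValueError("need at least one logged sample")
    total = 0.0
    for pi_i, q_i, a_i, r_i, mu_i in zip(target_probs, q_hat, actions, rewards, logging_probs):
        # Direct-method term: the model's expected reward under the target policy.
        dm = sum(p * q for p, q in zip(pi_i, q_i))
        # Importance-weighted correction of the model's error on the logged action.
        weight = pi_i[a_i] / mu_i
        total += dm + weight * (r_i - q_i[a_i])
    # Average the per-sample estimates.
    return total / n


if __name__ == "__main__":
    # Two logged samples over two actions; the estimate is approximately 0.35.
    print(dr_value(
        target_probs=[[0.5, 0.5], [1.0, 0.0]],
        q_hat=[[1.0, 0.0], [0.2, 0.8]],
        actions=[0, 1],
        rewards=[1.0, 1.0],
        logging_probs=[0.5, 0.5],
    ))
```

The correction term is why the estimator is called doubly robust: it stays unbiased if either the logging propensities or the reward model is correct. The correction can only be applied to actions the logging policy could actually take (positive propensity). For an action that never appears in the logs, only the reward-model term carries any information.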