PONA integrates the LCPI estimator for new action selection with the DR estimator for existing actions to optimize policies in offline contextual bandits with evolving action spaces.
2 Pith papers cite this work. Polarity classification is still indexing.
Pith papers citing it: 2
Fields: cs.LG (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
OPLS is a new off-policy learning method for contextual bandits with limited supply that outperforms conventional greedy approaches by prioritizing items with relatively higher expected rewards compared to other users.
citing papers explorer
- Offline Contextual Bandits in the Presence of New Actions
  PONA integrates the LCPI estimator for new action selection with the DR estimator for existing actions to optimize policies in offline contextual bandits with evolving action spaces.
- Off-Policy Learning with Limited Supply
  OPLS is a new off-policy learning method for contextual bandits with limited supply that outperforms conventional greedy approaches by prioritizing items with relatively higher expected rewards compared to other users.