OPLS is a new off-policy learning method for contextual bandits with limited supply that outperforms conventional greedy approaches by prioritizing items with relatively higher expected rewards compared to other users.
InProceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Off-Policy Learning with Limited Supply
OPLS is a new off-policy learning method for contextual bandits with limited supply that outperforms conventional greedy approaches by prioritizing items with relatively higher expected rewards compared to other users.