PONA combines the LCPI estimator, used to select new actions, with the doubly robust (DR) estimator for existing actions to optimize policies in offline contextual bandits whose action spaces evolve over time.
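The DR estimator named in the PONA summary is a standard off-policy evaluation technique. It starts from a reward model's direct estimate of the target policy's value and adds an importance-weighted correction for that model's error on the logged action. Below is a minimal NumPy sketch of the generic estimator. It assumes a discrete action set with known logging propensities. It is not PONA's implementation, and it does not cover the LCPI estimator for new actions. The function name `dr_estimate` and the array layout are illustrative assumptions.

```python
import numpy as np

def dr_estimate(pi_e, pi_0, actions, rewards, q_hat):
    """Generic doubly robust estimate of the value of a target policy.

    Hypothetical helper (not PONA's code). Shapes, with n logged rounds and K actions:
      pi_e:    (n, K) target-policy action probabilities for each logged context
      pi_0:    (n, K) logging-policy action probabilities for each logged context
      actions: (n,)   index of the action that was actually logged
      rewards: (n,)   observed reward for the logged action
      q_hat:   (n, K) reward model's predicted reward for every (context, action)
    """
    idx = np.arange(len(actions))
    # Direct-method term: model-predicted reward averaged over the target policy.
    dm = np.sum(pi_e * q_hat, axis=1)
    # Importance weight of the logged action (logged propensities are assumed > 0).
    w = pi_e[idx, actions] / pi_0[idx, actions]
    # Correction: importance-weighted residual of the reward model on the logged action.
    correction = w * (rewards - q_hat[idx, actions])
    return float(np.mean(dm + correction))

# Toy example with hypothetical data: two contexts, two actions, uniform logging policy.
pi_e = np.array([[1.0, 0.0], [0.0, 1.0]])
pi_0 = np.full((2, 2), 0.5)
actions = np.array([0, 1])
rewards = np.array([1.0, 0.0])
print(dr_estimate(pi_e, pi_0, actions, rewards, np.full((2, 2), 0.5)))  # 0.5
```

Setting `q_hat` to all zeros removes the direct-method term, and the estimate reduces to plain inverse propensity scoring. This makes a quick sanity check.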
4 Pith papers cite this work. Polarity classification is still being indexed.
citation summary
years: 2026 (4)
verdicts: UNVERDICTED (4)
roles: dataset (2)
polarities: use dataset (2)

representative citing papers
APG4RecSim automatically generates realistic user profiles for LLM-based recommendation simulations, outperforming manual baselines by up to 7% in nDCG@10 and 8% in JSD on three benchmark datasets.
DPAA mitigates popularity bias in GNN-based collaborative filtering by combining adaptive, embedding-aware interaction weighting (stabilized using pre-trained embeddings) with layer-wise amplification of higher-order neighborhoods. It outperforms prior debiasing methods on real and semi-synthetic data.
OPLS is a new off-policy learning method for contextual bandits with limited supply. It outperforms conventional greedy approaches by prioritizing, for each user, the items whose expected reward is high relative to other users' expected rewards for those items.
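The APG4RecSim summary reports its gains in nDCG@10 and JSD. The sketch below shows one common way to compute both metrics. It assumes that JSD means Jensen-Shannon divergence (lower is better), which the summary does not spell out, and it uses linear relevance gains for nDCG; many implementations use exponential gains (2^rel - 1) instead. Which distributions the paper compares with JSD is not stated here. The helper names `ndcg_at_k` and `jensen_shannon_divergence` are hypothetical, not the paper's code.

```python
import numpy as np

def ndcg_at_k(relevances, k=10):
    """nDCG@k for one ranked list, using linear gains and log2 position discounts (hypothetical helper)."""
    rel = np.asarray(relevances, dtype=float)
    top = rel[:k]
    # The item at 0-based position i is discounted by 1 / log2(i + 2).
    discounts = 1.0 / np.log2(np.arange(2, top.size + 2))
    dcg = float(np.sum(top * discounts))
    # Ideal DCG: the same relevances sorted best-first, cut off at k.
    ideal = np.sort(rel)[::-1][:k]
    idcg = float(np.sum(ideal * discounts))
    return dcg / idcg if idcg > 0 else 0.0

def jensen_shannon_divergence(p, q):
    """JSD between two discrete distributions, with base-2 logs so 0 <= JSD <= 1 (hypothetical helper)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()  # normalize counts to probabilities
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing to the KL sum
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

print(ndcg_at_k([3, 2, 1, 0]))                       # 1.0 (list is already ideally ordered)
print(jensen_shannon_divergence([1, 0], [0, 1]))     # 1.0 (disjoint distributions)
```

A relative improvement such as "up to 7% in nDCG@10" would then compare these per-list scores, typically averaged over users, between the generated and manual profiles.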
citing papers explorer
- Offline Contextual Bandits in the Presence of New Actions
- Off-Policy Learning with Limited Supply