PONA integrates the LCPI estimator for new action selection with the DR estimator for existing actions to optimize policies in offline contextual bandits with evolving action spaces.
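For readers unfamiliar with the DR (doubly robust) component named above, the following is a minimal sketch of a standard doubly robust value estimate for actions that already appear in the logged data. It is not PONA's implementation, and the LCPI part for new actions is not shown. The function name `dr_value`, the log tuple layout, and the toy policy and reward model are all assumptions made for illustration.

```python
def dr_value(logs, target_policy, q_hat, actions):
    """Doubly robust estimate of a target policy's value from logged bandit data.

    logs: iterable of (context, action, reward, logging_propensity) tuples
          (hypothetical layout chosen for this sketch).
    target_policy(context, action): probability of `action` under the target policy.
    q_hat(context, action): estimated expected reward from any regression model.
    actions: the existing actions covered by the logging policy.
    """
    total = 0.0
    n = 0
    for x, a, r, p0 in logs:
        # Direct-method term: model-based value of the target policy at this context.
        dm = sum(target_policy(x, b) * q_hat(x, b) for b in actions)
        # Importance-weighted correction of the model error on the logged action.
        w = target_policy(x, a) / p0
        total += dm + w * (r - q_hat(x, a))
        n += 1
    return total / n


# Toy data (illustrative only): two logged rounds, uniform logging policy.
toy_logs = [("u1", 0, 1.0, 0.5), ("u2", 1, 0.0, 0.5)]
toy_actions = [0, 1]


def always_zero(x, a):
    # Deterministic target policy that always picks action 0.
    return 1.0 if a == 0 else 0.0


def flat_model(x, a):
    # Deliberately crude reward model: predicts 0.5 everywhere.
    return 0.5


estimate = dr_value(toy_logs, always_zero, flat_model, toy_actions)
```

The estimate combines a reward model with an importance-weighted correction, so it stays consistent if either the reward model or the logging propensities are accurate. That property is what makes it a natural choice for actions with logged feedback.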
4 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (4)
verdicts: UNVERDICTED (4)
roles: dataset (2)
polarities: use dataset (2)
representative citing papers
APG4RecSim automatically generates realistic user profiles for LLM-based recommendation simulations, outperforming manual baselines by up to 7% in nDCG@10 and 8% in JSD on three benchmark datasets.
DPAA mitigates popularity bias in GNN-based collaborative filtering by integrating adaptive embedding-aware interaction weighting stabilized from pre-trained embeddings and layer-wise amplification of higher-order neighborhoods, outperforming prior debiasing methods on real and semi-synthetic data.
OPLS is a new off-policy learning method for contextual bandits with limited supply that outperforms conventional greedy approaches by prioritizing items with relatively higher expected rewards compared to other users.
citing papers explorer
- Task-Aware Automated User Profile Generation for Recommendation Simulation Using Large Language Models
  APG4RecSim automatically generates realistic user profiles for LLM-based recommendation simulations, outperforming manual baselines by up to 7% in nDCG@10 and 8% in JSD on three benchmark datasets.
- Debiasing Message Passing to Mitigate Popularity Bias in GNN-based Collaborative Filtering
  DPAA mitigates popularity bias in GNN-based collaborative filtering by integrating adaptive embedding-aware interaction weighting stabilized from pre-trained embeddings and layer-wise amplification of higher-order neighborhoods, outperforming prior debiasing methods on real and semi-synthetic data.