Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: stat.ML (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Roles: other (1)
Polarities: unclear (1)

Representative citing papers
PFN-TS: Thompson Sampling for Contextual Bandits via Prior-Data Fitted Networks
PFN-TS converts PFN posterior predictives into mean-reward samples for Thompson sampling using a subsampled predictive CLT, with consistency proofs, regret bounds, and strong empirical performance on synthetic and real bandit benchmarks.