Reducing Exploration of Dying Arms in Mortal Bandits

Cynthia Rudin; Stefano Tracà; Weiyu Yan

arxiv: 1907.02571 · v1 · pith:ZYJNRSK7new · submitted 2019-07-04 · stat.ML · cs.LG

keywords: arms, disappear, exploration, when, applications, bandits, mortal, performance

Mortal bandits have proven to be useful for providing news article recommendations, running automated online advertising campaigns, and other applications where the set of available options changes over time. Previous work on this problem showed how to regulate exploration of new arms when they have recently appeared, but it does not adapt when arms are about to disappear. Since in most applications we can determine, either exactly or approximately, when arms will disappear, we can leverage this information to improve performance: we should not explore arms that are about to disappear. We provide adaptations of algorithms, regret bounds, and experiments for this study, showing a clear benefit from regulating greed (the exploration/exploitation balance) for arms that will soon disappear. We illustrate numerical performance on the Yahoo! Front Page Today Module User Click Log Dataset.
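The core idea in the abstract, exploring less on arms that are close to disappearing, can be illustrated with a small sketch. This is not the paper's algorithm: it is a standard UCB-style index with a hypothetical linear damping factor on the exploration bonus, and the names `ucb_index`, `choose_arm`, `remaining`, and `horizon_scale` are assumptions introduced only for illustration.

```python
import math


def ucb_index(mean, pulls, remaining, t, horizon_scale=10.0):
    """UCB-style index whose exploration bonus shrinks as an arm nears expiry.

    mean          -- empirical mean reward of the arm
    pulls         -- number of times the arm has been played
    remaining     -- (assumed known) number of rounds before the arm disappears
    t             -- current round
    horizon_scale -- hypothetical tuning constant: arms with at least this many
                     rounds left receive the full exploration bonus
    """
    if pulls == 0:
        # Unplayed arm: try it only if it will still exist after this round.
        return math.inf if remaining > 1 else 0.0
    bonus = math.sqrt(2.0 * math.log(max(t, 2)) / pulls)
    # Damping in [0, 1]: 1 for long-lived arms, close to 0 for dying arms.
    damping = min(1.0, max(0.0, remaining / horizon_scale))
    return mean + damping * bonus


def choose_arm(stats, t):
    """stats maps arm id -> (mean, pulls, remaining); returns the best arm id."""
    return max(stats, key=lambda arm: ucb_index(*stats[arm], t))


if __name__ == "__main__":
    stats = {"dying": (0.50, 5, 1), "fresh": (0.45, 5, 100)}
    print(choose_arm(stats, t=100))
```

With equal pull counts, the arm with one round left loses almost all of its exploration bonus, so the longer-lived arm is preferred even though its empirical mean is slightly lower. An arm with no rounds left is scored purely on its empirical mean.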

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Exploitation Over Exploration: Unmasking the Bias in Linear Bandit Recommender Offline Evaluation
cs.LG 2025-07 unverdicted novelty 5.0

Greedy linear models without exploration consistently achieve top-tier performance in over 90% of offline dataset evaluations for linear bandit recommenders, with hyperparameter tuning favoring minimal exploration and...