Thresholding Bandit with Optimal Aggregate Regret

Chao Tao; Jian Peng; Sa\`ul Blanco; Yuan Zhou

arxiv: 1905.11046 · v1 · pith:PEO6AXFDnew · submitted 2019-05-27 · 💻 cs.LG · stat.ML

Thresholding Bandit with Optimal Aggregate Regret

Chao Tao , Sa\`ul Blanco , Jian Peng , Yuan Zhou This is my paper

classification 💻 cs.LG stat.ML

keywords algorithmaggregatearmsbanditoptimalregretthresholdingabove

0 comments

read the original abstract

We consider the thresholding bandit problem, whose goal is to find arms of mean rewards above a given threshold $\theta$, with a fixed budget of $T$ trials. We introduce LSA, a new, simple and anytime algorithm that aims to minimize the aggregate regret (or the expected number of mis-classified arms). We prove that our algorithm is instance-wise asymptotically optimal. We also provide comprehensive empirical results to demonstrate the algorithm's superior performance over existing algorithms under a variety of different scenarios.

This paper has not been read by Pith yet.

Thresholding Bandit with Optimal Aggregate Regret

discussion (0)