Gradient Ascent for Active Exploration in Bandit Problems

Pierre Ménard

arxiv: 1905.08165 · v1 · pith:JXUW6ZKLnew · submitted 2019-05-20 · stat.ML · cs.LG

keywords: ascent, active, algorithm, bandit, exploration, gradient, problem, problems

We present a new algorithm based on gradient ascent for a general Active Exploration bandit problem in the fixed-confidence setting. This problem encompasses several well-studied problems, such as Best Arm Identification and Thresholding Bandits. The algorithm consists of a new sampling rule based on online lazy mirror ascent. We prove that this algorithm is asymptotically optimal and, most importantly, computationally efficient.
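The abstract names online lazy mirror ascent as the basis of the sampling rule but gives no details. As a rough illustration of the generic technique only, and not of the paper's actual sampling rule, the sketch below runs lazy mirror ascent (dual averaging) with an entropic mirror map over the probability simplex. The objective, the function names (`softmax`, `lazy_mirror_ascent`, `toy_gradient`), the step size, and the `target` vector are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Map dual scores to a point on the probability simplex (entropic mirror map)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def lazy_mirror_ascent(grad, dim, step_size=1.0, n_steps=500):
    """Generic lazy mirror ascent (dual averaging) on the simplex.

    "Lazy" means gradients are accumulated in the dual space, and the
    iterate is recomputed from the accumulated sum at every step.
    `grad` is a hypothetical callable returning the gradient of a
    concave objective at the current weights.
    """
    cumulative = [0.0] * dim          # running sum of scaled gradients
    w = softmax(cumulative)           # start from uniform weights
    for _ in range(n_steps):
        g = grad(w)
        cumulative = [c + step_size * gi for c, gi in zip(cumulative, g)]
        w = softmax(cumulative)
    return w


# Toy concave objective (an assumption, unrelated to the paper):
# F(w) = -0.5 * ||w - target||^2, maximized at w = target.
target = [0.5, 0.3, 0.2]


def toy_gradient(w):
    return [t - wi for t, wi in zip(target, w)]


weights = lazy_mirror_ascent(toy_gradient, dim=3)
```

In a bandit setting the weights would presumably play the role of sampling proportions over arms, but how the paper defines the objective and its gradient is not stated in the abstract, so that part is left out here.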

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Pure Exploration Beyond Reward Feedback: The Role of Post-Action Context
cs.LG 2025-02 unverdicted novelty 6.0

Introduces BAI with post-action context in fixed-confidence stochastic bandits, derives instance-dependent lower bounds, and gives asymptotically optimal algorithms for separator and non-separator cases.