arxiv: 1905.08165 · v1 · pith:JXUW6ZKLnew · submitted 2019-05-20 · 📊 stat.ML · cs.LG

Gradient Ascent for Active Exploration in Bandit Problems

classification: stat.ML, cs.LG
keywords: ascent, active, algorithm, bandit, exploration, gradient, problem, problems

We present a new algorithm based on gradient ascent for a general Active Exploration bandit problem in the fixed confidence setting. This problem encompasses several well-studied problems, such as Best Arm Identification and Thresholding Bandits. The algorithm consists of a new sampling rule based on online lazy mirror ascent. We prove that it is asymptotically optimal and, most importantly, computationally efficient.
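The abstract does not spell out the sampling rule, so the following is only a generic sketch of lazy mirror ascent (dual averaging with an entropic mirror map) on the probability simplex, not the paper's algorithm. The toy objective `min_i w[i] * d[i]`, the step size `eta`, the iteration count, and all function names are assumptions chosen for illustration. The toy objective is concave and maximized at weights proportional to `1 / d[i]`.

```python
import math


def softmax(z):
    """Map accumulated (scaled) gradients to a point on the simplex."""
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def objective(w, d):
    """Toy concave objective (an assumption): F(w) = min_i w_i * d_i."""
    return min(wi * di for wi, di in zip(w, d))


def supergradient(w, d):
    """One supergradient of F at w: d_j on the minimizing coordinate j."""
    j = min(range(len(w)), key=lambda i: w[i] * d[i])
    g = [0.0] * len(w)
    g[j] = d[j]
    return g


def lazy_mirror_ascent(d, eta=0.005, steps=50000):
    """Lazy mirror ascent: keep a running SUM of gradients and map it
    back to the simplex at each step; return the average iterate."""
    k = len(d)
    grad_sum = [0.0] * k
    w_avg = [0.0] * k
    for t in range(1, steps + 1):
        # "Lazy" update: the iterate depends only on the gradient sum.
        w = softmax([eta * s for s in grad_sum])
        g = supergradient(w, d)
        grad_sum = [s + gi for s, gi in zip(grad_sum, g)]
        # Running average of the iterates.
        w_avg = [a + (wi - a) / t for a, wi in zip(w_avg, w)]
    return w_avg


if __name__ == "__main__":
    d = [1.0, 2.0, 4.0]
    w = lazy_mirror_ascent(d)
    print("average weights:", w)
    print("objective value:", objective(w, d))
```

With the entropic mirror map the projection step reduces to a softmax, which is one reason updates of this kind are cheap per round. In a bandit setting the weights would typically be read as target sampling proportions over arms, but how the paper turns them into a sampling rule is not stated here.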

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Pure Exploration Beyond Reward Feedback: The Role of Post-Action Context

    cs.LG · 2025-02 · unverdicted · novelty 6.0

    Introduces BAI with post-action context in fixed-confidence stochastic bandits, derives instance-dependent lower bounds, and gives asymptotically optimal algorithms for separator and non-separator cases.