Gradient Ascent for Active Exploration in Bandit Problems

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

We present a new algorithm based on gradient ascent for a general Active Exploration bandit problem in the fixed confidence setting. This problem encompasses several well-studied problems, such as Best Arm Identification and Thresholding Bandits. The algorithm consists of a new sampling rule based on online lazy mirror ascent. We prove that this algorithm is asymptotically optimal and, most importantly, computationally efficient.

years

2025: 1 · 2019: 1

verdicts

UNVERDICTED: 2

representative citing papers

Non-Asymptotic Pure Exploration by Solving Games

stat.ML · 2019-06-25 · unverdicted · novelty 7.0

Game-solving algorithms using no-regret learners achieve non-asymptotic optimality guarantees for pure exploration in exponential family bandits.

citing papers explorer

Showing 2 of 2 citing papers.

  • Non-Asymptotic Pure Exploration by Solving Games stat.ML · 2019-06-25 · unverdicted · none · ref 24 · internal anchor

    Game-solving algorithms using no-regret learners achieve non-asymptotic optimality guarantees for pure exploration in exponential family bandits.

  • Pure Exploration Beyond Reward Feedback: The Role of Post-Action Context cs.LG · 2025-02-05 · unverdicted · none · ref 52 · internal anchor

    Introduces BAI with post-action context in fixed-confidence stochastic bandits, derives instance-dependent lower bounds, and gives asymptotically optimal algorithms for separator and non-separator cases.