Game-solving algorithms using no-regret learners achieve non-asymptotic optimality guarantees for pure exploration in exponential family bandits.
Gradient Ascent for Active Exploration in Bandit Problems
2 Pith papers cite this work. Polarity classification is still indexing.
abstract
We present a new algorithm based on an gradient ascent for a general Active Exploration bandit problem in the fixed confidence setting. This problem encompasses several well studied problems such that the Best Arm Identification or Thresholding Bandits. It consists of a new sampling rule based on an online lazy mirror ascent. We prove that this algorithm is asymptotically optimal and, most importantly, computationally efficient.
verdicts
UNVERDICTED 2representative citing papers
Introduces BAI with post-action context in fixed-confidence stochastic bandits, derives instance-dependent lower bounds, and gives asymptotically optimal algorithms for separator and non-separator cases.
citing papers explorer
-
Non-Asymptotic Pure Exploration by Solving Games
Game-solving algorithms using no-regret learners achieve non-asymptotic optimality guarantees for pure exploration in exponential family bandits.
-
Pure Exploration Beyond Reward Feedback: The Role of Post-Action Context
Introduces BAI with post-action context in fixed-confidence stochastic bandits, derives instance-dependent lower bounds, and gives asymptotically optimal algorithms for separator and non-separator cases.