Gradient Ascent for Active Exploration in Bandit Problems
classification
📊 stat.ML
cs.LG
keywords
ascent, active, algorithm, bandit, exploration, gradient, problem, problems
read the original abstract
We present a new algorithm based on gradient ascent for a general Active Exploration bandit problem in the fixed-confidence setting. This problem encompasses several well-studied problems, such as Best Arm Identification and Thresholding Bandits. The algorithm consists of a new sampling rule based on an online lazy mirror ascent. We prove that this algorithm is asymptotically optimal and, most importantly, computationally efficient.
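To make the idea of a lazy mirror ascent sampling rule concrete, here is a minimal sketch and not the paper's actual algorithm. It assumes Best Arm Identification with unit-variance Gaussian arms and known means `mu`, where the objective over sampling weights `w` is the standard characteristic value min over suboptimal arms `a` of `w_b * w_a / (w_b + w_a) * (mu_b - mu_a)^2 / 2`. The function names, the step size `eta`, and the uniform-mixing parameter `gamma` are illustrative assumptions. In a real bandit loop the means would be replaced by empirical estimates, and arms would be pulled so that the pull counts track the weights.

```python
import math


def bai_supergradient(w, mu):
    """Supergradient of F(w) = min_{a != b} w_b*w_a/(w_b+w_a) * (mu_b-mu_a)^2 / 2.

    Assumption: unit-variance Gaussian arms, b = arm with the largest mean.
    """
    k = len(mu)
    best = max(range(k), key=lambda a: mu[a])
    worst_val, worst_arm = math.inf, None
    for a in range(k):
        if a == best:
            continue
        val = w[best] * w[a] / (w[best] + w[a]) * (mu[best] - mu[a]) ** 2 / 2
        if val < worst_val:
            worst_val, worst_arm = val, a
    # Differentiate the active (minimizing) term with respect to w_b and w_a.
    grad = [0.0] * k
    half_gap_sq = (mu[best] - mu[worst_arm]) ** 2 / 2
    total = w[best] + w[worst_arm]
    grad[best] = (w[worst_arm] / total) ** 2 * half_gap_sq
    grad[worst_arm] = (w[best] / total) ** 2 * half_gap_sq
    return grad


def lazy_mirror_ascent(mu, steps=200, eta=1.0, gamma=0.01):
    """Lazy mirror ascent (dual averaging) on the simplex with an entropic mirror map.

    The weights are proportional to exp(eta * cumulative supergradient),
    then mixed with the uniform distribution (hypothetical parameter gamma)
    so that every arm keeps a positive weight.
    """
    k = len(mu)
    cum_grad = [0.0] * k
    w = [1.0 / k] * k
    for _ in range(steps):
        g = bai_supergradient(w, mu)
        cum_grad = [c + gi for c, gi in zip(cum_grad, g)]
        shift = max(cum_grad)  # subtract the max for numerical stability
        unnorm = [math.exp(eta * (c - shift)) for c in cum_grad]
        z = sum(unnorm)
        w = [(1 - gamma) * u / z + gamma / k for u in unnorm]
    return w


if __name__ == "__main__":
    print(lazy_mirror_ascent([1.0, 0.5, 0.0]))
```

The "lazy" part is that each update is computed from the running sum of supergradients rather than from the previous iterate, so no projection step is needed beyond the normalization.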
Forward citations
Cited by 1 Pith paper
- Pure Exploration Beyond Reward Feedback: The Role of Post-Action Context
Introduces BAI with post-action context in fixed-confidence stochastic bandits, derives instance-dependent lower bounds, and gives asymptotically optimal algorithms for separator and non-separator cases.