Optimal Best Arm Identification with Fixed Confidence
classification
🧮 math.ST
cs.LG, stat.ML, stat.TH
keywords
optimal, bound, complexity, give, identification, lower, prove, rule
abstract
We give a complete characterization of the complexity of best-arm identification in one-parameter bandit problems. We prove a new, tight lower bound on the sample complexity. We propose the 'Track-and-Stop' strategy, which we prove to be asymptotically optimal. It consists of a new sampling rule (which tracks the optimal proportions of arm draws highlighted by the lower bound) and a stopping rule named after Chernoff, for which we give a new analysis.
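The two ingredients of Track-and-Stop can be illustrated with a minimal Python sketch. It assumes unit-variance Gaussian arms, whereas the abstract covers general one-parameter families. The closed form used for the optimal proportions w*(μ) is our own specialization of the optimal-proportions problem to this Gaussian case; the stopping threshold β(t, δ) = log(2t(K−1)/δ) and all function names (`optimal_weights`, `glr_stat`, `track_and_stop`) are illustrative assumptions, not the paper's code. Sampling follows a D-tracking-style rule: force-explore any arm drawn fewer than √t − K/2 times, otherwise draw the arm furthest behind its target proportion. Stopping uses the Chernoff generalized likelihood ratio statistic.

```python
import math
import random


def kl_gauss(x, y):
    """KL divergence between N(x, 1) and N(y, 1)."""
    return (x - y) ** 2 / 2.0


def optimal_weights(mu, tol=1e-12):
    """Proportions w*(mu) for unit-variance Gaussian arms (assumed model).

    With d_a = kl_gauss(mu_best, mu_a), x_a(y) = y / (d_a - y) and y* solves
    sum_{a != best} x_a(y)^2 = 1; then w*_a is proportional to x_a(y*), x_best = 1.
    """
    k = len(mu)
    best = max(range(k), key=lambda a: mu[a])
    others = [a for a in range(k) if a != best]
    kl = {a: kl_gauss(mu[best], mu[a]) for a in others}
    d_min = min(kl.values())
    if d_min <= 0.0:  # tied empirical best arms: fall back to uniform weights
        return [1.0 / k] * k

    def x(a, y):
        return y / (kl[a] - y)

    # F(y) = sum_a x_a(y)^2 increases from 0 to +inf on [0, d_min): bisect.
    lo, hi = 0.0, d_min
    while hi - lo > tol * d_min:
        mid = (lo + hi) / 2.0
        if sum(x(a, mid) ** 2 for a in others) < 1.0:
            lo = mid
        else:
            hi = mid
    y = (lo + hi) / 2.0
    ratios = [1.0 if a == best else x(a, y) for a in range(k)]
    total = sum(ratios)
    return [r / total for r in ratios]


def glr_stat(counts, means, a, b):
    """Chernoff (GLR) statistic Z_{a,b}; positive when arm a looks better than b."""
    n_a, n_b = counts[a], counts[b]
    z = n_a * n_b / (n_a + n_b) * kl_gauss(means[a], means[b])
    return z if means[a] >= means[b] else -z


def track_and_stop(arms, delta, rng, max_steps=100_000):
    """Simulate Track-and-Stop on N(arms[a], 1) rewards; return (arm, samples)."""
    k = len(arms)
    counts, sums = [0] * k, [0.0] * k

    def pull(a):
        counts[a] += 1
        sums[a] += rng.gauss(arms[a], 1.0)

    for a in range(k):  # one initial draw per arm
        pull(a)
    t = k
    while t < max_steps:
        means = [sums[a] / counts[a] for a in range(k)]
        best = max(range(k), key=lambda a: means[a])
        # max_a min_b Z_{a,b} is attained at the empirical best arm.
        z = min(glr_stat(counts, means, best, b) for b in range(k) if b != best)
        if z > math.log(2 * t * (k - 1) / delta):  # assumed threshold beta(t, delta)
            return best, t
        # D-tracking style: force-explore arms with N_a < sqrt(t) - K/2, else track w*.
        under = [a for a in range(k) if counts[a] < math.sqrt(t) - k / 2]
        if under:
            a = min(under, key=lambda i: counts[i])
        else:
            w = optimal_weights(means)
            a = max(range(k), key=lambda i: t * w[i] - counts[i])
        pull(a)
        t += 1
    means = [sums[a] / counts[a] for a in range(k)]
    return max(range(k), key=lambda a: means[a]), t  # budget exhausted
```

For example, `track_and_stop([1.0, 0.5, 0.0], delta=0.05, rng=random.Random(0))` returns the recommended arm and the number of samples drawn. In this Gaussian specialization the equation Σ x_a(y)² = 1 means the proportions satisfy w_1² = Σ_{a≠1} w_a², and each pairwise term w_1 w_a/(w_1 + w_a) · d_a is equalized across the suboptimal arms; with two arms this gives equal sampling.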
Forward citations
Cited by 1 Pith paper
- Anytime-valid Optimal Policy Identification
Constructs a time-indexed set S_t retaining the true optimal policy uniformly over time with high probability, enabling early stopping with sample complexity O((log |Π| + log log(1/Δ_min))/Δ_min²) when the optimum is unique.