On Finding the Largest Mean Among Many

Kevin Jamieson; Matthew Malloy; Robert Nowak; Sebastien Bubeck

arxiv: 1306.3917 · v1 · pith:NYR3P2BSnew · submitted 2013-06-17 · 📊 stat.ML · cs.LG

On Finding the Largest Mean Among Many

Kevin Jamieson , Matthew Malloy , Robert Nowak , Sebastien Bubeck This is my paper

classification 📊 stat.ML cs.LG

keywords samplecomplexitymeanmulti-armedproceduresarmsbanditbest

0 comments

read the original abstract

Sampling from distributions to find the one with the largest mean arises in a broad range of applications, and it can be mathematically modeled as a multi-armed bandit problem in which each distribution is associated with an arm. This paper studies the sample complexity of identifying the best arm (largest mean) in a multi-armed bandit problem. Motivated by large-scale applications, we are especially interested in identifying situations where the total number of samples that are necessary and sufficient to find the best arm scale linearly with the number of arms. We present a single-parameter multi-armed bandit model that spans the range from linear to superlinear sample complexity. We also give a new algorithm for best arm identification, called PRISM, with linear sample complexity for a wide range of mean distributions. The algorithm, like most exploration procedures for multi-armed bandits, is adaptive in the sense that the next arms to sample are selected based on previous samples. We compare the sample complexity of adaptive procedures with simpler non-adaptive procedures using new lower bounds. For many problem instances, the increased sample complexity required by non-adaptive procedures is a polynomial factor of the number of arms.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

FIESTA: Fast IdEntification of State-of-The-Art models using adaptive bandit algorithms
cs.LG 2019-06 unverdicted novelty 7.0

FIESTA uses bandit algorithms to adaptively decide how many seeds and splits to run for each candidate model, focusing effort on promising ones while providing guarantees on selecting the optimal model.
Optimal Posterior Sampling for Policy Identification in Tabular Markov Decision Processes
cs.LG 2026-05 unverdicted novelty 5.0

A new posterior sampling algorithm for (ε, δ)-PAC policy identification in tabular MDPs achieves asymptotic optimality in sample complexity and posterior contraction rate with O(S²AH) runtime per episode.