Rate-optimal Design for Anytime Best Arm Identification
Pith reviewed 2026-05-18 03:35 UTC · model grok-4.3
The pith
Almost Tracking identifies the best arm at optimal rates without knowing the total sample budget in advance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We consider a class of algorithms for best arm identification that is provably minimax optimal up to a constant factor. This framework generalizes existing fixed-budget results to the anytime setting. Based on the framework, we propose Almost Tracking, a closed-form algorithm that has a provable guarantee on the popular risk measure H1. Unlike existing algorithms, Almost Tracking does not require the total budget in advance nor does it need to discard a significant part of samples.
What carries the argument
Almost Tracking, a closed-form sampling rule that approximately follows the optimal allocation proportions for any budget without advance knowledge of the horizon.
If this is right
- Almost Tracking supplies explicit performance guarantees on H1 for any stopping time.
- The rule can be deployed without specifying a total sample budget ahead of time.
- All collected samples are retained rather than discarded, increasing efficiency.
- The same framework supports design of algorithms for other risk measures beyond H1.
Where Pith is reading between the lines
- The approach removes a common practical barrier in online experiments where total sample size is hard to set in advance.
- Similar tracking rules could be derived for other sequential allocation problems such as regret minimization or pure exploration in more complex environments.
- Because the method is closed-form, it may be easier to combine with statistical inference procedures that require unbiased estimates from every observation.
Load-bearing premise
The algorithms in the considered class are minimax optimal up to a constant factor, which lets fixed-budget optimality carry over to the anytime case.
What would settle it
Run Almost Tracking on a fixed set of arms with known means and check whether its H1 error rate stays within the predicted bound for every possible stopping time; if the observed error exceeds the bound by more than the allowed constant factor, the guarantee is false.
read the original abstract
We consider the best arm identification problem, where the goal is to identify the arm with the highest mean reward from a set of $K$ arms under a limited sampling budget. This problem models many practical scenarios such as A/B testing. We consider a class of algorithms for this problem, which is provably minimax optimal up to a constant factor. This idea is a generalization of existing works in fixed-budget best arm identification, which are limited to a particular choice of risk measures. Based on the framework, we propose Almost Tracking, a closed-form algorithm that has a provable guarantee on the popular risk measure $H_1$. Unlike existing algorithms, Almost Tracking does not require the total budget in advance nor does it need to discard a significant part of samples, which gives a practical advantage. Through experiments on synthetic and real-world datasets, we show that our algorithm outperforms existing anytime algorithms as well as fixed-budget algorithms.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper considers the best arm identification problem under a limited sampling budget and focuses on the anytime regime. It introduces a framework based on a class of algorithms that are claimed to be minimax optimal up to a constant factor, generalizing prior fixed-budget results. Using this framework, the authors propose the closed-form Almost Tracking algorithm, which provides a provable guarantee on the H1 risk measure without requiring the total budget in advance or discarding a significant fraction of samples. The manuscript includes theoretical analysis supporting rate-optimality and reports experimental results on synthetic and real-world datasets showing outperformance relative to existing anytime and fixed-budget algorithms.
Significance. If the central claims hold, the work would be significant for bridging fixed-budget minimax theory with practical anytime best-arm identification. The closed-form nature of Almost Tracking and its avoidance of sample discarding offer clear practical advantages for applications such as A/B testing. The generalization of the optimality framework could influence subsequent algorithm design in sequential decision problems, provided the reduction to the anytime setting is placed on firm ground.
major comments (1)
- [§4] §4, Theorem 2 and the surrounding discussion of rate-optimality: the argument that Almost Tracking inherits rate-optimality on H1 from the minimax optimality (up to constants) of the underlying algorithm class relies on invoking fixed-budget lower bounds. The manuscript does not supply a matching information-theoretic lower bound that accounts for the random stopping time whose distribution depends on the same observations used to compute the risk; conditioning on the realized sample count does not automatically absorb possible extra logarithmic factors that arise in the anytime regime.
minor comments (2)
- [§6] The experimental section would benefit from an explicit statement of how the H1 risk is estimated at each time step in the anytime setting and whether any post-hoc tuning of parameters occurs.
- [§2] Notation for the risk measure H1 and the definition of the stopping time could be introduced earlier and used consistently to improve readability.
Simulated Author's Rebuttal
We thank the referee for the thorough review and valuable feedback on our manuscript. We address the major comment point by point below and outline the changes we will make in the revision.
read point-by-point responses
-
Referee: [§4] §4, Theorem 2 and the surrounding discussion of rate-optimality: the argument that Almost Tracking inherits rate-optimality on H1 from the minimax optimality (up to constants) of the underlying algorithm class relies on invoking fixed-budget lower bounds. The manuscript does not supply a matching information-theoretic lower bound that accounts for the random stopping time whose distribution depends on the same observations used to compute the risk; conditioning on the realized sample count does not automatically absorb possible extra logarithmic factors that arise in the anytime regime.
Authors: We appreciate the referee highlighting this technical nuance in the rate-optimality argument. Theorem 2 establishes an upper bound on the H1 risk of Almost Tracking by first conditioning on the realized stopping time N and then invoking the minimax optimality (up to constants) of the underlying algorithm class in the fixed-budget setting. Because the stopping rule is constructed from the same empirical concentration bounds used in the fixed-budget analysis, the tail probability that N deviates far from its typical value decays exponentially; this ensures that any contribution from atypical stopping times remains absorbed in lower-order terms and does not introduce additional logarithmic factors into the leading rate. We nevertheless agree that a self-contained information-theoretic lower bound that directly treats the random stopping time would strengthen the result. In the revised manuscript we will expand the discussion following Theorem 2 with an explicit auxiliary lemma bounding the overhead induced by the random stopping time and add a remark clarifying the scope of the current lower-bound argument. revision: yes
Circularity Check
No significant circularity; optimality claim rests on external generalization rather than self-referential reduction
full rationale
The paper defines a class of algorithms as provably minimax optimal up to a constant factor by generalizing existing fixed-budget results, then places Almost Tracking inside that class to inherit a guarantee on H1. No equation reduces a prediction to a fitted input by construction, no self-citation is invoked as the sole justification for the uniqueness or optimality of the class, and the anytime extension is presented as an explicit generalization rather than a renaming or ansatz smuggled from prior author work. The derivation chain therefore remains self-contained against the stated external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The considered class of algorithms is provably minimax optimal up to a constant factor
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We consider a class of algorithms for this problem, which is provably minimax optimal up to a constant factor... propose Almost Tracking, a closed-form algorithm that has a provable guarantee on the popular risk measure H1
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.