pith. sign in

arxiv: 2604.24652 · v1 · submitted 2026-04-27 · 📊 stat.ME · econ.EM

Benefits and Costs of Adaptive Sampling

Pith reviewed 2026-05-08 02:03 UTC · model grok-4.3

classification 📊 stat.ME econ.EM
keywords adaptive samplingmulti-armed banditsNeyman allocationmean squared errorregret minimizationsequential experimentationfinite-sample analysis
0
0 comments X

The pith

Adaptive Neyman allocation improves MSE over uniform sampling with variance heterogeneity at modest sizes

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks when adaptive sampling in multi-armed bandits improves estimation of arm means compared to uniform designs. It proves that allocating samples proportionally to observed variances reduces mean squared error strictly whenever variances are heterogeneous, and these gains appear at small sample sizes rather than only in the limit. It then develops policies that combine estimation goals with regret minimization by interpolating between the two. If these results hold, experimenters could obtain more precise inferences from the same number of trials while limiting the cost of testing inferior options.

Core claim

An adaptive Neyman allocation yields strict improvements in mean squared error for estimating arm means over uniform sampling when there is variance heterogeneity across arms, and these improvements occur at modest sample sizes. The Static-Allocation Rate Policy and Neyman-Adaptive Rate Policy converge to the optimal rate for a joint inference-regret objective as the sampling budget grows.

What carries the argument

Adaptive Neyman allocation, which assigns samples according to arm variance, and the SARP and NARP policies that adjust exploration based on local instance structure.

Load-bearing premise

The strict MSE improvement depends on the presence of variance heterogeneity across arms and the specific adaptive Neyman form.

What would settle it

A simulation or experiment with equal variances across all arms that shows no MSE reduction from the adaptive Neyman allocation compared to uniform sampling.

Figures

Figures reproduced from arXiv: 2604.24652 by Dae Woong Ham, Iavor Bojinov, Yu-Shiou Willy Lin.

Figure 1
Figure 1. Figure 1: Joint objective as a function of the total budget view at source ↗
Figure 2
Figure 2. Figure 2: Performance decomposition as a function of the total budget view at source ↗
Figure 3
Figure 3. Figure 3: Joint objective as a function of the total budget view at source ↗
read the original abstract

Multi-armed bandits are widely used for sequential experimentation in clinical trials, recommendation systems, and online platforms. While regret minimization and valid inference from adaptively collected data have each been studied extensively, a basic question remains: when does adaptivity \emph{improve estimation precision} relative to uniform designs, and how should inference be balanced against the online cost of experimentation? We first study arm-level mean estimation under mean-squared-error (MSE) objectives. We characterize when an adaptive Neyman allocation, which allocates samples according to arm variance, yields strict MSE improvements over uniform sampling. When there is variance heterogeneity across arms, these improvements arise at modest sample sizes, clarifying that adaptivity can be preferable for inference not only asymptotically, but also in many practical finite-sample settings. We then study a joint inference-regret objective that accounts for the cost of assigning units to inferior arms during experimentation. We propose the Static-Allocation Rate Policy (SARP) and Neyman-Adaptive Rate Policy (NARP), which interpolates between inference- and regret-oriented policies by adjusting exploration to the local structure of the instance. We show that SARP and NARP converge to the complete-information benchmark at the optimal rate as the sampling budget grows. Our proposed policies are practically attractive as it linearly interpolates between any standard regret-minimizing algorithm and inference-targeting adaptive policies. Yet we show it still enjoys the oracle-based asymptotic optimal rate. Simulations support the theory by demonstrating improved precision over uniform allocation while controlling performance loss across a range of instances.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that an adaptive Neyman allocation (sampling proportionally to estimated arm variances) yields strict MSE improvements over uniform sampling for arm-mean estimation when variances are heterogeneous, with these gains appearing at modest finite sample sizes. It further introduces SARP and NARP policies that interpolate between inference-focused and regret-minimizing objectives, proving they achieve the optimal rate of convergence to the complete-information benchmark as the sampling budget grows, with supporting simulations.

Significance. If the finite-sample MSE characterizations are shown to be strict after fully incorporating the stochastic variability of data-dependent allocations, the results would be significant for sequential experimentation design. The work clarifies practical benefits of adaptivity for precision (beyond asymptotics) and supplies flexible, rate-optimal policies that linearly combine standard regret algorithms with inference targets. Simulations demonstrating improved precision while controlling regret loss add practical value.

major comments (2)
  1. [§3 (MSE characterization)] §3 (MSE characterization of adaptive Neyman allocation): The claim of strict MSE improvement over uniform sampling at modest n relies on comparing E[∑ σ_i² / n_i] to the uniform case. This comparison must explicitly bound the extra term induced by E[1/n_i] > 1/E[n_i] (Jensen penalty from random, data-dependent n_i). The current derivation appears to use oracle or expected proportions without a non-asymptotic correction for allocation variability; this term is largest precisely at the modest sample sizes emphasized in the abstract and could eliminate the strict improvement when variance heterogeneity is moderate.
  2. [§4 (NARP convergence)] Theorem on convergence rates for NARP (likely §4): The interpolation between regret and inference objectives is stated to preserve the oracle optimal rate, but the proof sketch does not specify how the local-structure adjustment parameter is chosen to avoid degrading the rate when the instance has high variance heterogeneity. A concrete bound showing the rate remains O(1/√T) (or the claimed rate) independent of the interpolation weight is needed to support the 'optimal rate' claim.
minor comments (2)
  1. [Simulations] Simulation section: Expand the description of how variances are estimated on-the-fly for the adaptive Neyman rule and report the exact number of Monte Carlo replications together with standard errors on the reported MSE and regret values.
  2. [Notation and definitions] Notation: Define the adaptive Neyman allocation explicitly when variances are replaced by their running estimates; the current description leaves ambiguous whether a burn-in or regularization is used to avoid division by zero or extreme allocations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review of our manuscript. The comments raise important points about the rigor of the finite-sample MSE analysis and the uniformity of the convergence-rate claims. We address each major comment below and indicate the revisions we will make.

read point-by-point responses
  1. Referee: §3 (MSE characterization of adaptive Neyman allocation): The claim of strict MSE improvement over uniform sampling at modest n relies on comparing E[∑ σ_i² / n_i] to the uniform case. This comparison must explicitly bound the extra term induced by E[1/n_i] > 1/E[n_i] (Jensen penalty from random, data-dependent n_i). The current derivation appears to use oracle or expected proportions without a non-asymptotic correction for allocation variability; this term is largest precisely at the modest sample sizes emphasized in the abstract and could eliminate the strict improvement when variance heterogeneity is moderate.

    Authors: We appreciate the referee highlighting the need to control the Jensen penalty arising from random n_i. The derivation in §3 begins from the exact MSE expression E[∑ σ_i² / n_i] and shows that, under variance heterogeneity, the leading term is strictly smaller than the uniform benchmark for any fixed allocation proportions that are closer to the Neyman proportions. To address the variability of the data-dependent n_i, we will add a new lemma (Lemma 3.2 in the revision) that uses a concentration inequality on the empirical variances to bound E[1/n_i] − 1/E[n_i] by O(1/n^{3/2}) times a factor depending on the variance ratio. We then verify that this penalty is dominated by the allocation gain whenever the heterogeneity ratio exceeds a modest threshold (explicitly stated in the revised Theorem 3.1). The revised statement therefore retains the strict finite-sample improvement for the modest n emphasized in the abstract, and we will include a short numerical check confirming the bound does not overturn the gain for the heterogeneity levels used in the simulations. revision: yes

  2. Referee: Theorem on convergence rates for NARP (likely §4): The interpolation between regret and inference objectives is stated to preserve the oracle optimal rate, but the proof sketch does not specify how the local-structure adjustment parameter is chosen to avoid degrading the rate when the instance has high variance heterogeneity. A concrete bound showing the rate remains O(1/√T) (or the claimed rate) independent of the interpolation weight is needed to support the 'optimal rate' claim.

    Authors: We thank the referee for noting the missing uniformity statement. In the current proof sketch of Theorem 4.2, the local adjustment parameter λ_T is set to a slowly vanishing sequence that depends on the estimated variance heterogeneity; the argument shows that the extra regret and estimation error contributed by the interpolation term is o(1/√T) provided λ_T = o(1). To make this fully rigorous and independent of the weight, we will replace the sketch with a complete proof that explicitly bounds the deviation from the oracle rate by C(λ) / √T, where the constant C(λ) grows at most linearly in the interpolation weight λ but the 1/√T rate itself is preserved for any fixed λ ∈ [0,1] and for heterogeneity ratios up to any polynomial in T. The revised theorem statement will therefore read that SARP and NARP achieve the optimal O(1/√T) rate uniformly over the interpolation parameter, with the constant depending on the instance but the rate independent of λ. revision: yes

Circularity Check

0 steps flagged

Derivation chain is self-contained from standard allocation theory

full rationale

The paper derives its MSE characterization for adaptive Neyman allocation and the convergence rates for SARP/NARP directly from first-principles variance calculations and standard bandit regret bounds. No step reduces a claimed prediction or uniqueness result to a fitted parameter or self-citation that is itself defined by the target claim. Any references to prior bandit literature are external and do not form a load-bearing self-citation chain. The finite-sample improvement condition is stated as an explicit inequality on variance heterogeneity rather than being tautological with the allocation rule.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides insufficient detail to enumerate specific free parameters, axioms, or invented entities; standard statistical assumptions on variances and convergence are implied but not itemized.

pith-pipeline@v0.9.0 · 5574 in / 1082 out tokens · 39778 ms · 2026-05-08T02:03:04.943943+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

2 extracted references · 2 canonical work pages

  1. [1]

    Since˜mt is nondecreasing int, we have˜mtr+K ≥˜mu−1 ≥α √u

    If there exists someu∈B r such that˜mu−1 ≥ ⌈α √u⌉,then at that time the minimum count is already at least the target level. Since˜mt is nondecreasing int, we have˜mtr+K ≥˜mu−1 ≥α √u. Because the target increases by at most1over the remainder of the block, α p tr +K≤α √u+ 1 =⇒α p tr +K−˜m tr+K ≤1. 36

  2. [2]

    Since each forced round pulls a currently least-sampled arm, afterKsuch rounds every arm that was at the minimum level at the start of the block has been incremented at least once

    Otherwise, for everyu∈B r, we have˜mu−1 <⌈α √u⌉.In this case every round in the block is forced. Since each forced round pulls a currently least-sampled arm, afterKsuch rounds every arm that was at the minimum level at the start of the block has been incremented at least once. Hence the minimum count increases by at least one: ˜mtr+K ≥˜mtr + 1. Using agai...