One Good Source is All You Need: Near-Optimal Regret for Bandits under Heterogeneous Noise
Pith reviewed 2026-05-15 22:08 UTC · model grok-4.3
The pith
Bandits with multiple noisy data sources can match the regret of the single best source after quick pruning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SOAR achieves the instance-dependent regret bound of order sigma-star squared times sum log T over Delta-i plus square-root of K times sum of all sigma-j squared, where sigma-star is the smallest source variance. This matches the optimal single-source MAB regret up to an unavoidable additive identification cost that depends only on the aggregate variance across sources.
What carries the argument
The balanced min-max LCB-UCB rule that maintains joint optimistic estimates for arms and sources so that pruning and arm selection occur in a single coordinated process.
If this is right
- Regret remains insensitive to the worst source variance provided a single better source exists.
- The additive source-identification cost is unavoidable and scales with the square root of total variance.
- Uniform sampling across sources or explore-then-commit baselines can incur regret proportional to the largest variance, which may be much worse.
- When a clear minimum-variance source is present, the algorithm quickly concentrates all queries on it without sacrificing arm-identification speed.
Where Pith is reading between the lines
- The same pruning idea could be applied when sources have query costs or when only a subset of sources is available each round.
- The bound suggests that maintaining a small pool of redundant sensors or data streams is cheap in regret terms as long as one low-noise stream exists.
- The technique may extend to contextual bandits or reinforcement learning where multiple observation models with different noise levels are available.
Load-bearing premise
Noise variances are fixed over time and at least one minimum-variance source remains available to query in every round.
What would settle it
An experiment in which one or more source variances increase over time and the observed cumulative regret grows linearly with the maximum variance instead of staying within the stated bound.
read the original abstract
We study $K$-armed Multiarmed Bandit (MAB) problem with $M$ heterogeneous data sources, each exhibiting unknown and distinct noise variances $\{\sigma_j^2\}_{j=1}^M$. The learner's objective is standard MAB regret minimization, with the additional complexity of adaptively selecting which data source to query from at each round. We propose Source-Optimistic Adaptive Regret minimization (SOAR), a novel algorithm that quickly prunes high-variance sources using sharp variance-concentration bounds, followed by a `balanced min-max LCB-UCB approach' that seamlessly integrates the parallel tasks of identifying the best arm and the optimal (minimum-variance) data source. Our analysis shows SOAR achieves an instance-dependent regret bound of $\tilde{O}\left({\sigma^*}^2\sum_{i=2}^K \frac{\log T}{\Delta_i} + \sqrt{K \sum_{j=1}^M \sigma_j^2}\right)$, up to preprocessing costs depending only on problem parameters, where ${\sigma^*}^2 := \min_j \sigma_j^2$ is the minimum source variance and $\Delta_i$ denotes the suboptimality gap of the $i$-th arm. This result is both surprising as despite lacking prior knowledge of the minimum-variance source among $M$ alternatives, SOAR attains the optimal instance-dependent regret of standard single-source MAB with variance ${\sigma^*}^2$, while incurring only an small (and unavoidable) additive cost of $\tilde O(\sqrt{K \sum_{j=1}^M \sigma_j^2})$ towards the optimal (minimum variance) source identification. Our theoretical bounds represent a significant improvement over some proposed baselines, e.g. Uniform UCB or Explore-then-Commit UCB, which could potentially suffer regret scaling with $\sigma_{\max}^2$ in place of ${\sigma^*}^2$-a gap that can be arbitrarily large when $\sigma_{\max} \gg \sigma^*$. Experiments on multiple synthetic problem instances and the real-world MovieLens\;25M dataset, demonstrating the superior performance of SOAR over the baselines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript studies the K-armed MAB problem with M heterogeneous sources having unknown distinct variances {σ_j²}. It proposes the SOAR algorithm, which prunes high-variance sources via sharp variance-concentration bounds and then applies a balanced min-max LCB-UCB procedure to jointly solve for the best arm and the minimum-variance source. The central claim is an instance-dependent regret bound of Õ(σ*² ∑_{i=2}^K log T / Δ_i + √(K ∑_{j=1}^M σ_j²)), where σ*² = min_j σ_j², together with preprocessing costs depending only on problem parameters. This is asserted to match the optimal single-source regret (with variance σ*²) up to a small additive term for source identification. Experiments on synthetic instances and the MovieLens 25M dataset are reported to show superiority over baselines such as Uniform UCB and Explore-then-Commit UCB.
Significance. If the stated regret bound holds, the result is significant: it shows that adaptive selection among heterogeneous sources can be performed with only an additive Õ(√(K ∑ σ_j²)) overhead while retaining the optimal instance-dependent scaling in the best variance σ*² rather than σ_max². This closes a gap that can be arbitrarily large when variances differ substantially. The combination of variance pruning with an integrated min-max LCB-UCB step is a technically interesting contribution, and the explicit instance-dependent form plus parameter-only preprocessing costs are strengths. Empirical results on MovieLens provide supporting evidence of practical relevance.
major comments (3)
- [Abstract and §4] Abstract and §4 (Regret Analysis): the central bound Õ(σ*² ∑ log T / Δ_i + √(K ∑ σ_j²)) is derived under the assumption that the argmin source remains queryable in every round and that all {σ_j²} are fixed and stationary. This assumption is load-bearing for both the pruning step and the subsequent σ* scaling; any drift or access restriction would invalidate the high-probability identification of σ* and allow regret to scale with a larger effective variance. A precise statement of this modeling assumption and a brief discussion of its necessity should be added.
- [§3.2] §3.2 (Pruning and Algorithm): the sharp variance-concentration bounds used to prune high-variance sources are invoked to guarantee that the minimum-variance source is identified with high probability after a preprocessing phase whose cost depends only on problem parameters. The explicit concentration inequality, the failure probability, and the precise dependence of the preprocessing length on K, M, and the σ_j values are not visible in the provided text; these must be stated so that the additive √(K ∑ σ_j²) term can be verified to be unavoidable and correctly derived.
- [§3.3] §3.3 (Balanced min-max LCB-UCB): the integration of arm identification and source identification via the min-max LCB-UCB rule is described at a high level. The precise update rules, the way the min and max are taken over the joint (arm, source) pairs, and the proof that this rule avoids paying σ_max² while still achieving the claimed bound need to be expanded with explicit equations or pseudocode; without them the claim that the two tasks are “seamlessly integrated” cannot be checked.
minor comments (3)
- [Abstract] Abstract: “an small (and unavoidable)” should read “a small (and unavoidable)”.
- [Abstract] Abstract: the Õ notation hides unspecified logarithmic factors; state explicitly whether log log T or other iterated logs appear.
- [Experiments] Experiments section: more detail is needed on how the M heterogeneous sources are constructed or sampled from the MovieLens 25M dataset.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major point below by clarifying modeling assumptions, adding explicit technical details, and expanding the algorithmic description. All requested changes will be incorporated in the revised manuscript.
read point-by-point responses
-
Referee: [Abstract and §4] Abstract and §4 (Regret Analysis): the central bound Õ(σ*² ∑ log T / Δ_i + √(K ∑ σ_j²)) is derived under the assumption that the argmin source remains queryable in every round and that all {σ_j²} are fixed and stationary. This assumption is load-bearing for both the pruning step and the subsequent σ* scaling; any drift or access restriction would invalidate the high-probability identification of σ* and allow regret to scale with a larger effective variance. A precise statement of this modeling assumption and a brief discussion of its necessity should be added.
Authors: We agree that the stationarity of variances and unrestricted access to all sources are implicit modeling assumptions required for the high-probability pruning and σ* scaling. In the revision we will add an explicit statement of these assumptions in the problem formulation (Section 2) and a short discussion paragraph in §4 explaining their necessity: without stationarity the variance estimates cannot be reliably concentrated, and without continuous access the algorithm cannot guarantee querying the identified minimum-variance source in every round. This will make the scope of the bound transparent. revision: yes
-
Referee: [§3.2] §3.2 (Pruning and Algorithm): the sharp variance-concentration bounds used to prune high-variance sources are invoked to guarantee that the minimum-variance source is identified with high probability after a preprocessing phase whose cost depends only on problem parameters. The explicit concentration inequality, the failure probability, and the precise dependence of the preprocessing length on K, M, and the σ_j values are not visible in the provided text; these must be stated so that the additive √(K ∑ σ_j²) term can be verified to be unavoidable and correctly derived.
Authors: The explicit concentration inequality (a Bernstein-type bound on the empirical variance estimator for sub-Gaussian observations) and the failure probability δ = 1/T² appear in the appendix but were not restated in §3.2. In the revision we will insert the precise inequality, set the failure probability explicitly, and derive the preprocessing length as O((max_j σ_j² / ε²) log(M T)) where ε is chosen to separate the minimum variance from the others. This will directly verify that the additive Õ(√(K ∑ σ_j²)) term is the unavoidable cost of source identification under a union bound over M sources and K arms. revision: yes
-
Referee: [§3.3] §3.3 (Balanced min-max LCB-UCB): the integration of arm identification and source identification via the min-max LCB-UCB rule is described at a high level. The precise update rules, the way the min and max are taken over the joint (arm, source) pairs, and the proof that this rule avoids paying σ_max² while still achieving the claimed bound need to be expanded with explicit equations or pseudocode; without them the claim that the two tasks are “seamlessly integrated” cannot be checked.
Authors: We agree the description in §3.3 is too high-level. In the revision we will add explicit pseudocode for the index computation, showing that at each round the algorithm selects the pair (i,j) that minimizes the lower-confidence bound on the source variance while maximizing the upper-confidence bound on the arm reward, with the indices updated using the current empirical variance of the chosen source. A short proof sketch will be included demonstrating that after the pruning phase the algorithm queries the minimum-variance source with probability 1-o(1), thereby incurring only σ*² in the leading term rather than σ_max². revision: yes
Circularity Check
No circularity: standard concentration analysis yields independent regret bound
full rationale
The paper's central regret bound is obtained via explicit analysis steps that apply standard variance-concentration inequalities to prune sources and then integrate LCB-UCB selection; these steps rely on external concentration tools and do not reduce by construction to any fitted parameter, self-definition, or self-citation chain within the paper. The minimum-variance source identification cost appears as an additive term derived from the same concentration bounds rather than being presupposed. No load-bearing uniqueness theorem or ansatz is imported from prior self-work, and the derivation remains self-contained against the stated stationary-variance assumptions.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math Standard sub-Gaussian or variance concentration inequalities hold for the noise in each source
- domain assumption Noise variances are fixed and unknown but constant across rounds
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.