Extreme bandits
Pith reviewed 2026-05-08 01:14 UTC · model grok-4.3
The pith
The ExtremeHunter algorithm minimizes extreme regret to detect the source with the heaviest tail under limited feedback.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that extreme regret, defined as the performance gap to the oracle policy that selects the source with the heaviest tail, can be minimized in a limited-feedback sequential setting through the proposed ExtremeHunter algorithm and its accompanying analysis.
What carries the argument
The ExtremeHunter algorithm, an adaptive selection rule that sequentially queries sources to approximate the heaviest-tail oracle and thereby reduce extreme regret.
If this is right
- Extreme regret serves as a well-defined performance criterion for extreme-value detection tasks.
- The algorithm yields theoretical bounds on extreme regret under the limited-feedback model.
- Empirical tests confirm practical gains on synthetic distributions and real intrusion-detection data.
- The approach applies directly to resource allocation in medicine, security, and life sciences.
Where Pith is reading between the lines
- The same limited-feedback setup could support other non-mean objectives such as tail-risk minimization.
- Integration with classical extreme-value theory tools might improve the oracle approximation step.
- The method suggests that bandit algorithms can be redesigned around distribution shape rather than location parameters.
Load-bearing premise
The underlying distributions admit a heaviest tail that can be identified and exploited through sequential limited observations so that extreme regret becomes a minimizable quantity.
What would settle it
If ExtremeHunter fails to produce lower extreme regret than uniform random sampling on a dataset containing one clearly heaviest-tailed source, the central claim would not hold.
Figures
read the original abstract
In many areas of medicine, security, and life sciences, we want to allocate limited resources to different sources in order to detect extreme values. In this paper, we study an efficient way to allocate these resources sequentially under limited feedback. While sequential design of experiments is well studied in bandit theory, the most commonly optimized property is the regret with respect to the maximum mean reward. However, in other problems such as network intrusion detection, we are interested in detecting the most extreme value output by the sources. Therefore, in our work we study extreme regret which measures the efficiency of an algorithm compared to the oracle policy selecting the source with the heaviest tail. We propose the ExtremeHunter algorithm, provide its analysis, and evaluate it empirically on synthetic and real-world experiments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies a variant of the multi-armed bandit problem focused on extreme values rather than means. It defines 'extreme regret' as the performance gap relative to an oracle policy that always selects the arm with the heaviest tail. The authors introduce the ExtremeHunter algorithm, claim to provide its theoretical analysis (including regret bounds), and report empirical results on both synthetic and real-world data.
Significance. If the analysis establishes sublinear extreme regret under limited feedback without strong parametric assumptions on the tails, the work would meaningfully extend bandit theory to tail-focused objectives relevant to intrusion detection, medicine, and security applications. The inclusion of real-world experiments is a positive step toward practicality. However, the significance is currently limited by the need to verify that the guarantees hold for general distributions in the domain of attraction of extreme-value laws.
major comments (2)
- [Theoretical analysis section] The analysis of ExtremeHunter (presumably in the theoretical section following the algorithm definition) claims sublinear extreme regret by sequentially approximating the heaviest-tail oracle. This appears to require consistent identification of the heaviest tail from single draws per round. For distributions belonging only to the domain of attraction without known tail index or second-order regular variation, the sample complexity for reliable distinction may exceed the horizon, rendering the bound non-sublinear. Please state the precise regret bound, all assumptions (including any tail regularity conditions), and whether the result is fully nonparametric.
- [Problem formulation / extreme regret definition] The definition of extreme regret and the oracle policy (likely in the problem formulation section) assumes the heaviest tail is uniquely identifiable. When multiple arms have comparable tails, the oracle may not be unique, which could make the regret measure ill-defined or the comparison to the oracle ambiguous. Clarify how the regret is computed in such cases and whether the analysis accounts for it.
minor comments (1)
- [Experiments] The experimental section would benefit from explicit details on how synthetic data are generated (e.g., specific tail indices or distributions used) and the exact real-world datasets, including preprocessing steps for measuring extremes, to support reproducibility.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive report. We address each major comment in turn below and will prepare a revised manuscript that incorporates the requested clarifications.
read point-by-point responses
-
Referee: [Theoretical analysis section] The analysis of ExtremeHunter (presumably in the theoretical section following the algorithm definition) claims sublinear extreme regret by sequentially approximating the heaviest-tail oracle. This appears to require consistent identification of the heaviest tail from single draws per round. For distributions belonging only to the domain of attraction without known tail index or second-order regular variation, the sample complexity for reliable distinction may exceed the horizon, rendering the bound non-sublinear. Please state the precise regret bound, all assumptions (including any tail regularity conditions), and whether the result is fully nonparametric.
Authors: We agree that the current presentation would benefit from greater explicitness. The analysis of ExtremeHunter (Section 3) establishes a sublinear extreme-regret bound under the assumption that each arm distribution lies in the domain of attraction of an extreme-value law and satisfies a second-order regular-variation condition with a positive tail index. These conditions ensure that the sequential samples collected by the algorithm suffice for consistent ordering of the tails within the horizon. The result is therefore not fully nonparametric. In the revision we will (i) restate the exact regret bound from the main theorem, (ii) list every assumption in a dedicated paragraph immediately preceding the theorem, and (iii) add an explicit remark that the guarantees do not extend to distributions lacking the second-order regular-variation condition. revision: yes
-
Referee: [Problem formulation / extreme regret definition] The definition of extreme regret and the oracle policy (likely in the problem formulation section) assumes the heaviest tail is uniquely identifiable. When multiple arms have comparable tails, the oracle may not be unique, which could make the regret measure ill-defined or the comparison to the oracle ambiguous. Clarify how the regret is computed in such cases and whether the analysis accounts for it.
Authors: We acknowledge the potential ambiguity. The current definition takes the oracle to be any arm whose tail index equals the maximal index; when several arms share this maximal index the extreme regret is measured against the best attainable extreme value among those tied arms. The analysis in Section 3 already covers this case by using the minimal possible extreme value over the set of maximal-tail arms. Nevertheless, the manuscript does not spell out the tie-breaking rule explicitly. We will add a short paragraph in the problem-formulation section that defines the oracle set and states that regret is computed with respect to the best member of that set. revision: yes
Circularity Check
No circularity: extreme regret and ExtremeHunter derived independently from oracle definition
full rationale
The paper defines extreme regret directly as the performance gap to an oracle that always selects the heaviest-tailed source. It then introduces ExtremeHunter as a new algorithm for sequentially approximating this oracle under bandit feedback and analyzes its regret. No equation or step reduces the claimed sublinear bound or algorithm to a fitted parameter, self-citation, or renamed input by construction. The derivation chain is self-contained against the new regret definition and does not rely on prior fitted quantities from the same authors.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption There exists an oracle policy that always selects the source with the heaviest tail.
- domain assumption Extreme regret is a meaningful and analyzable performance measure under sequential limited feedback.
Reference graph
Works this paper leans on
-
[1]
Outlier Detection by Active Learning
Naoki Abe, Bianca Zadrozny, and John Langford. Outlier Detection by Active Learning. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 504–509, 2006
work page 2006
-
[2]
Mix- ture models of endhost network traffic
John Mark Agosta, Jaideep Chandrashekar, Mark Crovella, Nina Taft, and Daniel Ting. Mix- ture models of endhost network traffic. InIEEE International Conference on Computer Com- munications (INFOCOM),, pages 225–229
-
[3]
Finite-time Analysis of the Multiarmed Bandit Problem.Machine Learning, 47(2-3):235–256, 2002
Peter Auer, Nicol `o Cesa-Bianchi, and Paul Fischer. Finite-time Analysis of the Multiarmed Bandit Problem.Machine Learning, 47(2-3):235–256, 2002
work page 2002
-
[4]
Bandits With Heavy Tail.IEEE Transactions on Information Theory, 59(11):7711–7717, 2013
S ´ebastien Bubeck, Nicol `o Cesa-Bianchi, and G ´abor Lugosi. Bandits With Heavy Tail.IEEE Transactions on Information Theory, 59(11):7711–7717, 2013
work page 2013
-
[5]
Pure Exploration in Multi-armed Bandits Problems.Algorithmic Learning Theory, pages 23–37, 2009
S ´ebastien Bubeck, R ´emi Munos, and Gilles Stoltz. Pure Exploration in Multi-armed Bandits Problems.Algorithmic Learning Theory, pages 23–37, 2009
work page 2009
-
[6]
Alexandra Carpentier and Arlene K. H. Kim. Adaptive and minimax optimal estimation of the tail coefficient.Statistica Sinica, 2014
work page 2014
-
[7]
Alexandra Carpentier and Arlene K. H. Kim. Honest and adaptive confidence interval for the tail coefficient in the Pareto model.Electronic Journal of Statistics, 2014
work page 2014
-
[8]
Anomaly detection: A survey.ACM Computing Surveys, 41(3):15:1–15:58, July 2009
Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey.ACM Computing Surveys, 41(3):15:1–15:58, July 2009
work page 2009
-
[9]
Vincent A. Cicirello and Stephen F. Smith. The max k-armed bandit: A new model of explo- ration applied to search heuristic selection.AAAI Conference on Artificial Intelligence (AAAI), 2005
work page 2005
-
[10]
Springer Series in Operations Research and Financial Engineering
Laurens de Haan and Ana Ferreira.Extreme Value Theory: An Introduction. Springer Series in Operations Research and Financial Engineering. Springer, 2006
work page 2006
-
[11]
Ronald Aylmer Fisher and Leonard Henry Caleb Tippett. Limiting forms of the frequency distribution of the largest or smallest member of a sample.Mathematical Proceedings of the Cambridge Philosophical Society, 24:180, 1928
work page 1928
-
[12]
Boris Gnedenko. Sur la distribution limite du terme maximum d’une s ´erie al ´eatoire.The Annals of Mathematics, 44(3):423–453, 1943
work page 1943
-
[13]
Peter Hall and Alan H. Welsh. Best Attainable Rates of Convergence for Estimates of Param- eters of Regular Variation.The Annals of Statistics, 12(3):1079–1084, 1984
work page 1984
-
[14]
Tze L. Lai and Herbert Robbins. Asymptotically efficient adaptive allocation rules.Advances in Applied Mathematics, 6(1):4–22, 1985
work page 1985
-
[15]
Dynamic Intrusion Detection in Resource-Constrained Cyber Net- works
Keqin Liu and Qing Zhao. Dynamic Intrusion Detection in Resource-Constrained Cyber Net- works. InIEEE International Symposium on Information Theory (ISIT), 2012
work page 2012
-
[16]
Novelty detection: a review, part 1: statistical approaches
Markos Markou and Sameer Singh. Novelty detection: a review, part 1: statistical approaches. Signal Processing, 83(12):2481–2497, 2003
work page 2003
-
[17]
Daniel B. Neill and Gregory F. Cooper. A multivariate Bayesian scan statistic for early event detection and characterization.Machine Learning, 79:261–282, 2010
work page 2010
-
[18]
Carey E. Priebe, John M. Conroy, David J. Marchette, and Youngser Park. Scan Statistics on Enron Graphs. InComputational and Mathematical Organization Theory, volume 11, pages 229–247, 2005
work page 2005
-
[19]
Ingo Steinwart, Don Hush, and Clint Scovel. A Classification Framework for Anomaly Detec- tion.Journal of Machine Learning Research, 6:211–232, 2005
work page 2005
-
[20]
Matthew J. Streeter and Stephen F. Smith. A Simple Distribution-Free Approach to the Max k- Armed Bandit Problem. InInternational Conference on Principles and Practice of Constraint Programming (CP), volume 4204, pages 560–574, 2006
work page 2006
-
[21]
Matthew J. Streeter and Stephen F. Smith. An Asymptotically Optimal Algorithm for the Max k-Armed Bandit Problem. InAAAI Conference on Artificial Intelligence (AAAI), pages 135–142, 2006
work page 2006
-
[22]
Ryan Turner, Zoubin Ghahramani, and Steven Bottone. Fast online anomaly detection using scan statistics.IEEE Workshop on Machine Learning for Signal Processing (MLSP), 2010. 9 A Proof of Lemma 3 Lemma 3.Assume thatX 1, . . . , XT areTi.i.d. samples drawn according to(α, β, C, C ′)-second order Pareto distribution, then for anyx≥B: P max i Xi ≤x −exp −T Cx...
work page 2010
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.