
arxiv: 2503.23927 · v3 · pith:5RQP3EF4new · submitted 2025-03-31 · 📊 stat.ML · cs.LG

Detecting Localized Density Anomalies in Multivariate Data via Coin-Flip Statistics

Pith reviewed 2026-05-22 22:37 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords anomaly detection · nearest neighbors · binomial test · density anomalies · multivariate data · localized differences

The pith

EagleEye detects local density anomalies by testing whether k-nearest-neighbor membership sequences follow a binomial distribution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents EagleEye as a method to find localized over- and under-densities between two samples in multivariate data. It works by turning each point's ordered list of nearest neighbors into a sequence of binary labels indicating which sample they come from, then checking if the running count of one label exceeds what a coin-flip model would predict. If the paper is correct, this allows spotting signal events or regime changes that global methods would miss. The detections are then grouped into sets with estimates of background and purity. Demonstrations include synthetic data, particle physics searches, and climate temperature patterns.

Core claim

EagleEye assigns each point an anomaly score by encoding its ordered k-nearest-neighbour list as a binary membership sequence and testing whether the cumulative number of successes in this sequence is consistent with a binomial null model. In the presence of a genuine local anomaly, neighbours will preferentially belong to one of the two datasets, yielding an excess of successes relative to the binomial null model. These local, pointwise detections are consolidated into interpretable anomaly sets through a deterministic refinement procedure that can also estimate the irreducible background and local density anomaly purity.
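
A minimal sketch of the score described above. It assumes several things this page does not confirm: that a "success" means a neighbour drawn from the test sample, that the null probability p is the test sample's share of the pooled data, and that the per-point score is the largest negative log tail probability over prefixes of the sequence. The function names are hypothetical, and the brute-force neighbour search is for illustration only.

```python
import math


def binom_sf(s, k, p):
    """Upper tail P[X >= s] for X ~ Binomial(k, p)."""
    return sum(math.comb(k, j) * p**j * (1 - p) ** (k - j) for j in range(s, k + 1))


def membership_sequence(i, points, labels, k_max):
    """Labels of the k_max nearest neighbours of point i, ordered by distance."""
    order = sorted(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )
    return [labels[j] for j in order[:k_max]]


def anomaly_score(seq, p):
    """Assumed aggregation: max over prefixes k of -log P[Binomial(k, p) >= successes]."""
    best, successes = 0.0, 0
    for k, b in enumerate(seq, start=1):
        successes += b  # running count of neighbours labelled 1
        tail = min(1.0, binom_sf(successes, k, p))
        best = max(best, -math.log(tail))
    return best


# Toy 1-D data: label 1 = test sample, label 0 = reference sample.
pts = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,)]
labs = [1, 1, 0, 0, 1]
seq = membership_sequence(0, pts, labs, k_max=3)
print(seq, anomaly_score(seq, p=sum(labs) / len(labs)))
```

Under these assumptions, a sequence made entirely of successes drives the score up linearly in k, while a balanced sequence keeps it near zero. Scoring under-densities would amount to swapping which label counts as a success.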

What carries the argument

Binary membership sequence of ordered k-nearest neighbors tested against a binomial null model for excess successes.

If this is right

  • It can identify new physics signals at particle colliders even with systematic background differences.
  • It reveals localized spatiotemporal changes in climate temperature patterns.
  • It provides both pointwise anomaly scores and consolidated anomaly sets with purity estimates.
  • It works on artificial data with known localized densities.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could extend to detecting anomalies in single datasets by comparing to a reference distribution.
  • The refinement procedure might be useful for estimating anomaly sizes in other density-based methods.
  • Binomial testing on neighbor sequences may generalize to other local statistics in high-dimensional data.

Load-bearing premise

In the absence of a local anomaly the sequence of kNN dataset memberships behaves exactly like independent coin flips with probability set by the global sample fractions.
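
Written out, the premise reads as follows. The notation is introduced here for illustration and is not taken from the paper: b_{i,j} is the membership label of the j-th nearest neighbour of point i, and "success" is assumed to mean membership in the test sample.

```latex
\begin{align}
  b_{i,j} &\overset{\text{i.i.d.}}{\sim} \mathrm{Bernoulli}(p),
  \qquad p = \frac{n_{\mathrm{test}}}{n_{\mathrm{test}} + n_{\mathrm{ref}}}, \\
  S_{i,k} &= \sum_{j=1}^{k} b_{i,j} \sim \mathrm{Binomial}(k, p), \\
  \Pr\left[S_{i,k} \ge s\right] &= \sum_{m=s}^{k} \binom{k}{m}\, p^{m} (1-p)^{k-m}.
\end{align}
```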

What would settle it

Applying EagleEye to two samples drawn from identical distributions with no local differences should produce anomaly scores consistent with the binomial null model, with no excess detections after correction.
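
A rough sketch of such a check, under stated assumptions: a single fixed neighbourhood size k instead of the full sequence, a success defined as a test-sample neighbour, and an uncorrected per-point threshold alpha. The helper `flag_rate` and the toy data are hypothetical.

```python
import math
import random


def binom_sf(s, k, p):
    """Upper tail P[X >= s] for X ~ Binomial(k, p)."""
    return sum(math.comb(k, j) * p**j * (1 - p) ** (k - j) for j in range(s, k + 1))


def flag_rate(ref, test, k=20, alpha=0.05):
    """Fraction of test points whose k-neighbourhood shows a significant excess of test labels."""
    pts = ref + test
    labels = [0] * len(ref) + [1] * len(test)
    p = len(test) / len(pts)  # global test fraction, used as the null probability
    flagged = 0
    for i in range(len(ref), len(pts)):
        order = sorted(
            (j for j in range(len(pts)) if j != i),
            key=lambda j: math.dist(pts[i], pts[j]),
        )
        successes = sum(labels[j] for j in order[:k])
        flagged += binom_sf(successes, k, p) < alpha
    return flagged / len(test)


rng = random.Random(0)
ref = [(rng.random(), rng.random()) for _ in range(200)]
same = [(rng.random(), rng.random()) for _ in range(200)]  # same distribution as ref
clump = [(5 + 1e-3 * rng.random(), 5 + 1e-3 * rng.random()) for _ in range(50)]  # isolated cluster

print("flag rate, identical distributions:", flag_rate(ref, same))
print("flag rate, isolated test cluster:  ", flag_rate(ref, clump))
```

If the null model is adequately calibrated, the first rate should stay near or below alpha, whereas the isolated cluster is flagged in full. Neighbourhoods overlap, so flags are not independent across points and a single realization is only indicative.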

Figures

Figures reproduced from arXiv: 2503.23927 by Alessandro Laio, Andre Scaffidi, Gabriella Contardo, Heikki Haario, Maximilian Autenrieth, Roberto Trotta, Sebastian Springer.

Figure 1. Sketch illustrating the conceptual steps in the …
Figure 2. EagleEye detection of density anomalies within a uniform background. (A): distribution of anomalies in feature space, showing overdensities (orange) and underdensities (violet). (B): Points flagged as anomalous in the test set (warm orange shades) and in the reference set (cool violet shades). (C) and (D): Local anomalies after Iterative Density Equalization (dark green) and multimodal repêchage (light gr…
Figure 3. Resonant anomaly detection on the LHC Olympics R&D dataset using EagleEye. This figure demonstrates the application of EagleEye for resonant anomaly detection using the LHC Olympics R&D dataset. Each column corresponds to a different fraction of overdensity anomaly. The plots in the first two rows display 2D slices of the 8-dimensional feature space (in m1 and m2, normalised units) that best illustrate th…
Figure 4. Analysis of Air2m anomalies for the DJF and JJA seasons. This figure demonstrates the application of EagleEye to detect shifts in temperature patterns over the past seventy years using global daily average air temperature fields measured at 2 m above sea level (Air2m). The analysis focuses on the Northern Hemisphere, with separate examinations for winter (DJF) and summer (JJA) seasons. Panels A–D: The numb…
Figure 5. Variation in the cardinality of the repêchage sets (vertical axis) as a function of the number of contam…
Figure 6. Variation in the size (cardinality) of the resulting sets (vertical axis) as a function of the number of …
Figure 7. EagleEye detection of density anomalies within a uniform background. (A): distribution of anomalies in feature space, showing underdensities (violet). (B): Scatter plots of points flagged as anomalous in the reference set (cool violet shades). (D): Local anomalies after Iterative Density Equalization (dark green) and multimodal repêchage (light green). On the left, overdensities, on the right, underdensit…
Figure 8. Global detection of anomalies in the LHC Olympics R&D dataset. The p-values from a two-sample Kolmogorov-Smirnov test (blue solid lines with circular markers) and a two-sample Cramér-von Mises test (purple dash-dotted lines with circle markers) on the distribution of anomaly scores Υi, applied to the LHC Olympics R&D dataset, plotted as a function of the fraction of contaminant anomalous points. The top a…
Figure 9. Global significance tests for the presence of anomalies. This figure demonstrates that standard global significance tests may fail to detect anomalies that are however correctly identified and localised by EagleEye. The reference dataset consists of 100,170 samples from a 10-dimensional standard Gaussian distribution. The test dataset includes, on top of 10,000 points drawn from the same 10-dimensional s…
Original abstract

Detecting localized differences between two samples is a central task in scientific data analysis, required for the identification of signal events, regime changes, or model mismatch. We introduce EagleEye, a method that pinpoints local over- and under-densities in multivariate feature spaces. EagleEye assigns each point an anomaly score by encoding its ordered k-nearest-neighbour list as a binary membership sequence and testing whether the cumulative number of successes in this sequence is consistent with a binomial (coin-flipping) null model. In the presence of a genuine local anomaly, neighbours will preferentially belong to one of the two datasets, yielding an excess of "successes" relative to the binomial null model. These local, pointwise detections are consolidated into interpretable anomaly sets through a deterministic refinement procedure that can also estimate the irreducible background and local density anomaly purity. We demonstrate EagleEye's efficacy in three scenarios. We first consider an artificial data example with known localized over- and under-densities. Second, we demonstrate how EagleEye may be used for new physics searches at particle collider experiments in the presence of systematic background modelling differences. Finally, we conduct a climate analysis study that reveals localized changes in spatiotemporal temperature-pattern recurrence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces EagleEye, a method for detecting localized over- and under-densities between two multivariate samples. Each point receives an anomaly score by encoding its ordered k-nearest-neighbor list as a binary membership sequence and testing the cumulative successes against a binomial null model with success probability equal to the global dataset fraction; significant deviations flag anomalies, which are then consolidated via a deterministic refinement procedure that also estimates background and purity. The approach is demonstrated on synthetic data with known anomalies, particle-collider new-physics searches, and climate temperature-pattern analysis.

Significance. If the central construction is valid, EagleEye would supply an interpretable, computationally lightweight pointwise anomaly detector that directly yields statistical significance statements and consolidated anomaly sets, with potential utility in scientific domains requiring localized density comparisons. The binomial framing is simple and avoids explicit density estimation, which is a strength if the null is appropriately calibrated.

major comments (2)
  1. [Abstract and method description] Abstract and central construction of the anomaly score: the binomial null model treats the ordered kNN label sequence as i.i.d. Bernoulli trials with success probability equal to the global fraction. Under the global null, labels are sampled without replacement from a finite combined pool, so the exact distribution of the cumulative count is hypergeometric (or a distance-ordered variant thereof); the binomial therefore understates null variability and miscalibrates per-point p-values. This assumption is load-bearing for all downstream anomaly scores and refinement steps. The artificial-data demonstration does not expose the mismatch because the simulated densities are spatially uniform.
  2. [Demonstrations] Demonstrations (all three scenarios): no quantitative performance metrics, error bars, false-positive rates, or comparisons against baselines are reported. Without these, it is impossible to verify that the detected anomalies correspond to the claimed localized density differences rather than artifacts of the binomial approximation.
minor comments (1)
  1. [Abstract] The abstract uses inconsistent quotation marks around “successes”; standardize notation for the binary sequence and cumulative count throughout.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address the two major comments point by point below, indicating where revisions will be made.

Point-by-point responses
  1. Referee: [Abstract and method description] Abstract and central construction of the anomaly score: the binomial null model treats the ordered kNN label sequence as i.i.d. Bernoulli trials with success probability equal to the global fraction. Under the global null, labels are sampled without replacement from a finite combined pool, so the exact distribution of the cumulative count is hypergeometric (or a distance-ordered variant thereof); the binomial therefore understates null variability and miscalibrates per-point p-values. This assumption is load-bearing for all downstream anomaly scores and refinement steps. The artificial-data demonstration does not expose the mismatch because the simulated densities are spatially uniform.

    Authors: We agree that the exact null distribution is hypergeometric (or a distance-ordered variant) because labels are drawn without replacement. The binomial model is used as a computationally convenient approximation that becomes accurate for large N, which is the regime of the intended applications. The ordering by distance further modifies the exact distribution, but the binomial still provides a conservative ranking of anomalies. We will add an explicit discussion of this approximation, its validity conditions, and a small-scale simulation comparing binomial versus hypergeometric p-values in the revised manuscript. revision: partial

  2. Referee: [Demonstrations] Demonstrations (all three scenarios): no quantitative performance metrics, error bars, false-positive rates, or comparisons against baselines are reported. Without these, it is impossible to verify that the detected anomalies correspond to the claimed localized density differences rather than artifacts of the binomial approximation.

    Authors: The three demonstrations are chosen to illustrate applicability across scientific domains rather than to serve as a benchmark study. We acknowledge that the absence of quantitative metrics, error bars, and baseline comparisons limits the ability to assess performance rigorously. In the revision we will add (i) a quantitative synthetic benchmark with known ground-truth anomalies, reporting precision-recall and false-positive rates with error bars over multiple realizations, and (ii) comparisons against standard baselines (local density ratio, LOF, and isolation forest) on the same data. revision: yes
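
The binomial-versus-hypergeometric comparison mentioned in the first response can be sketched as follows. This is an illustrative simplification, not the authors' simulation: it treats the k neighbour labels as an unordered draw without replacement from a pool of `n_total` points, ignores the distance ordering, and uses hypothetical function names. In this unordered setting the hypergeometric count carries the finite-population factor (N - k)/(N - 1) on its variance.

```python
import math


def binom_sf(s, k, p):
    """Upper tail P[X >= s] for X ~ Binomial(k, p)."""
    return sum(math.comb(k, j) * p**j * (1 - p) ** (k - j) for j in range(s, k + 1))


def hypergeom_sf(s, k, n_success, n_total):
    """P[X >= s] when k labels are drawn without replacement from n_total points,
    n_success of which carry the 'success' label."""
    ways = sum(
        math.comb(n_success, j) * math.comb(n_total - n_success, k - j)
        for j in range(s, min(k, n_success) + 1)
    )
    return ways / math.comb(n_total, k)


k, s = 20, 15  # neighbourhood size and observed number of successes
for n_total in (40, 400, 40000):
    n_success = n_total // 2  # balanced samples, so p = 0.5
    print(n_total, binom_sf(s, k, 0.5), hypergeom_sf(s, k, n_success, n_total))
```

The printed pairs show how quickly the two tail probabilities converge as the pool grows relative to k, which is the validity condition the response refers to.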

Circularity Check

0 steps flagged

No circularity: binomial null is external reference, not self-derived

full rationale

The paper constructs the per-point anomaly score by encoding the ordered kNN membership sequence as a binary string and comparing the cumulative successes to a binomial(k, p_global) null, where p_global is the overall dataset fraction. This null model is an external statistical benchmark drawn from standard Bernoulli-trial assumptions rather than any fitted parameter or self-referential equation extracted from the local neighborhood itself. No derivation step reduces the anomaly score to a quantity already determined by the input data labels or by a self-citation chain; the subsequent refinement into anomaly sets is a deterministic post-processing rule that does not feed back into the score definition. The method therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The method rests on the binomial distribution as a null model for neighbor membership and on the existence of a deterministic refinement procedure whose details are not supplied in the abstract. No free parameters are explicitly named, but k and the significance threshold are implicit choices.

free parameters (2)
  • k (number of nearest neighbors)
    Controls the scale of local neighborhood examined; must be chosen and affects sensitivity.
  • significance threshold for binomial test
    Determines which points are initially flagged; choice affects false-positive rate.
axioms (1)
  • domain assumption: Under the null of no local anomaly, each neighbor's dataset membership is an independent Bernoulli trial with success probability equal to the global proportion of points from each dataset.
    Invoked when the binomial null model is applied to the membership sequence.

pith-pipeline@v0.9.0 · 5764 in / 1363 out tokens · 33185 ms · 2026-05-22T22:37:30.490493+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Unsupervised Domain Shift Detection with Interpretable Subspace Attribution

    stat.ML 2026-05 unverdicted novelty 5.0

    An unsupervised method detects domain shifts via localized density anomaly search in feature space, attributes the shift to a minimal subspace, and extracts balanced subsets from two unlabeled datasets.
