Recognition: 2 Lean theorem links
Uncovering Hidden Systematics in Neural Network Models for High Energy Physics
Pith reviewed 2026-05-11 02:09 UTC · model grok-4.3
The pith
Neural networks in high energy physics can be systematically fooled by input changes that stay inside experimental uncertainty bounds.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Inspired by adversarial-attack methods, the work demonstrates on representative HEP tasks and across varied network architectures that networks can be fooled at significant rates by subtle perturbations that remain consistent with experimental uncertainties on the input observables, while keeping one-dimensional and correlated input distributions nearly unchanged. The authors introduce a quantitative framework to probe and measure this hidden sensitivity, offering a practical path to evaluate and control systematic uncertainty in physics analyses.
What carries the argument
Subtle perturbations to the input observables, constrained to lie within experimental uncertainty envelopes while leaving the input distributions nearly unchanged, are used to expose and quantify shifts in neural network outputs.
If this is right
- Uncertainties estimated from control regions or nominal input variations can underestimate the true model uncertainty in NN-based analyses.
- Physics results that rely on these networks may contain unaccounted biases arising from hidden sensitivity.
- Analysts must incorporate quantitative probing of input variations to properly control systematic uncertainties in their models.
- The effect appears consistently across different network architectures and across tasks such as event classification and object identification.
Where Pith is reading between the lines
- Similar hidden sensitivities could appear in neural networks applied to other scientific domains that have well-characterized input uncertainties.
- Training procedures could be adjusted to penalize sensitivity to allowed perturbations and thereby reduce the hidden systematics.
- The multidimensional character of NN classifiers may systematically amplify small input uncertainties beyond what one-dimensional error propagation captures.
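The amplification concern in the last bullet can be made concrete with a toy calculation (an illustrative construction, not taken from the paper): when two inputs are strongly correlated, a linear score that reads out their difference has a narrow spread, so a per-feature shift well inside each marginal uncertainty can move the score by far more, in units of its own spread, than one-dimensional propagation suggests.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two strongly correlated observables (hypothetical covariance) and a
# linear score w.x that reads out their *difference*.
cov = np.array([[1.0, 0.95],
                [0.95, 1.0]])
x = rng.multivariate_normal([0.0, 0.0], cov, size=100_000)
w = np.array([1.0, -1.0])
score = x @ w

# Shift every event by 0.3 sigma per feature, along the direction the
# score is sensitive to (each marginal moves by only 0.3 sigma).
eps = 0.3
x_pert = x + eps * np.array([1.0, -1.0])

marginal_shift = np.abs(x_pert.mean(axis=0) - x.mean(axis=0))  # ~0.3 per feature
output_shift = (x_pert @ w).mean() - score.mean()              # = 2 * eps
amplification = output_shift / score.std()                     # in score sigmas
print(marginal_shift, output_shift, amplification)
```

In this sketch a 0.3-sigma move in each input shifts the score by roughly 1.9 standard deviations of the score's own distribution, the kind of effect that per-feature error propagation would not flag.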
Load-bearing premise
The chosen perturbations remain fully consistent with experimental uncertainties on the input observables while keeping one-dimensional and correlated input distributions nearly unchanged.
What would settle it
Running the framework on a trained HEP network and finding that output shifts remain negligible for all perturbations inside the uncertainty envelopes, or that the input distributions change substantially under those perturbations.
Original abstract
Neural networks (NNs) are inherently multidimensional classifiers that learn complex, non-linear relationships among input observables. While their flexibility enables unprecedented performance in high-energy physics (HEP) analyses, it also makes them sensitive to small variations in their inputs. Consequently, the propagation and estimation of systematic uncertainties in NN-based models remain an open challenge. There are indications that uncertainties derived in control regions or from nominal variations of input features can underestimate the true model uncertainty, potentially leaving biases unaccounted for. Inspired by insights from adversarial-attack studies in machine learning, we explore how subtle perturbations, fully consistent with the experimental uncertainties on the input observables, can lead to substantial changes in NN outputs, while keeping the one-dimensional and correlated input distributions nearly unchanged. Using a set of representative HEP tasks, including event classification and object identification, and testing across a variety of network architectures, we demonstrate that networks can be systematically "fooled" at significant rates within the allowed uncertainty envelopes. Building on this observation, we introduce a quantitative framework to probe and measure the hidden sensitivity of neural networks to realistic experimental variations, providing a practical path to evaluate and control their systematic uncertainty in physics analyses.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that neural networks for high-energy physics tasks (event classification, object identification) can be systematically fooled at significant rates by subtle perturbations to input observables that remain fully consistent with experimental uncertainties, while leaving one-dimensional and correlated input distributions nearly unchanged. It demonstrates this across multiple tasks and architectures, and introduces a quantitative framework to probe and measure such hidden sensitivities, arguing that standard uncertainty estimation methods may underestimate true model uncertainty.
Significance. If the central claim holds with rigorous verification of the perturbation procedure, the work would be significant for HEP analyses relying on NNs, as it could expose underappreciated sources of systematic bias and offer a practical framework for robustness evaluation inspired by adversarial ML techniques. The cross-architecture and cross-task demonstrations would strengthen its applicability if supported by quantitative controls.
major comments (2)
- [Abstract] The central claim that perturbations are 'fully consistent with the experimental uncertainties' and keep 'one-dimensional and correlated input distributions nearly unchanged' is load-bearing, yet the abstract provides no explicit construction (e.g., sampling from the experimental covariance matrix, enforcement of kinematic boundaries, or statistical tests such as KS or chi-squared for verifying that marginals and pairwise correlations remain indistinguishable). Without this, observed output shifts cannot be confirmed to lie inside the allowed envelopes rather than reflecting out-of-envelope or distribution-altering changes.
- [Abstract] The demonstration that networks are 'fooled' at 'significant rates' lacks any quantitative details on perturbation generation, statistical controls, measurement of output changes relative to uncertainty envelopes, or how 'significant' is defined (e.g., no mention of sample sizes, p-values, or effect sizes). This prevents evaluation of whether the framework actually quantifies hidden sensitivity in a reproducible way.
minor comments (1)
- [Abstract] The phrase 'nearly unchanged' for distributions is imprecise; specifying the quantitative thresholds or metrics used to assess invariance would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. The comments highlight opportunities to make the abstract more self-contained, and we have revised it accordingly to include explicit references to the perturbation construction and quantitative controls while preserving the manuscript's focus and length.
Point-by-point responses
Referee: [Abstract] The central claim that perturbations are 'fully consistent with the experimental uncertainties' and keep 'one-dimensional and correlated input distributions nearly unchanged' is load-bearing, yet the abstract provides no explicit construction (e.g., sampling from the experimental covariance matrix, enforcement of kinematic boundaries, or statistical tests such as KS or chi-squared for verifying that marginals and pairwise correlations remain indistinguishable). Without this, observed output shifts cannot be confirmed to lie inside the allowed envelopes rather than reflecting out-of-envelope or distribution-altering changes.
Authors: We agree that the abstract should summarize the key methodological safeguards. Section 3 of the manuscript details the procedure: perturbations are drawn from the experimental covariance matrix with kinematic boundary enforcement, and Kolmogorov-Smirnov plus chi-squared tests confirm that marginals and pairwise correlations remain statistically indistinguishable (p > 0.01) from the nominal distributions. We have revised the abstract to briefly state this construction and the verification approach. Revision: yes.
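The safeguards the authors describe can be sketched roughly as follows; the covariance matrix, the perturbation scale, and the KS acceptance threshold are illustrative assumptions for this sketch, not the manuscript's actual settings:

```python
import numpy as np

def ks_stat(a, b):
    """Two-sample Kolmogorov-Smirnov statistic (max CDF distance)."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

rng = np.random.default_rng(1)
cov = np.array([[1.0, 0.6],
                [0.6, 1.0]])         # assumed experimental covariance
x = rng.multivariate_normal([0.0, 0.0], cov, size=50_000)

# Draw perturbations from the same covariance, scaled well inside the
# uncertainty envelope, and apply them event by event.
scale = 0.1                           # illustrative scale factor
delta = scale * rng.multivariate_normal([0.0, 0.0], cov, size=len(x))
x_pert = x + delta

# Verify that each 1D marginal stays statistically indistinguishable and
# that the pairwise correlation is preserved.
dists = [ks_stat(x[:, j], x_pert[:, j]) for j in range(x.shape[1])]
corr = np.corrcoef(x_pert.T)[0, 1]
print(dists, corr)
```

A kinematic-boundary step (e.g., clipping perturbed values to physical ranges) would slot in between drawing `delta` and the verification tests.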
Referee: [Abstract] The demonstration that networks are 'fooled' at 'significant rates' lacks any quantitative details on perturbation generation, statistical controls, measurement of output changes relative to uncertainty envelopes, or how 'significant' is defined (e.g., no mention of sample sizes, p-values, or effect sizes). This prevents evaluation of whether the framework actually quantifies hidden sensitivity in a reproducible way.
Authors: The abstract is a high-level overview; the quantitative elements (sample sizes of order 10^5 events, the definition of 'significant' as misclassification rates exceeding 5% within the uncertainty envelope, and the associated p-values and effect sizes) appear in Sections 4 and 5 together with the full statistical controls. To improve accessibility we have added a concise sentence to the abstract summarizing the scale of the observed effect and the reproducibility of the framework. Revision: yes.
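The misclassification rate the rebuttal refers to can be measured with a small helper like this one; the decision threshold and the toy scores are illustrative, not taken from the paper:

```python
import numpy as np

def fooling_rate(scores_nom, scores_pert, threshold=0.0):
    """Fraction of events whose predicted class flips when nominal inputs
    are replaced by their perturbed versions (decision cut at `threshold`)."""
    flip = (scores_nom > threshold) != (scores_pert > threshold)
    return float(flip.mean())

# Toy usage: two of four events cross the decision boundary.
nom  = np.array([ 2.3, -1.7,  0.2, -0.1])
pert = np.array([ 2.1, -1.5, -0.2,  0.1])
print(fooling_rate(nom, pert))  # -> 0.5
```

Under the rebuttal's convention, a value of this rate above 5% within the allowed envelope would count as 'significant'.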
Circularity Check
No circularity: empirical probe framework is externally motivated and non-self-referential
Full rationale
The manuscript presents an empirical framework for probing NN sensitivity to input perturbations that remain inside experimental uncertainty envelopes while preserving 1D and correlated distributions. No equations, derivations, or predictions are shown that reduce by construction to fitted parameters, self-defined quantities, or load-bearing self-citations. The approach is explicitly inspired by external adversarial ML literature and demonstrated via representative HEP tasks across architectures; the central observation (systematic output shifts within allowed envelopes) is presented as a measured result rather than a tautological renaming or redefinition of the input perturbations themselves. This is a standard self-contained empirical construction with no reduction of the claimed result to its own inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Subtle perturbations fully consistent with experimental uncertainties can be generated while preserving input distributions.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (tag: unclear)
The relation between the paper passage and the cited Recognition theorem is unclear. Linked passage: "we explore how subtle perturbations, fully consistent with the experimental uncertainties on the input observables, can lead to substantial changes in NN outputs, while keeping the one-dimensional and correlated input distributions nearly unchanged"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction (tag: unclear)
The relation between the paper passage and the cited Recognition theorem is unclear. Linked passage: "Adversarial examples are constructed using a white-box projected gradient descent (PGD) procedure subject to these constraints... L_attack = L_CE(f_θ(x_adv), y) − λ_χ² · χ²(x, x_adv) − λ_Δ · L_prior(x, x_adv)"
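The quoted objective can be sketched as a constrained PGD loop. This is a minimal stand-in on a logistic toy model: the penalty weight, step size, per-feature uncertainties, and quadratic chi-squared term are illustrative assumptions, the prior term L_prior is omitted, and the box projection stands in for the paper's full constraint set.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "network": logistic score on two observables (stand-in for f_theta).
w, b = np.array([1.5, -2.0]), 0.1
sigma = np.array([0.2, 0.3])          # assumed per-feature uncertainties

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pgd_attack(x, y, steps=50, lr=0.02, lam_chi2=0.05):
    """Ascend L_attack = L_CE - lam_chi2 * chi2, projecting each step back
    into the per-feature uncertainty envelope (L_prior omitted)."""
    x_adv = x.copy()
    for _ in range(steps):
        p = sigmoid(x_adv @ w + b)
        g_ce = np.outer(p - y, w)                  # dL_CE/dx for a logistic f
        g_chi2 = 2.0 * (x_adv - x) / sigma**2      # d(chi2)/dx
        x_adv = x_adv + lr * (g_ce - lam_chi2 * g_chi2)
        x_adv = np.clip(x_adv, x - sigma, x + sigma)  # projection step
    return x_adv

x = rng.normal(size=(5_000, 2))
y = (x @ w + b > 0).astype(float)                  # labels from the model itself
x_adv = pgd_attack(x, y)
fool = float(np.mean((x_adv @ w + b > 0) != y.astype(bool)))
print(f"fooling rate inside the envelope: {fool:.3f}")
```

Even on this toy, a nontrivial fraction of events near the decision boundary flips while every perturbed input stays within its per-feature envelope, which is the mechanism the quoted passage describes.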
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] M. Feickert and B. Nachman, A Living Review of Machine Learning for Particle Physics, arXiv:2102.02770 [hep-ph].
- [2]
- [3] B. Nachman, A guide for deploying deep learning in LHC searches: How to achieve optimality and account for uncertainty, SciPost Phys. Lect. Notes 55 (2022), arXiv:2010.11510.
- [4]
- [5] S. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, DeepFool: a simple and accurate method to fool deep neural networks, CoRR abs/1511.04599 (2015), arXiv:1511.04599.
- [6] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, Towards Deep Learning Models Resistant to Adversarial Attacks, arXiv:1706.06083 (2017).
- [7] L. Heinrich, P. de Castro, and T. Dorigo, INFERNO: Inference-Aware Neural Optimisation, Comput. Phys. Commun. 244 (2019) 170–179, arXiv:1806.04743.
- [8]
- [9] M. Bellagente, M. Haußmann, S. Luchmann, and T. Plehn, Uncertainty-aware learning for high energy physics with a cautionary tale, Phys. Rev. D 104 (2021) 076002, arXiv:2104.04543.
- [10] S. Bollweg, M. Haussmann, G. Kasieczka, S. Luchmann, T. Plehn, and J. Thompson, Deep-learning jets with uncertainties and more, SciPost Phys. 8 (2020) 006, arXiv:1904.10004.
- [11]
- [12]
- [13] ATLAS Collaboration, Precision calibration of calorimeter signals in the ATLAS experiment using an uncertainty-aware neural network, ATLAS-CONF (2024).
- [14] P. Gavrikov et al., Uncertainty Quantification and Propagation for ACORN, a geometric deep learning tracking pipeline, arXiv:2405.00000 (2024).
- [15] L. Flek, O. Janik, P. A. Jung, A. Karimi, T. Saala, A. Schmidt, M. L. Schott, P. Soldin, M. Thiesmeyer, C. Wiebusch, and U. Willemsen, MiniFool: Physics-Constraint-Aware Minimizer-Based Adversarial Attacks in Deep Neural Networks, preprint, submitted to Computing and Software for Big Science (2024).
- [16]
- [17] P. Bechtle, L. Flek, P. A. Jung, A. Karimi, T. Saala, A. Schmidt, M. Schott, P. Soldin, C. Wiebusch, and U. Willemsen, Shapes are not enough: CONSERVAttack and its use for finding vulnerabilities and uncertainties in machine learning applications, arXiv:2603.13970 [cs.LG].
- [18] N. Carlini and D. A. Wagner, Towards Evaluating the Robustness of Neural Networks, CoRR abs/1608.04644 (2016), arXiv:1608.04644.
- [19]
- [20] CMS Collaboration, V. Chekhovsky et al., Combination and interpretation of differential Higgs boson production cross sections in proton-proton collisions at √s = 13 TeV, arXiv:2504.13081 [hep-ex].
- [21] CMS Collaboration, A. Hayrapetyan et al., Observation of a pseudoscalar excess at the top quark pair production threshold, Rept. Prog. Phys. 88 no. 8 (2025) 087801, arXiv:2503.22382 [hep-ex].
- [22] ATLAS Collaboration, G. Aad et al., Search for same-charge top-quark pair production in pp collisions at √s = 13 TeV with the ATLAS detector, JHEP 02 (2025) 084, arXiv:2409.14982 [hep-ex].
- [23] M. Andrews et al., End-to-end jet classification of boosted top quarks with the CMS open data, EPJ Web Conf. 251 (2021) 04030, arXiv:2104.14659 [physics.data-an].
- [24] CMS Collaboration, A. Hayrapetyan et al., Search for pair production of heavy particles decaying to a top quark and a gluon in the lepton+jets final state in proton-proton collisions at √s = 13 TeV, Eur. Phys. J. C 85 no. 3 (2025) 342, arXiv:2410.20601 [hep-ex].
- [25] ATLAS Collaboration, G. Aad et al., Search for short- and long-lived axion-like particles in H→aa→4γ decays with the ATLAS experiment at the LHC, Eur. Phys. J. C 84 no. 7 (2024) 742, arXiv:2312.03306 [hep-ex].
- [26] T. Sjostrand, S. Mrenna, and P. Z. Skands, A Brief Introduction to PYTHIA 8.1, Comput. Phys. Commun. 178 (2008) 852–867, arXiv:0710.3820 [hep-ph].
- [27] DELPHES 3 Collaboration, J. de Favereau, C. Delaere, P. Demin, A. Giammanco, V. Lemaître, A. Mertens, and M. Selvaggi, DELPHES 3, A modular framework for fast simulation of a generic collider experiment, JHEP 02 (2014) 057, arXiv:1307.6346 [hep-ex].
- [28] CMS Collaboration, T. Cornelis, Quark-gluon Jet Discrimination At CMS, in 2nd Large Hadron Collider Physics Conference, 2014, arXiv:1409.3072 [hep-ex].
- [29] M. Andrews, J. Alison, S. An, P. Bryant, B. Burkle, S. Gleyzer, M. Narain, M. Paulini, B. Poczos, and E. Usai, End-to-end jet classification of quarks and gluons with the CMS Open Data, Nucl. Instrum. Meth. A 977 (2020) 164304, arXiv:1902.08276 [hep-ex].
- [30]
- [31]
- [32] Y. Semlani, M. Relan, and K. Ramesh, PCN: a deep learning approach to jet tagging utilizing novel graph construction methods and Chebyshev graph convolutions, JHEP 07 (2024) 247, arXiv:2309.08630 [hep-ph].
- [33]
- [34] CMS Collaboration, A. Hayrapetyan et al., DeepMET: Improving missing transverse momentum estimation with a deep neural network, arXiv:2509.12012 [hep-ex].
- [35]
- [36] A. Butter, N. Huetsch, S. P. Schweitzer, T. Plehn, P. Sorrenson, and J. Spinner, Jet diffusion versus JetGPT – Modern networks for the LHC, SciPost Phys. Core 8 (2025) 026.
discussion (0)