arxiv: 2603.13970 · v2 · submitted 2026-03-14 · 💻 cs.LG · hep-ex


Shapes are not enough: CONSERVAttack and its use for finding vulnerabilities and uncertainties in machine learning applications

Philip Bechtle, Lucie Flek, Philipp Alexander Jung, Akbar Karimi, Timo Saala, Alexander Schmidt, Matthias Schott, Philipp Soldin, Christopher Wiebusch, Ulrich Willemsen


Pith reviewed 2026-05-15 11:17 UTC · model grok-4.3

classification 💻 cs.LG hep-ex

keywords adversarial attack · machine learning · high energy physics · simulation uncertainty · model robustness · CONSERVAttack · systematic uncertainties


The pith

A new adversarial attack called CONSERVAttack can exploit undetected deviations between simulated and real data in high energy physics machine learning applications while remaining within uncertainty bounds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard tests in high energy physics for systematic uncertainties, marginal distributions, and linear feature correlations between data and simulation leave open the possibility of undetected deviations. The paper introduces the CONSERVAttack to generate adversarial perturbations that stay consistent with uncertainty bounds, evade those checks, and still fool the model. If such vulnerabilities exist, machine learning outputs in physics experiments cannot be trusted without additional robustness measures. The authors also outline mitigation strategies to address the gaps.

Core claim

The CONSERVAttack is designed to exploit the remaining space of hypothetical deviations between simulation and data after physically motivated systematic uncertainty tests, marginal distribution comparisons, and linear feature correlation checks. The resulting adversarial perturbations are consistent within the uncertainty bounds, evading standard validation checks, while successfully fooling the underlying model. The authors argue that robustness to adversarial effects must be considered when interpreting results from deep learning in particle physics and propose strategies to mitigate such vulnerabilities.

What carries the argument

The CONSERVAttack, an adversarial method that generates perturbations consistent within uncertainty bounds to evade standard validation while fooling the model.
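
To make the idea of a perturbation "consistent within uncertainty bounds" concrete, the following is a minimal sketch of a generic projected sign-gradient attack (in the style of PGD), not the authors' algorithm. The logistic model, its weights `w` and `b`, the per-feature uncertainties `sigma`, and the function names are all hypothetical and chosen only for illustration.

```python
import math

def score(x, w, b):
    """Signal probability of a toy logistic classifier (hypothetical model)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b):
    return int(score(x, w, b) > 0.5)

def bounded_attack(x, w, b, sigma, steps=10, step_frac=0.25):
    """Lower the signal score while keeping every feature within x +/- sigma."""
    x_adv = list(x)
    for _ in range(steps):
        s = score(x_adv, w, b)
        # Gradient of the logistic score with respect to each input feature.
        grad = [s * (1.0 - s) * wi for wi in w]
        for i, g in enumerate(grad):
            direction = (g > 0) - (g < 0)  # sign of the gradient
            shifted = x_adv[i] - step_frac * sigma[i] * direction
            # Projection step: clip back into the allowed uncertainty interval.
            x_adv[i] = min(max(shifted, x[i] - sigma[i]), x[i] + sigma[i])
    return x_adv

# Illustrative numbers only.
x = [0.6, 0.3]
w, b = [2.0, -1.0], -0.8
sigma = [0.1, 0.1]
x_adv = bounded_attack(x, w, b, sigma)
print(predict(x, w, b), predict(x_adv, w, b))
```

The clipping step is what keeps each shift inside the quoted uncertainty. It says nothing about whether the perturbed sample would also pass marginal or correlation checks, which is the extra constraint the paper's attack is described as enforcing.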

If this is right

Robustness to adversarial effects must be considered when interpreting results from deep learning in particle physics.
Standard validation checks based on physical motivation and linear correlations are insufficient to guarantee no exploitable deviations.
Machine learning models in physics may produce unreliable outputs if such undetected deviations exist.
Mitigation strategies can be developed to reduce vulnerabilities to attacks that respect uncertainty bounds.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same vulnerability could appear in other domains that use machine learning with simulated data, such as climate science or astrophysics.
Tests for higher-order statistical correlations beyond linear ones might close some of the remaining deviation space.
Adversarial training during model development could serve as a practical defense in physics analysis pipelines.
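
As an illustration of the second point, the sketch below compares the Pearson coefficient with the sample distance correlation of Székely, Rizzo and Bakirov on a purely quadratic dependence. It is an assumption of this example, not a statement from the paper, that distance correlation is the kind of higher-order test meant here; the function names and toy data are hypothetical.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def _centered_distances(v):
    """Pairwise |v_i - v_j| matrix, double-centered by row, column and grand mean."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # equals the column means, since d is symmetric
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]

def distance_correlation(x, y):
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(v * v for r in a for v in r) / n ** 2
    dvar_y = sum(v * v for r in b for v in r) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0

# y depends on x only through x**2, so the linear correlation vanishes.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v * v for v in x]
print(pearson(x, y), distance_correlation(x, y))
```

On this toy data the linear coefficient is zero while the distance correlation is clearly positive, which is the kind of dependence a linear-correlation check cannot see.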

Load-bearing premise

Hypothetical deviations between simulation and data exist within uncertainty bounds that are not captured by physically motivated tests, marginal distributions, or linear feature correlations.

What would settle it

Apply CONSERVAttack to a real high energy physics machine learning model, generate perturbations, and test whether they fool the model while passing all standard validation checks and remaining within uncertainty bounds; inability to produce such fooling perturbations would falsify the claim.
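
A minimal sketch of how the first two parts of such a test could be scored, assuming per-feature uncertainties `sigma` and a classifier exposed as a plain function. The names `audit` and `toy_model` and all numbers are hypothetical. The validation checks on marginals and correlations would need to be run separately.

```python
from typing import Callable, Sequence

Event = Sequence[float]

def within_bounds(x: Event, x_adv: Event, sigma: Event) -> bool:
    """True if every feature shift is no larger than its quoted uncertainty."""
    return all(abs(a - o) <= s for a, o, s in zip(x_adv, x, sigma))

def audit(events, perturbed, sigma, model: Callable[[Event], int]) -> dict:
    """Fraction of events kept inside the bounds, and fraction fooled while inside."""
    n = len(events)
    inside = [within_bounds(x, xa, sigma) for x, xa in zip(events, perturbed)]
    flipped = [model(x) != model(xa) for x, xa in zip(events, perturbed)]
    return {
        "fraction_within_bounds": sum(inside) / n,
        "fooling_rate": sum(f and i for f, i in zip(flipped, inside)) / n,
    }

def toy_model(x: Event) -> int:
    # Hypothetical stand-in classifier: "signal" if the feature sum exceeds 1.
    return int(sum(x) > 1.0)

events = [(0.48, 0.49), (0.10, 0.20), (0.70, 0.60), (0.52, 0.50)]
perturbed = [(0.52, 0.53), (0.12, 0.18), (0.66, 0.58), (0.48, 0.47)]
sigma = (0.05, 0.05)

report = audit(events, perturbed, sigma, toy_model)
print(report)
```

A fooling rate compatible with zero on a real model, with every perturbation inside the bounds, would be evidence against the claim; a clearly non-zero rate would support it only if the perturbed sample also passes the standard validation checks.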

Original abstract

In High Energy Physics, as in many other fields of science, the application of machine learning techniques has been crucial in advancing our understanding of fundamental phenomena. Increasingly, deep learning models are applied to analyze both simulated and experimental data. In most experiments, a rigorous regime of testing for physically motivated systematic uncertainties is in place. The numerical evaluation of these tests for differences between the data on the one side and simulations on the other side quantifies the effect of potential sources of mismodelling on the machine learning output. In addition, thorough comparisons of marginal distributions and (linear) feature correlations between data and simulation in "control regions" are applied. However, the guidance by physical motivation, and the need to constrain comparisons to specific regions, does not guarantee that all possible sources of deviations have been accounted for. We therefore propose a new adversarial attack - the CONSERVAttack - designed to exploit the remaining space of hypothetical deviations between simulation and data after the above mentioned tests. The resulting adversarial perturbations are consistent within the uncertainty bounds - evading standard validation checks - while successfully fooling the underlying model. We further propose strategies to mitigate such vulnerabilities and argue that robustness to adversarial effects must be considered when interpreting results from deep learning in particle physics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper proposes CONSERVAttack as a way to probe ML models in HEP for deviations inside uncertainty bounds that standard checks miss, but supplies no construction, algorithm, or results to show it works.


The core idea is straightforward: after the usual physically motivated tests, marginal checks, and control-region correlations, there might still be room for adversarial shifts that stay within quoted uncertainties yet change the ML output. The paper names this CONSERVAttack and argues it should be considered when interpreting deep-learning results in particle physics. That observation is reasonable on its face and matches concerns some groups already have about simulation fidelity in jet tagging or event classification.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes CONSERVAttack, a new adversarial attack for deep learning models in High Energy Physics. It argues that standard validation procedures—physically motivated systematic uncertainty tests, marginal distribution comparisons, and linear feature correlations in control regions—do not exhaust all possible deviations between simulation and data. The attack is claimed to generate perturbations that remain inside stated uncertainty bounds (thereby evading the listed checks) while altering the model's output. Mitigation strategies are outlined and the authors recommend that adversarial robustness be considered when interpreting ML results in particle physics.

Significance. If a concrete, reproducible construction of CONSERVAttack were supplied together with quantitative results on a representative HEP task (e.g., jet tagging or event classification) showing that the perturbations pass the cited validation tests yet change the model prediction, the work would usefully highlight gaps in current shape-based validation practices and motivate more comprehensive robustness checks. The absence of any algorithm, pseudocode, or empirical demonstration in the manuscript, however, leaves the central claim unsubstantiated at present.

major comments (2)

[Abstract] The assertion that CONSERVAttack produces perturbations 'consistent within the uncertainty bounds - evading standard validation checks - while successfully fooling the underlying model' is presented without any explicit generation procedure, pseudocode, or quantitative evidence on a concrete task. This directly undermines the central claim that such perturbations exist and can be systematically found.
No section or equation supplies the construction that would satisfy the weakest assumption: perturbations that pass physically motivated tests, marginal distributions, and linear correlations in control regions yet still change the ML output. Without this, the attack remains an invented entity rather than a demonstrated method.

minor comments (1)

The title 'Shapes are not enough' is evocative but the manuscript would benefit from an explicit definition of 'shapes' in the opening paragraphs to clarify its relation to the marginal and correlation tests discussed.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. The points raised correctly identify that the current version presents CONSERVAttack primarily as a conceptual proposal without supplying the explicit construction, pseudocode, or quantitative demonstration needed to substantiate the central claims. We will revise the manuscript to include these elements.


Referee: [Abstract] The assertion that CONSERVAttack produces perturbations 'consistent within the uncertainty bounds - evading standard validation checks - while successfully fooling the underlying model' is presented without any explicit generation procedure, pseudocode, or quantitative evidence on a concrete task. This directly undermines the central claim that such perturbations exist and can be systematically found.

Authors: We agree that the abstract states the existence of such perturbations without providing the supporting procedure or evidence. The manuscript was written as a conceptual contribution to highlight potential gaps in existing validation practices. In the revision we will rewrite the abstract to describe CONSERVAttack as a proposed method, add a dedicated section containing the explicit optimization procedure (including how perturbations are constrained to lie inside the stated uncertainty bounds), pseudocode, and quantitative results on a representative HEP task such as jet tagging or event classification.

Revision: yes

Referee: [—] No section or equation supplies the construction that would satisfy the weakest assumption: perturbations that pass physically motivated tests, marginal distributions, and linear correlations in control regions yet still change the ML output. Without this, the attack remains an invented entity rather than a demonstrated method.

Authors: This observation is accurate. The present manuscript does not contain the required construction or verification. We will add the mathematical formulation of the attack, the algorithm that generates perturbations respecting the uncertainty bounds while altering the model output, and explicit checks confirming that the resulting perturbations pass the physically motivated systematic tests, marginal distribution comparisons, and linear feature correlations in control regions. These additions will be accompanied by numerical experiments on a concrete HEP classification task.

Revision: yes

Circularity Check

0 steps flagged

No circularity: CONSERVAttack is a methodological proposal without self-referential reduction


The paper introduces CONSERVAttack as a new adversarial method to probe hypothetical deviations between simulation and data that remain after standard physically motivated tests, marginal distributions, and linear correlations. No equations, fitted parameters, or self-citations appear in the provided text that would make the attack's construction equivalent to its inputs by definition. The central claim rests on the existence of such perturbations and the attack's ability to generate them while evading checks, presented as an independent proposal rather than a renaming or redefinition of existing quantities. This is a standard non-circular methodological contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that current HEP validation procedures leave exploitable gaps inside uncertainty bounds and that an attack can be constructed to target those gaps without violating them.

axioms (1)

domain assumption: Current physically motivated tests, marginal distributions, and linear correlations do not exhaust all possible deviations between simulation and data that lie inside uncertainty bounds.
This premise is required for the attack to have a non-empty target space.

invented entities (1)

CONSERVAttack (no independent evidence)
purpose: Adversarial perturbation generator constrained to uncertainty bounds
Newly proposed technique whose effectiveness is asserted but not demonstrated in the abstract.

pith-pipeline@v0.9.0 · 5553 in / 1276 out tokens · 51638 ms · 2026-05-15T11:17:58.072217+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: "The attack proceeds via an iterative, gradient-based, brute-force–style search over candidate perturbations... L := α D_JS + β Δ_F"

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear

Paper passage: "preserving marginal feature distributions and preserving the inter-feature correlations"
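
The quoted passages give the objective L := α D_JS + β Δ_F without defining its terms. The sketch below shows one way such a penalty could be evaluated, under two assumptions that are not confirmed by the text above: that D_JS is the mean per-feature Jensen-Shannon distance between binned marginals, and that Δ_F is the Frobenius norm of the difference between the two linear correlation matrices. All function names, bin edges, and data are hypothetical, and features are assumed to be non-constant.

```python
import bisect
import math

def histogram(values, edges):
    """Normalised bin contents; values outside [edges[0], edges[-1]] are dropped."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = min(bisect.bisect_right(edges, v) - 1, len(counts) - 1)
        if i >= 0 and v <= edges[-1]:
            counts[i] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

def js_distance(p, q):
    """Jensen-Shannon distance: square root of the base-2 JS divergence."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(u, v):
        return sum(a * math.log2(a / b) for a, b in zip(u, v) if a > 0)
    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))

def pearson(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def corr_matrix(cols):
    return [[pearson(a, b) for b in cols] for a in cols]

def conserv_penalty(ref_cols, adv_cols, edges, alpha=1.0, beta=1.0):
    """Assumed reading of L := alpha * D_JS + beta * Delta_F (see lead-in)."""
    d_js = sum(js_distance(histogram(r, edges), histogram(a, edges))
               for r, a in zip(ref_cols, adv_cols)) / len(ref_cols)
    c_ref, c_adv = corr_matrix(ref_cols), corr_matrix(adv_cols)
    delta_f = math.sqrt(sum((x - y) ** 2
                            for rr, ra in zip(c_ref, c_adv) for x, y in zip(rr, ra)))
    return alpha * d_js + beta * delta_f

# Two features stored column-wise; the "perturbed" set reverses the second column.
ref = [[0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [0.2, 0.1, 0.4, 0.3, 0.6, 0.5]]
adv = [ref[0], list(reversed(ref[1]))]
edges = [0.0, 0.25, 0.5, 0.75]
print(conserv_penalty(ref, ref, edges), conserv_penalty(ref, adv, edges))
```

Reversing one column leaves both marginal histograms unchanged, so the D_JS term is zero, while the correlation term is large. This illustrates why the two quoted constraints are distinct: a marginal-only check would accept a sample whose inter-feature correlations have been destroyed.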

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Uncovering Hidden Systematics in Neural Network Models for High Energy Physics
cs.LG · 2026-05 · unverdicted · novelty 6.0

Neural networks for HEP tasks can be fooled at significant rates by subtle perturbations inside uncertainty envelopes, revealing hidden systematics not captured by conventional methods.
