Improving robustness of jet tagging algorithms with adversarial training: exploring the loss surface
Pith reviewed 2026-05-24 09:45 UTC · model grok-4.3
The pith
Adversarial training makes jet flavor tagging classifiers more robust to input distortions while keeping high performance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Adversarial attacks on input features probe the vulnerability of typical jet flavor tagging classifiers and serve as a model for systematic uncertainties. Adversarial training improves robustness to those attacks while maintaining high classification performance. Investigating the corresponding loss surface yields geometric interpretations of robustness that take input correlations into account.
What carries the argument
Adversarial training applied to jet flavor tagging networks, together with geometric analysis of the loss surface with respect to correlated input features.
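As a rough illustration of the mechanism, the sketch below runs FGSM-based adversarial training on a toy problem. Everything in it is an assumption made for illustration: the linear-logistic model stands in for the paper's tagger, the Gaussian features stand in for jet inputs, and the attack strength `eps`, learning rate, and step count are arbitrary. It is not the authors' setup.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grads(w, b, x, y):
    """Mean binary cross-entropy and its gradients for a linear-logistic model."""
    z = x @ w + b
    loss = np.mean(np.logaddexp(0.0, z) - y * z)   # numerically stable BCE
    err = sigmoid(z) - y                           # dL_i/dz_i for each sample
    grad_w = x.T @ err / len(y)
    grad_b = err.mean()
    grad_x = err[:, None] * w                      # per-sample dL_i/dx_i
    return loss, grad_w, grad_b, grad_x

def fgsm(w, b, x, y, epsilon):
    """Single-step attack: move every feature by epsilon along sign(dL/dx)."""
    grad_x = loss_and_grads(w, b, x, y)[3]
    return x + epsilon * np.sign(grad_x)

def train(x, y, epsilon=0.0, steps=300, lr=0.5):
    """Gradient descent; if epsilon > 0, each step fits FGSM-perturbed inputs."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(steps):
        x_step = fgsm(w, b, x, y, epsilon) if epsilon > 0 else x
        _, grad_w, grad_b, _ = loss_and_grads(w, b, x_step, y)
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

def accuracy(w, b, x, y):
    return float(np.mean((sigmoid(x @ w + b) > 0.5) == (y > 0.5)))

# Toy stand-in for jet features: two classes separated by a small mean shift.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 2000).astype(float)
x = rng.normal(size=(2000, 10)) + 0.5 * (2 * y - 1)[:, None]

eps = 0.1  # hypothetical attack strength
models = {"nominal": train(x, y), "adversarial": train(x, y, eps)}
for name, (w, b) in models.items():
    x_adv = fgsm(w, b, x, y, eps)
    print(name, "clean:", accuracy(w, b, x, y), "attacked:", accuracy(w, b, x_adv, y))
```

Each model is attacked with perturbations computed against its own weights, so the printed numbers compare clean and attacked accuracy per model. For a linear model an FGSM step can only shrink each sample's margin, so attacked accuracy cannot exceed clean accuracy; whether the adversarially trained model ends up more robust depends on the data and on `eps`.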
If this is right
- Classifiers become less vulnerable to small adversarial changes in input features.
- Standard performance metrics on clean data remain high after adversarial training.
- Geometric properties of the loss surface explain how robustness accounts for feature correlations.
- The same defense can be applied to other deep-learning tasks that rely on many low-level inputs.
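One simple way to look at a loss surface, sketched here under stated assumptions, is to scan the loss along a line through a single input: once along the FGSM direction and once along a random direction of the same L-infinity size. The linear-logistic model, its random weights, and the input `x0` are hypothetical placeholders, and this one-dimensional slice ignores the input correlations that the paper's analysis is said to take into account.

```python
import numpy as np

def bce_loss(w, b, x, y):
    """Binary cross-entropy of a linear-logistic model for a single input x."""
    z = x @ w + b
    return np.logaddexp(0.0, z) - y * z   # stable form of -[y log p + (1-y) log(1-p)]

def input_gradient(w, b, x, y):
    """Analytic dL/dx for the model above."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return (p - y) * w

rng = np.random.default_rng(1)
dim = 8
w, b = rng.normal(size=dim), 0.1          # hypothetical model parameters
x0, y0 = rng.normal(size=dim), 1.0        # hypothetical input and its true label

adv_dir = np.sign(input_gradient(w, b, x0, y0))   # FGSM direction
rand_dir = rng.choice([-1.0, 1.0], size=dim)      # random direction, same L-inf norm

steps = np.linspace(-0.5, 0.5, 11)
loss_adv = np.array([bce_loss(w, b, x0 + s * adv_dir, y0) for s in steps])
loss_rand = np.array([bce_loss(w, b, x0 + s * rand_dir, y0) for s in steps])

for s, la, lr in zip(steps, loss_adv, loss_rand):
    print(f"step {s:+.1f}  adversarial {la:.4f}  random {lr:.4f}")
```

For this linear model the loss rises monotonically along the adversarial direction and, for positive steps, is never below the loss along the random direction. A deep network need not behave this way, which is part of why inspecting such slices can be informative.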
Where Pith is reading between the lines
- The loss-surface view might suggest new regularization methods that explicitly penalize sensitivity along correlated directions.
- Testing the same training procedure on uncertainties drawn from actual detector simulations would check whether the proxy remains useful outside the adversarial setting.
Load-bearing premise
Adversarial attacks on input features serve as an accurate model for systematic uncertainties affecting jet tagging performance.
What would settle it
A measurement showing that real experimental systematic uncertainties distort jet features differently from the tested adversarial perturbations, and that adversarial training produces no robustness gain under those actual distortions.
Original abstract
In the field of high-energy physics, deep learning algorithms continue to gain in relevance and provide performance improvements over traditional methods, for example when identifying rare signals or finding complex patterns. From an analyst's perspective, obtaining highest possible performance is desirable, but recently, some attention has been shifted towards studying robustness of models to investigate how well these perform under slight distortions of input features. Especially for tasks that involve many (low-level) inputs, the application of deep neural networks brings new challenges. In the context of jet flavor tagging, adversarial attacks are used to probe a typical classifier's vulnerability and can be understood as a model for systematic uncertainties. A corresponding defense strategy, adversarial training, improves robustness, while maintaining high performance. Investigating the loss surface corresponding to the inputs and models in question reveals geometric interpretations of robustness, taking correlations into account.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that adversarial training improves the robustness of jet flavor tagging classifiers in high-energy physics while maintaining high performance on clean inputs. It positions gradient-based adversarial attacks (FGSM, PGD) on low-level jet features as a proxy for systematic uncertainties and uses loss-surface geometry to interpret robustness gains while accounting for input correlations.
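For reference, the two attacks named in the summary are usually written in the following textbook form. The symbols are assumptions introduced here, not notation taken from the paper: $x$ are the input features, $y$ the true label, $\theta$ the model parameters, $L$ the loss, $\epsilon$ the perturbation budget, $\alpha$ the PGD step size, and $\Pi$ the projection back onto the $\epsilon$-ball around $x$. The paper may use a variant of either.

```latex
% FGSM: a single signed-gradient step
x_{\mathrm{adv}} = x + \epsilon \,\operatorname{sign}\!\big(\nabla_x L(\theta, x, y)\big)

% PGD: iterated signed-gradient steps, each projected onto the epsilon-ball
x^{(t+1)} = \Pi_{\|x' - x\|_\infty \le \epsilon}
  \Big( x^{(t)} + \alpha \,\operatorname{sign}\!\big(\nabla_x L(\theta, x^{(t)}, y)\big) \Big),
\qquad x^{(0)} = x
```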
Significance. If the central claim holds after addressing the proxy issue, the work would provide a practical defense strategy for ML-based taggers and a geometric lens on robustness that could generalize to other HEP classification tasks. The approach is timely given the increasing use of deep networks on low-level inputs, but its impact depends on demonstrating that robustness gains transfer to physically motivated uncertainties rather than remaining an in silico artifact.
major comments (1)
- [§1] §1 and abstract: The assertion that adversarial perturbations serve as a model for systematic uncertainties is load-bearing for the entire robustness claim yet is not substantiated. Real jet-tagging systematics (JES/JER variations, pile-up modeling, parton-shower and hadronization uncertainties, b-tagging calibration factors) produce correlated shifts across many inputs that are not guaranteed to align with the gradient directions discovered by PGD/FGSM. Without a quantitative comparison (e.g., overlap of adversarial directions with variation directions from dedicated systematic samples or a direct test of the adversarially trained tagger on those samples), the reported robustness improvement does not demonstrate improved performance under actual uncertainties.
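The overlap test suggested in this comment could look roughly like the sketch below, which computes a per-jet cosine similarity between the FGSM direction and the shift induced by one systematic variation. All inputs are fabricated placeholders: `x_nominal` and `x_varied` stand in for paired nominal and systematically varied samples, the `shift` vector is made up, and the random-weight linear-logistic model stands in for a trained tagger. A real study would substitute dedicated systematic samples and the actual network gradients.

```python
import numpy as np

def rowwise_cosine(a, b):
    """Cosine similarity between matching rows of two (n_jets, n_features) arrays."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / den

rng = np.random.default_rng(2)
n_jets, n_features = 1000, 6

# Hypothetical inputs: nominal jets and the same jets under one systematic variation.
x_nominal = rng.normal(size=(n_jets, n_features))
shift = np.array([0.05, 0.03, 0.0, 0.0, 0.0, 0.0])   # made-up correlated shift
x_varied = x_nominal + shift + 0.01 * rng.normal(size=(n_jets, n_features))

# Stand-in tagger: linear-logistic model with random weights and random labels.
w = rng.normal(size=n_features)
y = rng.integers(0, 2, n_jets).astype(float)
p = 1.0 / (1.0 + np.exp(-(x_nominal @ w)))
adv_dir = np.sign((p - y)[:, None] * w)              # FGSM direction per jet
sys_dir = x_varied - x_nominal                       # systematic direction per jet

overlap = rowwise_cosine(adv_dir, sys_dir)
print("mean overlap:", overlap.mean(), "mean |overlap|:", np.abs(overlap).mean())
```

Overlaps clustered near zero would indicate that the attack probes directions largely unrelated to the variation, while values near plus or minus one would support the proxy interpretation for that particular systematic.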
minor comments (1)
- [Abstract] The abstract and introduction would benefit from a concise statement of the specific jet-tagging architecture, input features, and performance metrics (e.g., light-jet rejection at fixed b-efficiency) used throughout the study.
Simulated Author's Rebuttal
We thank the referee for the thorough review and for highlighting the importance of clearly delineating the scope of our claims. The central point concerns the interpretation of adversarial perturbations as a proxy for systematic uncertainties. We address this directly below and have revised the manuscript to strengthen the discussion of limitations while preserving the core technical contributions on loss-surface geometry and adversarial training.
- Referee: §1 and abstract: The assertion that adversarial perturbations serve as a model for systematic uncertainties is load-bearing for the entire robustness claim yet is not substantiated. Real jet-tagging systematics (JES/JER variations, pile-up modeling, parton-shower and hadronization uncertainties, b-tagging calibration factors) produce correlated shifts across many inputs that are not guaranteed to align with the gradient directions discovered by PGD/FGSM. Without a quantitative comparison (e.g., overlap of adversarial directions with variation directions from dedicated systematic samples or a direct test of the adversarially trained tagger on those samples), the reported robustness improvement does not demonstrate improved performance under actual uncertainties.
Authors: We agree that adversarial perturbations generated by FGSM/PGD are not guaranteed to coincide with the directions of real experimental systematics, and we do not claim such equivalence. The manuscript introduces adversarial attacks explicitly as a controlled proxy to probe classifier vulnerability on low-level inputs and to study the geometry of the loss surface under input correlations. The reported gains are demonstrated specifically against these gradient-based attacks; no statement is made that the adversarially trained models automatically improve performance under JES/JER, pile-up, or parton-shower variations. To make this scope explicit we have (i) rephrased the abstract and §1 to describe adversarial training as a defense against gradient-based perturbations rather than a general solution for all systematics, and (ii) added a dedicated limitations paragraph that notes the absence of direct comparison with dedicated systematic samples and suggests such a comparison as future work. These clarifications do not alter the geometric findings or the practical utility of the method within the setting that was actually studied.
Revision: partial
Circularity Check
No circularity detected in empirical adversarial training study
The paper describes an empirical machine-learning procedure: adversarial attacks (PGD/FGSM) are applied to jet-tagging classifiers, adversarial training is performed, and loss-surface geometry is inspected. No equations, fitted parameters, or self-citations are shown that would make any reported robustness gain equivalent to the input data or to a prior result by the same authors. The modeling choice of treating adversarial perturbations as a proxy for systematics is an external assumption whose validity can be tested independently; it does not constitute a self-definitional or fitted-input reduction. The derivation chain therefore remains self-contained against external benchmarks.