Improving robustness of jet tagging algorithms with adversarial training: exploring the loss surface

Annika Stein

arXiv: 2303.14511 · v1 · pith:4QWFP4TUnew · submitted 2023-03-25 · ✦ hep-ex · cs.AI · cs.LG · hep-ph · physics.data-an


Pith reviewed 2026-05-24 09:45 UTC · model grok-4.3

classification ✦ hep-ex · cs.AI · cs.LG · hep-ph · physics.data-an

keywords jet tagging · adversarial training · robustness · loss surface · flavor tagging · deep neural networks · systematic uncertainties


The pith

Adversarial training makes jet flavor tagging classifiers more robust to input distortions while keeping high performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that deep neural networks used for jet flavor tagging in high-energy physics can be defended against small input perturbations by adversarial training. These perturbations act as a stand-in for systematic uncertainties that arise in real data. The central evidence comes from direct comparisons of model performance before and after adversarial training, paired with an analysis of the loss surface that incorporates correlations among input features. A reader would care because jet tagging is a core step in many physics analyses, and models that degrade under realistic distortions reduce the reliability of downstream results.

Core claim

Adversarial attacks on input features probe the vulnerability of typical jet flavor tagging classifiers and serve as a model for systematic uncertainties. Adversarial training improves robustness to those attacks while maintaining high classification performance. Investigating the corresponding loss surface yields geometric interpretations of robustness that take input correlations into account.
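
To make the attack idea concrete, here is a minimal sketch of the fast gradient sign method (FGSM, the attack named in the Figure 2 caption) applied to a toy logistic classifier. The four Gaussian features, the weights `w`, and the step size `eps` are illustrative assumptions, not the paper's tagger, inputs, or settings.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(w, b, x, y):
    """Mean binary cross-entropy of a logistic model on inputs x with labels y."""
    p = np.clip(sigmoid(x @ w + b), 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def input_gradient(w, b, x, y):
    """Gradient of each example's loss with respect to its input features."""
    p = sigmoid(x @ w + b)
    return (p - y)[:, None] * w[None, :]

def fgsm(w, b, x, y, eps):
    """Shift every feature by eps in the direction that increases the loss."""
    return x + eps * np.sign(input_gradient(w, b, x, y))

# Toy data standing in for jet features (assumption: 4 standardized inputs).
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 4))
w_true = np.array([1.5, -2.0, 0.5, 1.0])
y = (x @ w_true > 0).astype(float)

w, b = w_true.copy(), 0.0          # hypothetical "trained" weights
eps = 0.1
x_adv = fgsm(w, b, x, y, eps)
clean = bce_loss(w, b, x, y)
attacked = bce_loss(w, b, x_adv, y)
print("clean loss:", clean, "attacked loss:", attacked)
```

Each feature moves by at most `eps`, yet the loss rises because all shifts are coordinated along the gradient sign. That coordination is what distinguishes an attack from random smearing of the same size.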

What carries the argument

Adversarial training applied to jet flavor tagging networks, together with geometric analysis of the loss surface with respect to correlated input features.
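
As a rough illustration of the defense, the sketch below trains the same toy logistic model twice: once on clean inputs and once on inputs that are re-attacked with FGSM at every step. The data, the learning rate, and `eps = 0.3` are assumptions chosen to make the effect visible; the paper's training recipe and network are not reproduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(w, b, x, y):
    p = np.clip(sigmoid(x @ w + b), 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def fgsm(w, b, x, y, eps):
    # For a logistic model the input gradient of the loss is (p - y) * w.
    grad_x = (sigmoid(x @ w + b) - y)[:, None] * w[None, :]
    return x + eps * np.sign(grad_x)

def train(x, y, eps=0.0, lr=0.2, steps=2000):
    """Gradient descent; with eps > 0 each step fits freshly attacked inputs."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(steps):
        x_fit = fgsm(w, b, x, y, eps) if eps > 0 else x
        err = sigmoid(x_fit @ w + b) - y       # dLoss/dlogit per example
        grad_w = x_fit.T @ err / len(y)
        w = w - lr * grad_w
        b = b - lr * err.mean()
    return w, b

def attacked_loss(w, b, x, y, eps):
    return bce_loss(w, b, fgsm(w, b, x, y, eps), y)

# Toy data with label noise (assumption, not the paper's samples).
rng = np.random.default_rng(1)
x = rng.normal(size=(512, 4))
w_true = np.array([1.5, -2.0, 0.5, 1.0])
y = (x @ w_true + rng.normal(scale=0.5, size=512) > 0).astype(float)

eps = 0.3
w_nom, b_nom = train(x, y)             # nominal training
w_adv, b_adv = train(x, y, eps=eps)    # adversarial training
attacked_nominal = attacked_loss(w_nom, b_nom, x, y, eps)
attacked_robust = attacked_loss(w_adv, b_adv, x, y, eps)
print("nominal    clean:", bce_loss(w_nom, b_nom, x, y), "attacked:", attacked_nominal)
print("adversarial clean:", bce_loss(w_adv, b_adv, x, y), "attacked:", attacked_robust)
```

In this linear toy the adversarially trained model ends up with smaller weights, which limits how far a bounded input shift can move the output. Whether a deep tagger keeps its clean performance under the same treatment is the empirical question the paper addresses; the toy does not settle it.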

If this is right

Classifiers become less vulnerable to small adversarial changes in input features.
Standard performance metrics on clean data remain high after training.
Geometric properties of the loss surface explain how robustness accounts for feature correlations.
The same defense can be applied to other deep-learning tasks that rely on many low-level inputs.
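
One way to look at the loss surface is to shift all inputs along two chosen directions and record the mean loss on a grid. The sketch below does this for a toy logistic model, once along two raw feature axes and once along the leading eigenvectors of the feature covariance, which is one possible way to respect correlations. The model, the data, and the choice of directions are assumptions for illustration and may differ from the paper's construction.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(w, b, x, y):
    p = np.clip(sigmoid(x @ w + b), 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def loss_surface(w, b, x, y, u, v, shifts):
    """Mean loss after shifting all inputs by s_u * u + s_v * v, on a grid."""
    surface = np.empty((len(shifts), len(shifts)))
    for i, s_u in enumerate(shifts):
        for j, s_v in enumerate(shifts):
            surface[i, j] = bce_loss(w, b, x + s_u * u + s_v * v, y)
    return surface

rng = np.random.default_rng(2)
# Two correlated features (correlation 0.8) plus one independent feature.
cov = np.array([[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]])
x = rng.multivariate_normal(np.zeros(3), cov, size=400)
w, b = np.array([2.0, -1.0, 0.5]), 0.0     # hypothetical trained weights
y = (x @ w > 0).astype(float)

shifts = np.linspace(-1.0, 1.0, 21)

# Scan along two raw feature axes (ignores correlations).
e0, e1 = np.eye(3)[0], np.eye(3)[1]
surface_axes = loss_surface(w, b, x, y, e0, e1, shifts)

# Scan along the two leading principal axes of the sample covariance.
evals, evecs = np.linalg.eigh(np.cov(x, rowvar=False))   # ascending order
surface_pca = loss_surface(w, b, x, y, evecs[:, -1], evecs[:, -2], shifts)

# A crude flatness summary: the range of the loss over the scanned grid.
print("range along axes:", surface_axes.max() - surface_axes.min())
print("range along principal axes:", surface_pca.max() - surface_pca.min())
```

Comparing such ranges between a nominal and an adversarially trained model would give a single-number version of the flat-versus-steep picture described in the Figure 1 caption.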

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The loss-surface view might suggest new regularization methods that explicitly penalize sensitivity along correlated directions.
Testing the same training procedure on uncertainties drawn from actual detector simulations would check whether the proxy remains useful outside the adversarial setting.

Load-bearing premise

Adversarial attacks on input features serve as an accurate model for systematic uncertainties affecting jet tagging performance.

What would settle it

A measurement showing that real experimental systematic uncertainties distort jet features differently from the tested adversarial perturbations, and that adversarial training produces no robustness gain under those actual distortions.

Figures

Figures reproduced from arXiv: 2303.14511 by Annika Stein.

**Figure 1.** Different geometries of loss manifolds for nominal (bottom) and adversarial (top) training. … momentum or other properties will not affect the respective network prediction error, or loss, for adversarial training. Nominal training on the other hand is not agnostic to changes in any of the two variables shown. While nominal training offers in general a lower network prediction error, adversarial training off…

**Figure 2.** Possible directions of adversarial attacks for different models. Starting from kinematic quantities which yield small loss, multiple arrows can be found for an FGSM attack imposed for adversarial training, while only one such arrow is constructed for nominal training. 3. Discussion This observation is a key element to understand why adversarial training may be preferred in settings with potentially disto…

Original abstract

In the field of high-energy physics, deep learning algorithms continue to gain in relevance and provide performance improvements over traditional methods, for example when identifying rare signals or finding complex patterns. From an analyst's perspective, obtaining highest possible performance is desirable, but recently, some attention has been shifted towards studying robustness of models to investigate how well these perform under slight distortions of input features. Especially for tasks that involve many (low-level) inputs, the application of deep neural networks brings new challenges. In the context of jet flavor tagging, adversarial attacks are used to probe a typical classifier's vulnerability and can be understood as a model for systematic uncertainties. A corresponding defense strategy, adversarial training, improves robustness, while maintaining high performance. Investigating the loss surface corresponding to the inputs and models in question reveals geometric interpretations of robustness, taking correlations into account.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Adversarial training boosts jet tagger robustness on paper, but the claim that attacks proxy real systematics looks shaky without stronger validation.


The core takeaway here is that the work takes standard adversarial training, applies it to jet flavor tagging, and adds some loss-surface geometry to interpret the robustness gains. That combination is the main thing a reader gets. The abstract shows they train a classifier, run PGD/FGSM attacks on the inputs, then retrain with those perturbations and report better performance under attack while keeping nominal accuracy. The loss-surface part tries to visualize how correlations affect the geometry of the decision boundary. That targeted application plus the geometric angle is what is actually new relative to generic adversarial-robustness papers. It does a reasonable job of framing the problem for an HEP audience that already knows jet tagging but may not have seen the defense side. The experiments appear to be on standard simulated samples with the usual low-level inputs, which is the right setup for this kind of study.

The soft spot is the central modeling choice: treating gradient-based input perturbations as a stand-in for detector and modeling systematics. Real jet-tagging uncertainties come from JES/JER shifts, pile-up, parton-shower variations, and calibration factors; those move many correlated features in ways that are not guaranteed to align with the directions an FGSM or PGD attack finds. If the paper only shows robustness to the artificial attacks and does not cross-check against actual systematic variations, the practical payoff for analyses stays limited. The loss-surface analysis is interesting but secondary once the proxy assumption is in doubt.

This is the kind of paper that belongs in a specialized HEP ML workshop or journal rather than a broad venue. A reader already working on jet tagging or robustness in collider ML would get value from the concrete numbers and the geometric plots; someone outside that niche probably would not.

It is coherent on its own terms and shows clear thinking about the literature it cites, so it deserves a serious referee rather than a desk reject, even if the referee will likely press on the systematics mapping.

Referee Report

1 major / 1 minor

Summary. The paper claims that adversarial training improves the robustness of jet flavor tagging classifiers in high-energy physics while maintaining high performance on clean inputs. It positions gradient-based adversarial attacks (FGSM, PGD) on low-level jet features as a proxy for systematic uncertainties and uses loss-surface geometry to interpret robustness gains while accounting for input correlations.
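
For readers less familiar with the second attack named in the summary, the following is a minimal sketch of projected gradient descent (PGD) on a toy logistic model: several small signed-gradient steps, each followed by a projection back into an L-infinity ball of radius `eps` around the original input. The model, the data, and the values of `eps`, `alpha`, and `steps` are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(w, b, x, y):
    p = np.clip(sigmoid(x @ w + b), 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def pgd(w, b, x, y, eps, alpha, steps):
    """Iterated signed-gradient ascent on the loss, kept within eps of x."""
    x_adv = x.copy()
    for _ in range(steps):
        # Input gradient of the logistic loss at the current attacked point.
        grad_x = (sigmoid(x_adv @ w + b) - y)[:, None] * w[None, :]
        x_adv = x_adv + alpha * np.sign(grad_x)
        x_adv = np.clip(x_adv, x - eps, x + eps)   # projection step
    return x_adv

rng = np.random.default_rng(3)
x = rng.normal(size=(256, 4))
w, b = np.array([1.5, -2.0, 0.5, 1.0]), 0.0   # hypothetical trained weights
y = (x @ w > 0).astype(float)

eps, alpha, steps = 0.1, 0.02, 10
x_adv = pgd(w, b, x, y, eps, alpha, steps)
clean = bce_loss(w, b, x, y)
attacked = bce_loss(w, b, x_adv, y)
print("clean loss:", clean, "PGD loss:", attacked)
```

For a linear model the gradient sign never changes between steps, so PGD lands on the same point as a single FGSM step of size `eps`. The two attacks only differ for models whose input gradient varies across the allowed region, as is the case for deep networks.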

Significance. If the central claim holds after addressing the proxy issue, the work would provide a practical defense strategy for ML-based taggers and a geometric lens on robustness that could generalize to other HEP classification tasks. The approach is timely given the increasing use of deep networks on low-level inputs, but its impact depends on demonstrating that robustness gains transfer to physically motivated uncertainties rather than remaining an in silico artifact.

major comments (1)

[§1] §1 and abstract: The assertion that adversarial perturbations serve as a model for systematic uncertainties is load-bearing for the entire robustness claim yet is not substantiated. Real jet-tagging systematics (JES/JER variations, pile-up modeling, parton-shower and hadronization uncertainties, b-tagging calibration factors) produce correlated shifts across many inputs that are not guaranteed to align with the gradient directions discovered by PGD/FGSM. Without a quantitative comparison (e.g., overlap of adversarial directions with variation directions from dedicated systematic samples or a direct test of the adversarially trained tagger on those samples), the reported robustness improvement does not demonstrate improved performance under actual uncertainties.
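
The overlap test proposed in this comment could be sketched as a per-jet cosine similarity between the adversarial shift and the shift induced by a systematic variation. The function name `direction_overlap` and the tiny arrays below are hypothetical; a real check would use matched jets from nominal and varied samples, with features standardized so that no single input dominates the norm.

```python
import numpy as np

def direction_overlap(adv_delta, syst_delta):
    """Per-jet cosine similarity between two sets of feature shifts.

    Both arrays have shape (n_jets, n_features). Jets with a zero-length
    shift in either array get an overlap of 0.
    """
    dot = np.sum(adv_delta * syst_delta, axis=1)
    norms = np.linalg.norm(adv_delta, axis=1) * np.linalg.norm(syst_delta, axis=1)
    return np.divide(dot, norms, out=np.zeros_like(dot), where=norms > 0)

# Hypothetical shifts for three jets with two features each.
adv = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])     # x_adv - x_nominal
syst = np.array([[2.0, 0.0], [1.0, -1.0], [0.0, -1.0]])  # x_varied - x_nominal
overlap = direction_overlap(adv, syst)
print(overlap)   # aligned, orthogonal, and opposite shifts
```

A distribution of overlaps concentrated near zero would indicate that the attack and the systematic variation probe largely unrelated directions, which is the scenario the referee is worried about.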

minor comments (1)

[Abstract] The abstract and introduction would benefit from a concise statement of the specific jet-tagging architecture, input features, and performance metrics (e.g., light-jet rejection at fixed b-efficiency) used throughout the study.
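
For reference, the metric mentioned as an example can be computed as follows: choose the score threshold that keeps the target fraction of b jets, then invert the fraction of light jets passing the same threshold. This is a sketch with made-up scores; the function name and inputs are assumptions, and the paper may define its working points differently.

```python
import numpy as np

def light_rejection_at_b_eff(b_scores, light_scores, b_eff):
    """Return (threshold, light-jet rejection) at a fixed b-jet efficiency."""
    # Threshold such that a fraction b_eff of b jets score at or above it.
    threshold = np.quantile(b_scores, 1.0 - b_eff)
    mistag = np.mean(light_scores >= threshold)
    rejection = np.inf if mistag == 0 else 1.0 / mistag
    return threshold, rejection

# Made-up tagger outputs for ten b jets and ten light jets.
b_scores = np.arange(1, 11) / 10.0
light_scores = np.array([0.05, 0.10, 0.20, 0.30, 0.50,
                         0.15, 0.25, 0.35, 0.45, 0.02])

threshold, rejection = light_rejection_at_b_eff(b_scores, light_scores, b_eff=0.7)
print("threshold:", threshold, "light-jet rejection:", rejection)
```

Evaluating this quantity on clean and on perturbed inputs, for both the nominal and the adversarially trained tagger, would put the robustness comparison in the units analysts usually quote.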

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the thorough review and for highlighting the importance of clearly delineating the scope of our claims. The central point concerns the interpretation of adversarial perturbations as a proxy for systematic uncertainties. We address this directly below and have revised the manuscript to strengthen the discussion of limitations while preserving the core technical contributions on loss-surface geometry and adversarial training.


Referee: §1 and abstract: The assertion that adversarial perturbations serve as a model for systematic uncertainties is load-bearing for the entire robustness claim yet is not substantiated. Real jet-tagging systematics (JES/JER variations, pile-up modeling, parton-shower and hadronization uncertainties, b-tagging calibration factors) produce correlated shifts across many inputs that are not guaranteed to align with the gradient directions discovered by PGD/FGSM. Without a quantitative comparison (e.g., overlap of adversarial directions with variation directions from dedicated systematic samples or a direct test of the adversarially trained tagger on those samples), the reported robustness improvement does not demonstrate improved performance under actual uncertainties.

Authors: We agree that adversarial perturbations generated by FGSM/PGD are not guaranteed to coincide with the directions of real experimental systematics, and we do not claim such equivalence. The manuscript introduces adversarial attacks explicitly as a controlled proxy to probe classifier vulnerability on low-level inputs and to study the geometry of the loss surface under input correlations. The reported gains are demonstrated specifically against these gradient-based attacks; no statement is made that the adversarially trained models automatically improve performance under JES/JER, pile-up, or parton-shower variations. To make this scope explicit we have (i) rephrased the abstract and §1 to describe adversarial training as a defense against gradient-based perturbations rather than a general solution for all systematics, and (ii) added a dedicated limitations paragraph that notes the absence of direct comparison with dedicated systematic samples and suggests such a comparison as future work. These clarifications do not alter the geometric findings or the practical utility of the method within the setting that was actually studied.

Revision: partial

Circularity Check

0 steps flagged

No circularity detected in empirical adversarial training study


The paper describes an empirical machine-learning procedure: adversarial attacks (PGD/FGSM) are applied to jet-tagging classifiers, adversarial training is performed, and loss-surface geometry is inspected. No equations, fitted parameters, or self-citations are shown that would make any reported robustness gain equivalent to the input data or to a prior result by the same authors. The modeling choice of treating adversarial perturbations as a proxy for systematics is an external assumption whose validity can be tested independently; it does not constitute a self-definitional or fitted-input reduction. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; all technical details on models, losses, and data are absent.



