
arXiv: 2303.14511 · v1 · pith:4QWFP4TUnew · submitted 2023-03-25 · ✦ hep-ex · cs.AI · cs.LG · hep-ph · physics.data-an

Improving robustness of jet tagging algorithms with adversarial training: exploring the loss surface

Pith reviewed 2026-05-24 09:45 UTC · model grok-4.3

classification ✦ hep-ex · cs.AI · cs.LG · hep-ph · physics.data-an
keywords: jet tagging, adversarial training, robustness, loss surface, flavor tagging, deep neural networks, systematic uncertainties

The pith

Adversarial training makes jet flavor tagging classifiers more robust to input distortions while keeping high performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that deep neural networks used for jet flavor tagging in high-energy physics can be defended against small input perturbations by adversarial training. These perturbations act as a stand-in for systematic uncertainties that arise in real data. The central evidence comes from direct comparisons of model performance before and after adversarial training, paired with an analysis of the loss surface that incorporates correlations among input features. A reader would care because jet tagging is a core step in many physics analyses, and models that degrade under realistic distortions reduce the reliability of downstream results.

Core claim

Adversarial attacks on input features probe the vulnerability of typical jet flavor tagging classifiers and serve as a model for systematic uncertainties. Adversarial training improves robustness to those attacks while maintaining high classification performance. Investigating the corresponding loss surface yields geometric interpretations of robustness that take input correlations into account.
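
To make the attack concrete, here is a minimal sketch of a fast gradient sign method (FGSM) step. It is not the paper's tagger: the two-feature logistic model, its weights `w` and `b`, the example jet `x`, and the step size `eps` are all hypothetical stand-ins chosen so the input gradient can be written analytically.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_loss(w, b, x, y):
    """Binary cross-entropy of a logistic model for one example."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def input_gradient(w, b, x, y):
    """Analytic gradient of the loss with respect to the inputs x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """Shift every input by eps in the direction that increases the loss."""
    grad = input_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

# Hypothetical toy tagger with two input features and a true label of 1.
w, b = [1.5, -2.0], 0.1
x, y = [0.4, -0.3], 1
x_adv = fgsm(w, b, x, y, eps=0.05)
loss_clean = bce_loss(w, b, x, y)
loss_adv = bce_loss(w, b, x_adv, y)
print(x_adv, loss_clean, loss_adv)
```

Each feature moves by at most `eps`, which is what lets the perturbation be read as a small distortion of the inputs; for a deep network the gradient would come from automatic differentiation rather than a closed form.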

What carries the argument

Adversarial training applied to jet flavor tagging networks, together with geometric analysis of the loss surface with respect to correlated input features.
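
One way to picture a loss-surface analysis is to scan the loss on a grid while two input features are shifted together. The sketch below does this for a hypothetical logistic model; the weights, the reference point `x0`, the grid width, and the `flatness` summary are illustrative assumptions, not quantities from the paper.

```python
import math

def bce_loss(w, b, x, y):
    """Binary cross-entropy of a logistic model for one example."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def loss_surface(w, b, x0, y, i, j, steps, width):
    """Loss on a (2*steps+1)^2 grid obtained by shifting features i and j of x0."""
    offsets = [width * k / steps for k in range(-steps, steps + 1)]
    grid = []
    for di in offsets:
        row = []
        for dj in offsets:
            x = list(x0)
            x[i] += di
            x[j] += dj
            row.append(bce_loss(w, b, x, y))
        grid.append(row)
    return offsets, grid

# Hypothetical model and reference jet; both features are scanned jointly.
w, b = [1.5, -2.0], 0.1
x0 = [0.4, -0.3]
offsets, grid = loss_surface(w, b, x0, y=1, i=0, j=1, steps=2, width=0.1)

# Spread of the loss over the scanned region: smaller means a flatter surface.
flatness = max(map(max, grid)) - min(map(min, grid))
print(flatness)
```

Scanning two features at once, rather than one at a time, is what exposes directions in which correlated shifts add up or cancel.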

If this is right

  • Classifiers become less vulnerable to small adversarial changes in input features.
  • Standard performance metrics on clean data remain high after training.
  • Geometric properties of the loss surface explain how robustness accounts for feature correlations.
  • The same defense can be applied to other deep-learning tasks that rely on many low-level inputs.
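
The first two implications can be checked on a toy problem. The following sketch trains a logistic model once on clean inputs and once on FGSM-perturbed inputs, then reports accuracy with and without an attack. The Gaussian toy data, the learning rate, the number of epochs, and `eps=0.3` are all assumptions made for illustration; the paper's networks and inputs are far larger, and on a linear toy the robustness gain may be small.

```python
import math
import random

def predict(w, b, x):
    """Probability of class 1 from a logistic model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def fgsm(w, b, x, y, eps):
    """FGSM step; for this model the input gradient of the loss is (p - y) * w."""
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]

def train(data, eps, epochs=50, lr=0.1, seed=0):
    """SGD on cross-entropy; eps > 0 replaces each input by its FGSM version."""
    rng = random.Random(seed)
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x, y in order:
            x_in = fgsm(w, b, x, y, eps) if eps > 0 else x
            err = predict(w, b, x_in) - y  # derivative of the loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x_in)]
            b -= lr * err
    return w, b

def accuracy(w, b, data, eps=0.0):
    """Fraction classified correctly, optionally after an FGSM attack of size eps."""
    hits = 0
    for x, y in data:
        x_in = fgsm(w, b, x, y, eps) if eps > 0 else x
        hits += (predict(w, b, x_in) > 0.5) == (y == 1)
    return hits / len(data)

# Hypothetical toy data: two Gaussian blobs standing in for two jet flavors.
rng = random.Random(1)
data = [([rng.gauss(m, 1.0), rng.gauss(m, 1.0)], y)
        for y, m in ((0, -1.5), (1, 1.5)) for _ in range(100)]

nominal = train(data, eps=0.0)
robust = train(data, eps=0.3)
for name, (w, b) in (("nominal", nominal), ("adversarial", robust)):
    print(name, accuracy(w, b, data), accuracy(w, b, data, eps=0.3))
```

The attack is regenerated from the current weights at every update, so the model is always trained against perturbations tailored to its present state rather than against a fixed set of distorted samples.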

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The loss-surface view might suggest new regularization methods that explicitly penalize sensitivity along correlated directions.
  • Testing the same training procedure on uncertainties drawn from actual detector simulations would check whether the proxy remains useful outside the adversarial setting.

Load-bearing premise

Adversarial attacks on input features serve as an accurate model for systematic uncertainties affecting jet tagging performance.

What would settle it

A measurement showing that real experimental systematic uncertainties distort jet features differently from the tested adversarial perturbations, and that adversarial training produces no robustness gain under those actual distortions.

Figures

Figures reproduced from arXiv: 2303.14511 by Annika Stein.

Figure 1: Different geometries of loss manifolds for nominal (bottom) and adversarial (top) training.
Excerpt from the surrounding text: "… momentum or other properties will not affect the respective network prediction error, or loss, for adversarial training. Nominal training on the other hand is not agnostic to changes in any of the two variables shown. While nominal training offers in general a lower network prediction error, adversarial training off…"
Figure 2: Possible directions of adversarial attacks for different models. Starting from kinematic quantities which yield small loss, multiple arrows can be found for an FGSM attack imposed for adversarial training, while only one such arrow is constructed for nominal training.
Excerpt from Section 3 (Discussion): "This observation is a key element to understand why adversarial training may be preferred in settings with potentially disto…"
Original abstract

In the field of high-energy physics, deep learning algorithms continue to gain in relevance and provide performance improvements over traditional methods, for example when identifying rare signals or finding complex patterns. From an analyst's perspective, obtaining highest possible performance is desirable, but recently, some attention has been shifted towards studying robustness of models to investigate how well these perform under slight distortions of input features. Especially for tasks that involve many (low-level) inputs, the application of deep neural networks brings new challenges. In the context of jet flavor tagging, adversarial attacks are used to probe a typical classifier's vulnerability and can be understood as a model for systematic uncertainties. A corresponding defense strategy, adversarial training, improves robustness, while maintaining high performance. Investigating the loss surface corresponding to the inputs and models in question reveals geometric interpretations of robustness, taking correlations into account.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper claims that adversarial training improves the robustness of jet flavor tagging classifiers in high-energy physics while maintaining high performance on clean inputs. It positions gradient-based adversarial attacks (FGSM, PGD) on low-level jet features as a proxy for systematic uncertainties and uses loss-surface geometry to interpret robustness gains while accounting for input correlations.

Significance. If the central claim holds after addressing the proxy issue, the work would provide a practical defense strategy for ML-based taggers and a geometric lens on robustness that could generalize to other HEP classification tasks. The approach is timely given the increasing use of deep networks on low-level inputs, but its impact depends on demonstrating that robustness gains transfer to physically motivated uncertainties rather than remaining an in silico artifact.

major comments (1)
  1. [§1] §1 and abstract: The assertion that adversarial perturbations serve as a model for systematic uncertainties is load-bearing for the entire robustness claim yet is not substantiated. Real jet-tagging systematics (JES/JER variations, pile-up modeling, parton-shower and hadronization uncertainties, b-tagging calibration factors) produce correlated shifts across many inputs that are not guaranteed to align with the gradient directions discovered by PGD/FGSM. Without a quantitative comparison (e.g., overlap of adversarial directions with variation directions from dedicated systematic samples or a direct test of the adversarially trained tagger on those samples), the reported robustness improvement does not demonstrate improved performance under actual uncertainties.
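
The quantitative comparison requested above could start from something as simple as the cosine between an adversarial direction and the mean shift induced by a systematic variation. The sketch below uses invented numbers for three jets with two features; `mean_shift`, the `varied` sample, and the `adv_dir` sign pattern are hypothetical and stand in for dedicated systematic samples and real FGSM or PGD directions.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(c * c for c in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * c for a, c in zip(u, v)) / (nu * nv)

def mean_shift(nominal, varied):
    """Average per-feature shift between matched nominal and varied jets."""
    n = len(nominal)
    return [sum(v[k] - x[k] for x, v in zip(nominal, varied)) / n
            for k in range(len(nominal[0]))]

# Hypothetical numbers: three jets, two features, before and after a variation.
nominal = [[0.40, -0.30], [0.10, 0.20], [-0.20, 0.50]]
varied = [[0.44, -0.28], [0.14, 0.22], [-0.16, 0.52]]

syst_dir = mean_shift(nominal, varied)  # approximately [0.04, 0.02]
adv_dir = [-1.0, 1.0]                   # e.g. the sign pattern of one FGSM step
overlap = cosine(adv_dir, syst_dir)
print(overlap)
```

An overlap near zero or negative, as in this invented example, would indicate that the attack probes a direction the variation does not populate, which is the scenario the comment is concerned about.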
minor comments (1)
  1. [Abstract] The abstract and introduction would benefit from a concise statement of the specific jet-tagging architecture, input features, and performance metrics (e.g., light-jet rejection at fixed b-efficiency) used throughout the study.
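
For readers unfamiliar with the metric named in this comment, the following sketch computes light-jet rejection at a fixed b-jet efficiency from two lists of tagger scores. The scores and the 50% working point are invented for illustration, and the function name `rejection_at_efficiency` is hypothetical; the paper's own metric definitions are not available in the abstract.

```python
import math

def rejection_at_efficiency(b_scores, light_scores, target_eff):
    """Light-jet rejection (1 / mistag rate) at the tightest cut that
    still keeps at least target_eff of the b jets."""
    ranked = sorted(b_scores, reverse=True)
    n_pass = max(1, math.ceil(target_eff * len(ranked)))
    threshold = ranked[n_pass - 1]
    mistag = sum(s >= threshold for s in light_scores) / len(light_scores)
    rejection = 1.0 / mistag if mistag > 0 else float("inf")
    return threshold, rejection

# Hypothetical tagger outputs for four b jets and ten light jets.
b_scores = [0.9, 0.8, 0.6, 0.3]
light_scores = [0.7, 0.2, 0.1, 0.05, 0.4, 0.85, 0.15, 0.3, 0.25, 0.02]

threshold, rejection = rejection_at_efficiency(b_scores, light_scores, 0.5)
print(threshold, rejection)
```

Quoting rejection at a fixed efficiency makes nominal and adversarially trained taggers directly comparable, because both are evaluated at the same signal working point.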

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the thorough review and for highlighting the importance of clearly delineating the scope of our claims. The central point concerns the interpretation of adversarial perturbations as a proxy for systematic uncertainties. We address this directly below and have revised the manuscript to strengthen the discussion of limitations while preserving the core technical contributions on loss-surface geometry and adversarial training.

Point-by-point responses
  1. Referee: §1 and abstract: The assertion that adversarial perturbations serve as a model for systematic uncertainties is load-bearing for the entire robustness claim yet is not substantiated. Real jet-tagging systematics (JES/JER variations, pile-up modeling, parton-shower and hadronization uncertainties, b-tagging calibration factors) produce correlated shifts across many inputs that are not guaranteed to align with the gradient directions discovered by PGD/FGSM. Without a quantitative comparison (e.g., overlap of adversarial directions with variation directions from dedicated systematic samples or a direct test of the adversarially trained tagger on those samples), the reported robustness improvement does not demonstrate improved performance under actual uncertainties.

    Authors: We agree that adversarial perturbations generated by FGSM/PGD are not guaranteed to coincide with the directions of real experimental systematics, and we do not claim such equivalence. The manuscript introduces adversarial attacks explicitly as a controlled proxy to probe classifier vulnerability on low-level inputs and to study the geometry of the loss surface under input correlations. The reported gains are demonstrated specifically against these gradient-based attacks; no statement is made that the adversarially trained models automatically improve performance under JES/JER, pile-up, or parton-shower variations. To make this scope explicit we have (i) rephrased the abstract and §1 to describe adversarial training as a defense against gradient-based perturbations rather than a general solution for all systematics, and (ii) added a dedicated limitations paragraph that notes the absence of direct comparison with dedicated systematic samples and suggests such a comparison as future work. These clarifications do not alter the geometric findings or the practical utility of the method within the setting that was actually studied.

    Revision: partial

Circularity Check

0 steps flagged

No circularity detected in empirical adversarial training study

Full rationale

The paper describes an empirical machine-learning procedure: adversarial attacks (PGD/FGSM) are applied to jet-tagging classifiers, adversarial training is performed, and loss-surface geometry is inspected. No equations, fitted parameters, or self-citations are shown that would make any reported robustness gain equivalent to the input data or to a prior result by the same authors. The modeling choice of treating adversarial perturbations as a proxy for systematics is an external assumption whose validity can be tested independently; it does not constitute a self-definitional or fitted-input reduction. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; all technical details on models, losses, and data are absent.


