pith. machine review for the scientific record.

arxiv: 2605.07471 · v1 · submitted 2026-05-08 · 💻 cs.LG · hep-ex

Recognition: 2 Lean theorem links

Transfer Learning Across Fast- and Full-Simulation Domains in High-Energy Physics

Lucie Flek, Matthias Schott


Pith reviewed 2026-05-11 02:03 UTC · model grok-4.3

classification: 💻 cs.LG · hep-ex
keywords: transfer learning · fast simulation · full simulation · high-energy physics · neural networks · jet tagging · data efficiency · domain adaptation

The pith

Models pretrained on fast simulation outperform from-scratch baselines on full simulation while needing roughly half the target data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether models trained first on abundant but simplified fast simulation data can transfer useful features to more accurate but expensive full simulation data in high-energy physics. It evaluates this approach on three tasks: distinguishing signal from background events, identifying quark versus gluon jets, and reconstructing missing transverse energy. Using dense networks, graph networks, and transformers, the study finds that pretraining on fast simulation followed by adaptation to full simulation yields better results than training from scratch on the target data alone. The gain appears consistently across tasks and reduces the amount of full simulation data required by roughly a factor of two.

Core claim

Across signal-background classification, quark-gluon jet tagging, and missing transverse energy reconstruction, models pretrained on fast simulation and adapted to full simulation outperform independently trained baselines on the target domain while requiring significantly less target-domain training data, typically by a factor of two. The same benefit holds when adapting between different fast simulation setups. This pattern appears for dense neural networks, graph neural networks, and transformer architectures alike.

What carries the argument

Transfer learning that pretrains neural networks on fast simulation domains and then adapts them to full simulation or alternate fast simulation domains.
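The pretrain-then-adapt recipe can be sketched in a few lines. This is a minimal illustration, not the authors' code: the tiny logistic-regression "network" and the synthetic Gaussian "domains" below are invented stand-ins for the paper's dense, graph, and transformer models and their ATLAS/CMS datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_domain(n, shift):
    # Synthetic stand-in for a simulation domain: 4 features, binary labels,
    # with `shift` playing the role of the fast/full domain mismatch.
    X = rng.normal(shift, 1.0, size=(n, 4))
    y = (X.sum(axis=1) + rng.normal(0.0, 0.5, n) > 4.0 * shift).astype(float)
    return np.hstack([X, np.ones((n, 1))]), y  # append a bias column

def train(X, y, w=None, epochs=300, lr=0.1):
    # Logistic regression by gradient descent. w=None trains from scratch;
    # passing pretrained weights is the "adaptation" (fine-tuning) step.
    w = np.zeros(X.shape[1]) if w is None else w.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30.0, 30.0)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    return float(np.mean(((X @ w) > 0) == y))

Xs, ys = make_domain(5000, shift=0.5)    # abundant "fast simulation" source
Xt, yt = make_domain(200, shift=0.6)     # scarce "full simulation" target
Xtest, ytest = make_domain(2000, shift=0.6)

w_pre = train(Xs, ys)            # pretrain on the source domain
w_ft = train(Xt, yt, w=w_pre)    # adapt to the target domain
w_scratch = train(Xt, yt)        # baseline: target data only

acc_ft = accuracy(w_ft, Xtest, ytest)
acc_scratch = accuracy(w_scratch, Xtest, ytest)
```

In the paper's setting this comparison is repeated over a range of target sample sizes to trace learning curves; the single split here only illustrates the mechanics.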

If this is right

  • Fewer full simulation events need to be generated to reach a given performance level.
  • Pretrained models can be reused across multiple analysis tasks instead of retraining each time.
  • The same pretraining benefit should appear when moving between any two simulation levels that share similar underlying physics.
  • Published pretrained models on fast simulation could serve as starting points for many different full simulation applications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach may generalize to other physics domains where cheap approximate simulations exist alongside expensive accurate ones.
  • Further gains could come from combining this pretraining with standard domain adaptation methods during the fine-tuning step.
  • Real collision data might serve as an additional target domain once the fast-to-full transfer is established.

Load-bearing premise

Feature representations learned from fast simulation stay aligned enough with full simulation that adaptation succeeds without needing more target data than training from scratch.

What would settle it

Finding that pretrained models require as much or more full simulation data than scratch-trained models on any of the three tasks would show the claimed data reduction does not hold.

Figures

Figures reproduced from arXiv:2605.07471 by Lucie Flek and Matthias Schott. Captions are truncated as extracted.

Figure 1. Comparison of three representative input features for the signal/background classification task, namely … [image: figures/full_fig_p005_1.png]
Figure 2. Comparison of three representative input features for the gluon jets, namely the transverse momentum … [image: figures/full_fig_p006_2.png]
Figure 3. Comparison of three representative input features for the … [image: figures/full_fig_p007_3.png]
Figure 4. Network performance quantified by the area under the ROC curve as a function of the training set size. [image: figures/full_fig_p009_4.png]
Figure 5. Network performance quantified by the area under the ROC curve as a function of the training set size. [image: figures/full_fig_p009_5.png]
Figure 6. Left: Network performance quantified by the loss function on a test set as a function of the training set … [image: figures/full_fig_p011_6.png]
Figure 7. Left: Network performance quantified by the loss function on a test set as a function of the training set … [image: figures/full_fig_p011_7.png]
Figure 8. Dependence of the AROC performance of the quark–gluon jet tagger (left) and the test-set loss for the … [image: figures/full_fig_p013_8.png]
Original abstract

Machine-learning models in high-energy physics are often trained on simulated data, where fully simulated samples are computationally expensive while fast simulation provides large statistics at reduced realism. In this work, we systematically study transfer learning between fast-simulated and fully simulated datasets in a realistic LHC environment. We consider three representative tasks, signal-background classification, quark-gluon jet tagging, and missing transverse energy reconstruction, using dense neural networks, graph neural networks, and transformer-based architectures. Models are pretrained on ATLAS-like fast simulation and adapted to CMS-like fast simulation and to fully simulated ATLAS Open Data. Across all tasks, pretrained models consistently outperform independently trained baselines and require significantly less target-domain training data, typically reducing the needed statistics by about a factor of two. These results demonstrate that fast simulation can be used to learn robust, reusable representations and motivate publishing trained models as reusable scientific assets beyond large foundation models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript studies transfer learning across simulation domains in high-energy physics. Models are pretrained on ATLAS-like fast simulation and then adapted to CMS-like fast simulation and to fully simulated ATLAS Open Data. The evaluation covers three tasks (signal-background classification, quark-gluon jet tagging, and missing transverse energy reconstruction) and three architectures (dense NN, GNN, transformer). The central empirical claim is that pretrained models consistently outperform independently trained baselines on held-out target data and reduce the volume of target-domain statistics needed to reach equivalent performance by a factor of approximately two.

Significance. If the reported gains hold under scrutiny, the work demonstrates that fast simulation can produce reusable representations that transfer to more realistic full-simulation domains, thereby lowering the computational cost of training ML models for LHC analyses. The breadth of tasks and architectures provides evidence that the benefit is not task-specific, supporting the suggestion that pretrained models could be published as reusable scientific assets.

major comments (2)
  1. The factor-of-two reduction in required target statistics is the most load-bearing quantitative claim. The abstract and learning-curve results state that this factor is obtained by comparing data volumes needed for equivalent performance, yet the precise definition of equivalence (e.g., a fixed AUC threshold, a relative performance delta, or interpolation method) is not stated; without it the numerical factor cannot be independently verified or reproduced.
  2. The experimental protocol compares pretrained models against from-scratch baselines. It is unclear whether hyper-parameter search budgets and ranges were identical for both; if the pretrained models received additional tuning on the source domain while baselines did not, the reported outperformance could be partly attributable to unequal optimization rather than transfer alone.
minor comments (2)
  1. The abstract refers to 'ATLAS-like' and 'CMS-like' fast simulation without a concise summary of the key differences (e.g., detector response modeling or pile-up treatment); a short paragraph in the introduction would aid readers outside the ATLAS/CMS collaboration.
  2. Learning curves are central to the factor-of-two claim; ensure each panel includes error bands (statistical or bootstrap) and that the x-axis scale (number of target events) is identical across compared curves.
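The error bands the referee asks for can be computed directly from per-event scores. The sketch below is a generic illustration (rank-based AUC plus a percentile bootstrap), not the authors' evaluation code, and the Gaussian toy scores are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

def auc(scores, labels):
    # Rank-based AUC: probability a random positive outscores a random
    # negative (Mann-Whitney statistic), with no sklearn dependency.
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_band(scores, labels, n_boot=500):
    # Percentile bootstrap band (~68%) for one learning-curve point,
    # obtained by resampling events with replacement.
    idx = np.arange(len(scores))
    vals = [auc(scores[s], labels[s])
            for s in (rng.choice(idx, size=len(idx)) for _ in range(n_boot))]
    return np.percentile(vals, [16, 84])

# Invented toy scores: positives shifted up by one unit.
labels = (rng.random(1000) < 0.5).astype(int)
scores = labels + rng.normal(0.0, 1.0, 1000)
lo, hi = bootstrap_band(scores, labels)
```

Applied at each training-set size, this yields the per-point bands requested for the learning-curve figures.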

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. The two major comments highlight important points for clarity and reproducibility, which we have addressed through targeted revisions.

Point-by-point responses
  1. Referee: The factor-of-two reduction in required target statistics is the most load-bearing quantitative claim. The abstract and learning-curve results state that this factor is obtained by comparing data volumes needed for equivalent performance, yet the precise definition of equivalence (e.g., a fixed AUC threshold, a relative performance delta, or interpolation method) is not stated; without it the numerical factor cannot be independently verified or reproduced.

    Authors: We agree that an explicit definition is required for independent verification. In the revised manuscript we now state that equivalence is defined as the target-domain sample size at which the pretrained model reaches 99% of the asymptotic performance (AUC or equivalent metric) attained by the from-scratch baseline on the full target dataset; this threshold is obtained by linear interpolation on the log-scale learning curves. The interpolation procedure, the precise performance metric used for each task, and the bootstrap-based uncertainty on the interpolated point have been added to the abstract, the learning-curve figures, and a dedicated paragraph in Section 4.2. revision: yes
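The equivalence definition in this response can be made concrete in a few lines. The 99% threshold and log-scale linear interpolation follow the procedure the response describes; the learning-curve numbers below are invented for illustration, not taken from the paper.

```python
import numpy as np

# Hypothetical learning curves (AUC vs. number of target-domain events);
# the values are invented stand-ins for the paper's measured curves.
sizes = np.array([1e3, 3e3, 1e4, 3e4, 1e5])
auc_scratch = np.array([0.70, 0.76, 0.82, 0.86, 0.88])
auc_pretrained = np.array([0.76, 0.81, 0.85, 0.875, 0.885])

# Equivalence point: 99% of the from-scratch baseline's asymptotic AUC.
target = 0.99 * auc_scratch[-1]

def events_to_reach(aucs, target):
    # Linear interpolation on log10(sample size), as stated in the response;
    # np.interp requires the AUC values to be monotonically increasing.
    return float(10 ** np.interp(target, aucs, np.log10(sizes)))

n_scratch = events_to_reach(auc_scratch, target)
n_pretrained = events_to_reach(auc_pretrained, target)
reduction = n_scratch / n_pretrained  # the "factor of two"-style statistic
```

With the invented curves above the ratio comes out near two; in the paper the analogous ratio, with a bootstrap uncertainty on each interpolated point, is what the abstract's "factor of two" summarizes.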

  2. Referee: The experimental protocol compares pretrained models against from-scratch baselines. It is unclear whether hyper-parameter search budgets and ranges were identical for both; if the pretrained models received additional tuning on the source domain while baselines did not, the reported outperformance could be partly attributable to unequal optimization rather than transfer alone.

    Authors: We confirm that the hyper-parameter search budget and ranges were identical for the from-scratch baselines and for the fine-tuning stage of the pretrained models. Pretraining on the source domain used a separate but comparably sized search; this does not confer an unfair advantage on the target-domain comparison because the baselines receive the same optimization effort on the target data. We have added an explicit statement of this protocol, including the search space and number of trials, to Section 3.2 (Experimental Setup) and to the captions of Tables 2–4. revision: yes

Circularity Check

0 steps flagged

No significant circularity in empirical transfer learning study

Full rationale

The paper reports purely empirical results from pretraining ML models (dense NN, GNN, transformer) on ATLAS-like fast simulation and adapting them to CMS-like fast simulation and fully simulated ATLAS Open Data across three tasks. Performance gains and data-efficiency claims (factor-of-two reduction) are measured via direct comparisons of learning curves on held-out target-domain data. No equations, first-principles derivations, or predictions are present that could reduce to fitted inputs by construction. The work contains no self-citation load-bearing steps, uniqueness theorems, or ansatzes; all claims rest on reproducible experimental protocol and external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the standard machine-learning assumption that fast simulation shares enough low-level features with full simulation for transfer learning to succeed after fine-tuning; no free parameters or invented entities are described in the abstract.

axioms (1)
  • domain assumption: Representations learned on fast simulation remain useful after adaptation to full simulation despite differences in realism. Implicit premise enabling the transfer learning setup described in the abstract.

pith-pipeline@v0.9.0 · 5444 in / 1219 out tokens · 39906 ms · 2026-05-11T02:03:05.000570+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tagged: unclear

    Relation between the paper passage and the cited Recognition theorem:

    "Models are pretrained on ATLAS-like fast simulation and adapted to CMS-like fast simulation and to fully simulated ATLAS Open Data. Across all tasks, pretrained models consistently outperform independently trained baselines and require significantly less target-domain training data, typically reducing the needed statistics by about a factor of two."

  • IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat_induction · tagged: unclear

    Relation between the paper passage and the cited Recognition theorem:

    "The network architecture is intentionally kept simple and is composed of a sequence of linear layers with rectified linear unit (ReLU) activations and dropout regularization."

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages

  1. Kim Albertsson et al. Machine Learning in High Energy Physics Community White Paper. J. Phys. Conf. Ser., 1085(2):022008, 2018.
  2. Alexander Radovic, Mike Williams, David Rousseau, Michael Kagan, Daniele Bonacorsi, Alexander Himmel, Adam Aurisano, Kazuhiro Terao, and Taritree Wongjirad. Machine learning at the energy and intensity frontiers of particle physics. Nature, 560(7716):41–48, 2018.
  3. Dan Guest, Kyle Cranmer, and Daniel Whiteson. Deep Learning and its Application to LHC Physics. Ann. Rev. Nucl. Part. Sci., 68:161–181, 2018.
  4. Tilman Plehn, Anja Butter, Barry Dillon, Theo Heimel, Claudius Krause, and Ramon Winterhalder. Modern Machine Learning for LHC Physicists. 11 2022.
  5. Matthew Feickert and Benjamin Nachman. A Living Review of Machine Learning for Particle Physics. 2 2021.
  6. S. Agostinelli et al. GEANT4 - A Simulation Toolkit. Nucl. Instrum. Meth. A, 506:250–303, 2003.
  7. J. de Favereau, C. Delaere, P. Demin, A. Giammanco, V. Lemaître, A. Mertens, and M. Selvaggi. DELPHES 3, A modular framework for fast simulation of a generic collider experiment. JHEP, 02:057, 2014.
  8. Ebru Simsek, Bora Isildak, Anil Dogru, Reyhan Aydogan, Aydogan Burak Bayrak, and Seyda Ertekin. CALPAGAN: Calorimetry for Particles Using Generative Adversarial Networks. PTEP, 2024(8):083C01, 2024.
  9. Samuel Bein, Patrick Connor, Kevin Pedro, Peter Schleper, and Moritz Wolf. Refining fast simulation using machine learning. EPJ Web Conf., 295:09032, 2024.
  10. Aashish Tripathee, Wei Xue, Andrew Larkoski, Simone Marzani, and Jesse Thaler. Jet Substructure Studies with CMS Open Data. Phys. Rev. D, 96(7):074003, 2017.
  11. Marouen Baalouch, Maxime Defurne, Jean-Philippe Poli, and Noëlie Cherrier. Sim-to-Real Domain Adaptation For High Energy Physics. In 33rd Annual Conference on Neural Information Processing Systems, 12 2019.
  12. Huilin Qu, Congqiao Li, and Sitian Qian. Particle Transformer for Jet Tagging. 2 2022.
  13. Wahid Bhimji, Chris Harris, Vinicius Mikuni, and Benjamin Nachman. Foundation model framework for all tasks involving jet physics. Phys. Rev. D, 113(3):032020, 2026.
  14. Chase Shimmin, Peter Sadowski, Pierre Baldi, Edison Weik, Daniel Whiteson, Edward Goul, and Andreas Søgaard. Decorrelated Jet Substructure Tagging using Adversarial Neural Networks. Phys. Rev. D, 96(7):074034, 2017.
  15. Anja Butter et al. The Machine Learning landscape of top taggers. SciPost Phys., 7:014, 2019.
  16. Farouk Mokhtar, Joosep Pata, Dolores Garcia, Eric Wulff, Mengke Zhang, Michael Kagan, and Javier Duarte. Fine-tuning machine-learned particle-flow reconstruction for new detector geometries in future colliders. Phys. Rev. D, 111(9):092015, 2025.
  17. Tomoe Kishimoto, Masahiro Morinaga, Masahiko Saito, and Junichi Tanaka. Application of transfer learning to event classification in collider physics. PoS, ISGC2022:016, 2022.
  18. Peter W. Battaglia et al. Relational inductive biases, deep learning, and graph networks. 6 2018.
  19. Huilin Qu and Loukas Gouskos. ParticleNet: Jet Tagging via Particle Clouds. Phys. Rev. D, 101(5):056019, 2020.
  20. Vinicius Mikuni and Florencia Canelli. Point cloud transformers applied to collider physics. Mach. Learn. Sci. Tech., 2(3):035027, 2021.
  21. The ATLAS Experiment at the CERN Large Hadron Collider. JINST, 3:S08003, 2008.
  22. S. Chatrchyan et al. The CMS Experiment at the CERN LHC. JINST, 3:S08004, 2008.
  23. ATLAS DAOD-PHYSLITE format MC simulation electroweak boson nominal samples. CERN Open Data Portal. 2020.
  24. ATLAS DAOD-PHYSLITE format MC simulation top nominal samples. CERN Open Data Portal. 2020.
  25. Mariana Vivas Albornoz. The First Release of ATLAS Open Data for Research. PoS, ICHEP2024:1172, 2025.
  26. Timo Saala and Matthias Schott. Introduction to the usage of open data from the Large Hadron Collider for computer scientists in the context of machine learning. SciPost Phys. Lect. Notes, 96:1, 2025.
  27. Torbjorn Sjostrand, Stephen Mrenna, and Peter Z. Skands. A Brief Introduction to PYTHIA 8.1. Comput. Phys. Commun., 178:852–867, 2008.
  28. Georges Aad et al. Measurement of off-shell Higgs boson production in the H*→ZZ→4ℓ decay channel using a neural simulation-based inference technique in 13 TeV pp collisions with the ATLAS detector. Rept. Prog. Phys., 88(5):057803, 2025.
  29. Vladimir Chekhovsky et al. Combination and interpretation of differential Higgs boson production cross sections in proton-proton collisions at √s = 13 TeV. 4 2025.
  30. Aram Hayrapetyan et al. Observation of a pseudoscalar excess at the top quark pair production threshold. Rept. Prog. Phys., 88(8):087801, 2025.
  31. Georges Aad et al. Search for same-charge top-quark pair production in pp collisions at √s = 13 TeV with the ATLAS detector. JHEP, 02:084, 2025.
  32. Michael Andrews et al. End-to-end jet classification of boosted top quarks with the CMS open data. EPJ Web Conf., 251:04030, 2021.
  33. Aram Hayrapetyan et al. Search for pair production of heavy particles decaying to a top quark and a gluon in the lepton+jets final state in proton-proton collisions at √s = 13 TeV. Eur. Phys. J. C, 85(3):342, 2025.
  34. Georges Aad et al. Search for short- and long-lived axion-like particles in H→aa→4γ decays with the ATLAS experiment at the LHC. Eur. Phys. J. C, 84(7):742, 2024.
  35. Tom Cornelis. Quark-gluon Jet Discrimination At CMS. In 2nd Large Hadron Collider Physics Conference, 9 2014.
  36. M. Andrews, J. Alison, S. An, Patrick Bryant, B. Burkle, S. Gleyzer, M. Narain, M. Paulini, B. Poczos, and E. Usai. End-to-end jet classification of quarks and gluons with the CMS Open Data. Nucl. Instrum. Meth. A, 977:164304, 2020.
  37. Aram Hayrapetyan et al. DeepMET: Improving missing transverse momentum estimation with a deep neural network. 9 2025.
  38. Georges Aad et al. The performance of missing transverse momentum reconstruction and its significance with the ATLAS detector using 140 fb⁻¹ of √s = 13 TeV pp collisions. Eur. Phys. J. C, 85(6):606, 2025.
  39. Benedikt Maier, Siddharth M. Narayanan, Gianfranco de Castro, Maxim Goncharov, Christoph Paus, and Matthias Schott. Pile-up mitigation using attention. Mach. Learn. Sci. Tech., 3(2):025012, 2022.
  40. Matthias Vigl, Nicole Hartman, Michael Kagan, and Lukas Heinrich. Neural Scaling Laws for Boosted Jet Tagging. 2 2026.