pith. machine review for the scientific record.

arxiv: 2605.07471 · v1 · submitted 2026-05-08 · 💻 cs.LG · hep-ex

Recognition: 2 Lean theorem links

Transfer Learning Across Fast- and Full-Simulation Domains in High-Energy Physics

Lucie Flek, Matthias Schott


Pith reviewed 2026-05-11 02:03 UTC · model grok-4.3

classification: 💻 cs.LG · hep-ex
keywords: transfer learning · fast simulation · full simulation · high-energy physics · neural networks · jet tagging · data efficiency · domain adaptation

The pith

Models pretrained on fast simulation outperform from-scratch baselines on full simulation while needing roughly half the target data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether models trained first on abundant but simplified fast simulation data can transfer useful features to more accurate but expensive full simulation data in high-energy physics. It evaluates this approach on three tasks: distinguishing signal from background events, identifying quark versus gluon jets, and reconstructing missing transverse energy. Using dense networks, graph networks, and transformers, the study finds that pretraining on fast simulation followed by adaptation to full simulation yields better results than training from scratch on the target data alone. The gain appears consistently across tasks and reduces the amount of full simulation data required by roughly a factor of two.

Core claim

Across signal-background classification, quark-gluon jet tagging, and missing transverse energy reconstruction, models pretrained on fast simulation and adapted to full simulation outperform independently trained baselines on the target domain while requiring significantly less target-domain training data, typically by a factor of two. The same benefit holds when adapting between different fast simulation setups. This pattern appears for dense neural networks, graph neural networks, and transformer architectures alike.

What carries the argument

Transfer learning that pretrains neural networks on fast simulation domains and then adapts them to full simulation or alternate fast simulation domains.
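The pretrain-then-adapt recipe can be sketched in a few lines. This is a minimal illustration, not the authors' code: the tiny logistic-regression "network" and the synthetic Gaussian "domains" below are invented stand-ins for the paper's dense, graph, and transformer models and their ATLAS/CMS datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_domain(n, shift):
    # Synthetic stand-in for a simulation domain: 4 features, binary labels,
    # with `shift` playing the role of the fast/full domain mismatch.
    X = rng.normal(shift, 1.0, size=(n, 4))
    y = (X.sum(axis=1) + rng.normal(0.0, 0.5, n) > 4.0 * shift).astype(float)
    return np.hstack([X, np.ones((n, 1))]), y  # append a bias column

def train(X, y, w=None, epochs=300, lr=0.1):
    # Logistic regression by gradient descent. w=None trains from scratch;
    # passing pretrained weights is the "adaptation" (fine-tuning) step.
    w = np.zeros(X.shape[1]) if w is None else w.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30.0, 30.0)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    return float(np.mean(((X @ w) > 0) == y))

Xs, ys = make_domain(5000, shift=0.5)    # abundant "fast simulation" source
Xt, yt = make_domain(200, shift=0.6)     # scarce "full simulation" target
Xtest, ytest = make_domain(2000, shift=0.6)

w_pre = train(Xs, ys)            # pretrain on the source domain
w_ft = train(Xt, yt, w=w_pre)    # adapt to the target domain
w_scratch = train(Xt, yt)        # baseline: target data only

acc_ft = accuracy(w_ft, Xtest, ytest)
acc_scratch = accuracy(w_scratch, Xtest, ytest)
```

In the paper's setting this comparison is repeated over a range of target sample sizes to trace learning curves; the single split here only illustrates the mechanics.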

If this is right

  • Fewer full simulation events need to be generated to reach a given performance level.
  • Pretrained models can be reused across multiple analysis tasks instead of retraining each time.
  • The same pretraining benefit should appear when moving between any two simulation levels that share similar underlying physics.
  • Published pretrained models on fast simulation could serve as starting points for many different full simulation applications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach may generalize to other physics domains where cheap approximate simulations exist alongside expensive accurate ones.
  • Further gains could come from combining this pretraining with standard domain adaptation methods during the fine-tuning step.
  • Real collision data might serve as an additional target domain once the fast-to-full transfer is established.

Load-bearing premise

Feature representations learned from fast simulation stay aligned enough with full simulation that adaptation succeeds without needing more target data than training from scratch.

What would settle it

Finding that pretrained models require as much or more full simulation data than scratch-trained models on any of the three tasks would show the claimed data reduction does not hold.

Figures

Figures reproduced from arXiv:2605.07471 by Lucie Flek and Matthias Schott. Captions are truncated as extracted.

Figure 1. Comparison of three representative input features for the signal/background classification task, namely … [image: figures/full_fig_p005_1.png]
Figure 2. Comparison of three representative input features for the gluon jets, namely the transverse momentum … [image: figures/full_fig_p006_2.png]
Figure 3. Comparison of three representative input features for the … [image: figures/full_fig_p007_3.png]
Figure 4. Network performance quantified by the area under the ROC curve as a function of the training set size. [image: figures/full_fig_p009_4.png]
Figure 5. Network performance quantified by the area under the ROC curve as a function of the training set size. [image: figures/full_fig_p009_5.png]
Figure 6. Left: Network performance quantified by the loss function on a test set as a function of the training set … [image: figures/full_fig_p011_6.png]
Figure 7. Left: Network performance quantified by the loss function on a test set as a function of the training set … [image: figures/full_fig_p011_7.png]
Figure 8. Dependence of the AROC performance of the quark–gluon jet tagger (left) and the test-set loss for the … [image: figures/full_fig_p013_8.png]
Original abstract

Machine-learning models in high-energy physics are often trained on simulated data, where fully simulated samples are computationally expensive while fast simulation provides large statistics at reduced realism. In this work, we systematically study transfer learning between fast-simulated and fully simulated datasets in a realistic LHC environment. We consider three representative tasks, signal-background classification, quark-gluon jet tagging, and missing transverse energy reconstruction, using dense neural networks, graph neural networks, and transformer-based architectures. Models are pretrained on ATLAS-like fast simulation and adapted to CMS-like fast simulation and to fully simulated ATLAS Open Data. Across all tasks, pretrained models consistently outperform independently trained baselines and require significantly less target-domain training data, typically reducing the needed statistics by about a factor of two. These results demonstrate that fast simulation can be used to learn robust, reusable representations and motivate publishing trained models as reusable scientific assets beyond large foundation models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript studies transfer learning across simulation domains in high-energy physics. Models are pretrained on ATLAS-like fast simulation and then adapted to CMS-like fast simulation and to fully simulated ATLAS Open Data. The evaluation covers three tasks (signal-background classification, quark-gluon jet tagging, and missing transverse energy reconstruction) and three architectures (dense NN, GNN, transformer). The central empirical claim is that pretrained models consistently outperform independently trained baselines on held-out target data and reduce the volume of target-domain statistics needed to reach equivalent performance by a factor of approximately two.

Significance. If the reported gains hold under scrutiny, the work demonstrates that fast simulation can produce reusable representations that transfer to more realistic full-simulation domains, thereby lowering the computational cost of training ML models for LHC analyses. The breadth of tasks and architectures provides evidence that the benefit is not task-specific, supporting the suggestion that pretrained models could be published as reusable scientific assets.

major comments (2)
  1. The factor-of-two reduction in required target statistics is the most load-bearing quantitative claim. The abstract and learning-curve results state that this factor is obtained by comparing data volumes needed for equivalent performance, yet the precise definition of equivalence (e.g., a fixed AUC threshold, a relative performance delta, or interpolation method) is not stated; without it the numerical factor cannot be independently verified or reproduced.
  2. The experimental protocol compares pretrained models against from-scratch baselines. It is unclear whether hyper-parameter search budgets and ranges were identical for both; if the pretrained models received additional tuning on the source domain while baselines did not, the reported outperformance could be partly attributable to unequal optimization rather than transfer alone.
minor comments (2)
  1. The abstract refers to 'ATLAS-like' and 'CMS-like' fast simulation without a concise summary of the key differences (e.g., detector response modeling or pile-up treatment); a short paragraph in the introduction would aid readers outside the ATLAS/CMS collaboration.
  2. Learning curves are central to the factor-of-two claim; ensure each panel includes error bands (statistical or bootstrap) and that the x-axis scale (number of target events) is identical across compared curves.
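The error bands the referee asks for can be computed directly from per-event scores. The sketch below is a generic illustration (rank-based AUC plus a percentile bootstrap), not the authors' evaluation code, and the Gaussian toy scores are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

def auc(scores, labels):
    # Rank-based AUC: probability a random positive outscores a random
    # negative (Mann-Whitney statistic), with no sklearn dependency.
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_band(scores, labels, n_boot=500):
    # Percentile bootstrap band (~68%) for one learning-curve point,
    # obtained by resampling events with replacement.
    idx = np.arange(len(scores))
    vals = [auc(scores[s], labels[s])
            for s in (rng.choice(idx, size=len(idx)) for _ in range(n_boot))]
    return np.percentile(vals, [16, 84])

# Invented toy scores: positives shifted up by one unit.
labels = (rng.random(1000) < 0.5).astype(int)
scores = labels + rng.normal(0.0, 1.0, 1000)
lo, hi = bootstrap_band(scores, labels)
```

Applied at each training-set size, this yields the per-point bands requested for the learning-curve figures.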

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. The two major comments highlight important points for clarity and reproducibility, which we have addressed through targeted revisions.

Point-by-point responses
  1. Referee: The factor-of-two reduction in required target statistics is the most load-bearing quantitative claim. The abstract and learning-curve results state that this factor is obtained by comparing data volumes needed for equivalent performance, yet the precise definition of equivalence (e.g., a fixed AUC threshold, a relative performance delta, or interpolation method) is not stated; without it the numerical factor cannot be independently verified or reproduced.

    Authors: We agree that an explicit definition is required for independent verification. In the revised manuscript we now state that equivalence is defined as the target-domain sample size at which the pretrained model reaches 99% of the asymptotic performance (AUC or equivalent metric) attained by the from-scratch baseline on the full target dataset; this threshold is obtained by linear interpolation on the log-scale learning curves. The interpolation procedure, the precise performance metric used for each task, and the bootstrap-based uncertainty on the interpolated point have been added to the abstract, the learning-curve figures, and a dedicated paragraph in Section 4.2. revision: yes
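The equivalence definition in this response can be made concrete in a few lines. The 99% threshold and log-scale linear interpolation follow the procedure the response describes; the learning-curve numbers below are invented for illustration, not taken from the paper.

```python
import numpy as np

# Hypothetical learning curves (AUC vs. number of target-domain events);
# the values are invented stand-ins for the paper's measured curves.
sizes = np.array([1e3, 3e3, 1e4, 3e4, 1e5])
auc_scratch = np.array([0.70, 0.76, 0.82, 0.86, 0.88])
auc_pretrained = np.array([0.76, 0.81, 0.85, 0.875, 0.885])

# Equivalence point: 99% of the from-scratch baseline's asymptotic AUC.
target = 0.99 * auc_scratch[-1]

def events_to_reach(aucs, target):
    # Linear interpolation on log10(sample size), as stated in the response;
    # np.interp requires the AUC values to be monotonically increasing.
    return float(10 ** np.interp(target, aucs, np.log10(sizes)))

n_scratch = events_to_reach(auc_scratch, target)
n_pretrained = events_to_reach(auc_pretrained, target)
reduction = n_scratch / n_pretrained  # the "factor of two"-style statistic
```

With the invented curves above the ratio comes out near two; in the paper the analogous ratio, with a bootstrap uncertainty on each interpolated point, is what the abstract's "factor of two" summarizes.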

  2. Referee: The experimental protocol compares pretrained models against from-scratch baselines. It is unclear whether hyper-parameter search budgets and ranges were identical for both; if the pretrained models received additional tuning on the source domain while baselines did not, the reported outperformance could be partly attributable to unequal optimization rather than transfer alone.

    Authors: We confirm that the hyper-parameter search budget and ranges were identical for the from-scratch baselines and for the fine-tuning stage of the pretrained models. Pretraining on the source domain used a separate but comparably sized search; this does not confer an unfair advantage on the target-domain comparison because the baselines receive the same optimization effort on the target data. We have added an explicit statement of this protocol, including the search space and number of trials, to Section 3.2 (Experimental Setup) and to the captions of Tables 2–4. revision: yes

Circularity Check

0 steps flagged

No significant circularity in empirical transfer learning study

Full rationale

The paper reports purely empirical results from pretraining ML models (dense NN, GNN, transformer) on ATLAS-like fast simulation and adapting them to CMS-like fast simulation and fully simulated ATLAS Open Data across three tasks. Performance gains and data-efficiency claims (factor-of-two reduction) are measured via direct comparisons of learning curves on held-out target-domain data. No equations, first-principles derivations, or predictions are present that could reduce to fitted inputs by construction. The work contains no self-citation load-bearing steps, uniqueness theorems, or ansatzes; all claims rest on reproducible experimental protocol and external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the standard machine-learning assumption that fast simulation shares enough low-level features with full simulation for transfer learning to succeed after fine-tuning; no free parameters or invented entities are described in the abstract.

axioms (1)
  • domain assumption: Representations learned on fast simulation remain useful after adaptation to full simulation despite differences in realism. Implicit premise enabling the transfer learning setup described in the abstract.

pith-pipeline@v0.9.0 · 5444 in / 1219 out tokens · 39906 ms · 2026-05-11T02:03:05.000570+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tagged: unclear

    Relation between the paper passage and the cited Recognition theorem:

    "Models are pretrained on ATLAS-like fast simulation and adapted to CMS-like fast simulation and to fully simulated ATLAS Open Data. Across all tasks, pretrained models consistently outperform independently trained baselines and require significantly less target-domain training data, typically reducing the needed statistics by about a factor of two."

  • IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat_induction · tagged: unclear

    Relation between the paper passage and the cited Recognition theorem:

    "The network architecture is intentionally kept simple and is composed of a sequence of linear layers with rectified linear unit (ReLU) activations and dropout regularization."

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages

  1. Kim Albertsson et al. Machine Learning in High Energy Physics Community White Paper. J. Phys. Conf. Ser., 1085(2):022008, 2018.
  2. Alexander Radovic, Mike Williams, David Rousseau, Michael Kagan, Daniele Bonacorsi, Alexander Himmel, Adam Aurisano, Kazuhiro Terao, and Taritree Wongjirad. Machine learning at the energy and intensity frontiers of particle physics. Nature, 560(7716):41–48, 2018.
  3. Dan Guest, Kyle Cranmer, and Daniel Whiteson. Deep Learning and its Application to LHC Physics. Ann. Rev. Nucl. Part. Sci., 68:161–181, 2018.
  4. Tilman Plehn, Anja Butter, Barry Dillon, Theo Heimel, Claudius Krause, and Ramon Winterhalder. Modern Machine Learning for LHC Physicists. 11 2022.
  5. Matthew Feickert and Benjamin Nachman. A Living Review of Machine Learning for Particle Physics. 2 2021.
  6. S. Agostinelli et al. GEANT4 - A Simulation Toolkit. Nucl. Instrum. Meth. A, 506:250–303, 2003.
  7. J. de Favereau, C. Delaere, P. Demin, A. Giammanco, V. Lemaître, A. Mertens, and M. Selvaggi. DELPHES 3, A modular framework for fast simulation of a generic collider experiment. JHEP, 02:057, 2014.
  8. Ebru Simsek, Bora Isildak, Anil Dogru, Reyhan Aydogan, Aydogan Burak Bayrak, and Seyda Ertekin. CALPAGAN: Calorimetry for Particles Using Generative Adversarial Networks. PTEP, 2024(8):083C01, 2024.
  9. Samuel Bein, Patrick Connor, Kevin Pedro, Peter Schleper, and Moritz Wolf. Refining fast simulation using machine learning. EPJ Web Conf., 295:09032, 2024.
  10. Aashish Tripathee, Wei Xue, Andrew Larkoski, Simone Marzani, and Jesse Thaler. Jet Substructure Studies with CMS Open Data. Phys. Rev. D, 96(7):074003, 2017.
  11. Marouen Baalouch, Maxime Defurne, Jean-Philippe Poli, and Noëlie Cherrier. Sim-to-Real Domain Adaptation For High Energy Physics. In 33rd Annual Conference on Neural Information Processing Systems, 12 2019.
  12. Huilin Qu, Congqiao Li, and Sitian Qian. Particle Transformer for Jet Tagging. 2 2022.
  13. Wahid Bhimji, Chris Harris, Vinicius Mikuni, and Benjamin Nachman. Foundation model framework for all tasks involving jet physics. Phys. Rev. D, 113(3):032020, 2026.
  14. Chase Shimmin, Peter Sadowski, Pierre Baldi, Edison Weik, Daniel Whiteson, Edward Goul, and Andreas Søgaard. Decorrelated Jet Substructure Tagging using Adversarial Neural Networks. Phys. Rev. D, 96(7):074034, 2017.
  15. Anja Butter et al. The Machine Learning landscape of top taggers. SciPost Phys., 7:014, 2019.
  16. Farouk Mokhtar, Joosep Pata, Dolores Garcia, Eric Wulff, Mengke Zhang, Michael Kagan, and Javier Duarte. Fine-tuning machine-learned particle-flow reconstruction for new detector geometries in future colliders. Phys. Rev. D, 111(9):092015, 2025.
  17. Tomoe Kishimoto, Masahiro Morinaga, Masahiko Saito, and Junichi Tanaka. Application of transfer learning to event classification in collider physics. PoS, ISGC2022:016, 2022.
  18. Peter W. Battaglia et al. Relational inductive biases, deep learning, and graph networks. 6 2018.
  19. Huilin Qu and Loukas Gouskos. ParticleNet: Jet Tagging via Particle Clouds. Phys. Rev. D, 101(5):056019, 2020.
  20. Vinicius Mikuni and Florencia Canelli. Point cloud transformers applied to collider physics. Mach. Learn. Sci. Tech., 2(3):035027, 2021.
  21. The ATLAS Experiment at the CERN Large Hadron Collider. JINST, 3:S08003, 2008.
  22. S. Chatrchyan et al. The CMS Experiment at the CERN LHC. JINST, 3:S08004, 2008.
  23. ATLAS DAOD-PHYSLITE format MC simulation electroweak boson nominal samples. CERN Open Data Portal. 2020.
  24. ATLAS DAOD-PHYSLITE format MC simulation top nominal samples. CERN Open Data Portal. 2020.
  25. Mariana Vivas Albornoz. The First Release of ATLAS Open Data for Research. PoS, ICHEP2024:1172, 2025.
  26. Timo Saala and Matthias Schott. Introduction to the usage of open data from the Large Hadron Collider for computer scientists in the context of machine learning. SciPost Phys. Lect. Notes, 96:1, 2025.
  27. Torbjorn Sjostrand, Stephen Mrenna, and Peter Z. Skands. A Brief Introduction to PYTHIA 8.1. Comput. Phys. Commun., 178:852–867, 2008.
  28. Georges Aad et al. Measurement of off-shell Higgs boson production in the H*→ZZ→4ℓ decay channel using a neural simulation-based inference technique in 13 TeV pp collisions with the ATLAS detector. Rept. Prog. Phys., 88(5):057803, 2025.
  29. Vladimir Chekhovsky et al. Combination and interpretation of differential Higgs boson production cross sections in proton-proton collisions at √s = 13 TeV. 4 2025.
  30. Aram Hayrapetyan et al. Observation of a pseudoscalar excess at the top quark pair production threshold. Rept. Prog. Phys., 88(8):087801, 2025.
  31. Georges Aad et al. Search for same-charge top-quark pair production in pp collisions at √s = 13 TeV with the ATLAS detector. JHEP, 02:084, 2025.
  32. Michael Andrews et al. End-to-end jet classification of boosted top quarks with the CMS open data. EPJ Web Conf., 251:04030, 2021.
  33. Aram Hayrapetyan et al. Search for pair production of heavy particles decaying to a top quark and a gluon in the lepton+jets final state in proton-proton collisions at √s = 13 TeV. Eur. Phys. J. C, 85(3):342, 2025.
  34. Georges Aad et al. Search for short- and long-lived axion-like particles in H→aa→4γ decays with the ATLAS experiment at the LHC. Eur. Phys. J. C, 84(7):742, 2024.
  35. Tom Cornelis. Quark-gluon Jet Discrimination At CMS. In 2nd Large Hadron Collider Physics Conference, 9 2014.
  36. M. Andrews, J. Alison, S. An, Patrick Bryant, B. Burkle, S. Gleyzer, M. Narain, M. Paulini, B. Poczos, and E. Usai. End-to-end jet classification of quarks and gluons with the CMS Open Data. Nucl. Instrum. Meth. A, 977:164304, 2020.
  37. Aram Hayrapetyan et al. DeepMET: Improving missing transverse momentum estimation with a deep neural network. 9 2025.
  38. Georges Aad et al. The performance of missing transverse momentum reconstruction and its significance with the ATLAS detector using 140 fb⁻¹ of √s = 13 TeV pp collisions. Eur. Phys. J. C, 85(6):606, 2025.
  39. Benedikt Maier, Siddharth M. Narayanan, Gianfranco de Castro, Maxim Goncharov, Christoph Paus, and Matthias Schott. Pile-up mitigation using attention. Mach. Learn. Sci. Tech., 3(2):025012, 2022.
  40. Matthias Vigl, Nicole Hartman, Michael Kagan, and Lukas Heinrich. Neural Scaling Laws for Boosted Jet Tagging. 2 2026.