Transfer Learning Across Fast- and Full-Simulation Domains in High-Energy Physics
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-11 02:03 UTC · model grok-4.3
The pith
Models pretrained on fast simulation outperform from-scratch baselines on full simulation while needing roughly half the target-domain data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Across signal-background classification, quark-gluon jet tagging, and missing transverse energy reconstruction, models pretrained on fast simulation and adapted to full simulation outperform independently trained baselines on the target domain while requiring significantly less target-domain training data, typically by a factor of two. The same benefit holds when adapting between different fast simulation setups. This pattern appears for dense neural networks, graph neural networks, and transformer architectures alike.
What carries the argument
Transfer learning that pretrains neural networks on fast simulation domains and then adapts them to full simulation or alternate fast simulation domains.
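As a sketch, the pretrain-then-adapt workflow can be illustrated with a toy dense network in NumPy. Everything here is illustrative: the synthetic four-feature "events", the network size, and the choice to freeze the backbone during adaptation are assumptions of this sketch, not details taken from the paper, which uses dense NNs, GNNs, and transformers on simulated LHC events.

```python
import numpy as np

rng = np.random.default_rng(7)

def make_domain(n, shift):
    """Toy two-class 'events'; `shift` stands in for the fast/full-sim gap."""
    x = rng.normal(0.0, 1.0, (n, 4))
    y = (x[:, 0] + 0.5 * x[:, 1] > 0).astype(float)
    return x + shift, y

def forward(W1, b1, W2, b2, x):
    h = np.maximum(x @ W1 + b1, 0.0)             # ReLU hidden layer
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))     # sigmoid classifier head
    return h, p

def train(W1, b1, W2, b2, x, y, steps=300, lr=0.1, freeze_backbone=False):
    """Full-batch gradient descent on the logistic loss."""
    for _ in range(steps):
        h, p = forward(W1, b1, W2, b2, x)
        g = (p - y) / len(y)                     # dLoss / dlogit
        if not freeze_backbone:                  # adapt the hidden layer too
            gh = g[:, None] * W2[None, :] * (h > 0)
            W1 = W1 - lr * x.T @ gh
            b1 = b1 - lr * gh.sum(axis=0)
        W2 = W2 - lr * h.T @ g
        b2 = b2 - lr * g.sum()
    return W1, b1, W2, b2

# Pretrain on plentiful source-domain ("fast simulation") events.
xs, ys = make_domain(2000, shift=0.0)
W1, b1 = rng.normal(0, 0.5, (4, 8)), np.zeros(8)
W2, b2 = rng.normal(0, 0.5, 8), 0.0
W1, b1, W2, b2 = train(W1, b1, W2, b2, xs, ys)

# Adapt on a small target-domain ("full simulation") sample: keep the
# pretrained backbone fixed and retrain only the classifier head.
xt, yt = make_domain(200, shift=0.2)
backbone = W1.copy()
W1, b1, W2, b2 = train(W1, b1, W2, b2, xt, yt, freeze_backbone=True)
assert np.allclose(W1, backbone)                 # backbone reused as-is
_, p = forward(W1, b1, W2, b2, xt)
print("target-domain accuracy:", ((p > 0.5) == yt).mean())
```

Freezing the backbone is only one adaptation strategy; fine-tuning all weights at a reduced learning rate is equally common, and the paper's claim is about the pretraining benefit, not a specific freezing scheme.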
If this is right
- Fewer full simulation events need to be generated to reach a given performance level.
- Pretrained models can be reused across multiple analysis tasks instead of retraining each time.
- The same pretraining benefit should appear when moving between any two simulation levels that share similar underlying physics.
- Published pretrained models on fast simulation could serve as starting points for many different full simulation applications.
Where Pith is reading between the lines
- The approach may generalize to other physics domains where cheap approximate simulations exist alongside expensive accurate ones.
- Further gains could come from combining this pretraining with standard domain adaptation methods during the fine-tuning step.
- Real collision data might serve as an additional target domain once the fast-to-full transfer is established.
Load-bearing premise
Feature representations learned from fast simulation stay aligned enough with full simulation that adaptation succeeds without needing more target data than training from scratch.
What would settle it
Finding that pretrained models require as much or more full simulation data than scratch-trained models on any of the three tasks would show the claimed data reduction does not hold.
Original abstract
Machine-learning models in high-energy physics are often trained on simulated data, where fully simulated samples are computationally expensive while fast simulation provides large statistics at reduced realism. In this work, we systematically study transfer learning between fast-simulated and fully simulated datasets in a realistic LHC environment. We consider three representative tasks, signal-background classification, quark-gluon jet tagging, and missing transverse energy reconstruction, using dense neural networks, graph neural networks, and transformer-based architectures. Models are pretrained on ATLAS-like fast simulation and adapted to CMS-like fast simulation and to fully simulated ATLAS Open Data. Across all tasks, pretrained models consistently outperform independently trained baselines and require significantly less target-domain training data, typically reducing the needed statistics by about a factor of two. These results demonstrate that fast simulation can be used to learn robust, reusable representations and motivate publishing trained models as reusable scientific assets beyond large foundation models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript studies transfer learning across simulation domains in high-energy physics. Models are pretrained on ATLAS-like fast simulation and then adapted to CMS-like fast simulation and to fully simulated ATLAS Open Data. The evaluation covers three tasks (signal-background classification, quark-gluon jet tagging, and missing transverse energy reconstruction) and three architectures (dense NN, GNN, transformer). The central empirical claim is that pretrained models consistently outperform independently trained baselines on held-out target data and reduce the volume of target-domain statistics needed to reach equivalent performance by a factor of approximately two.
Significance. If the reported gains hold under scrutiny, the work demonstrates that fast simulation can produce reusable representations that transfer to more realistic full-simulation domains, thereby lowering the computational cost of training ML models for LHC analyses. The breadth of tasks and architectures provides evidence that the benefit is not task-specific, supporting the suggestion that pretrained models could be published as reusable scientific assets.
major comments (2)
- The factor-of-two reduction in required target statistics is the most load-bearing quantitative claim. The abstract and learning-curve results state that this factor is obtained by comparing data volumes needed for equivalent performance, yet the precise definition of equivalence (e.g., a fixed AUC threshold, a relative performance delta, or interpolation method) is not stated; without it the numerical factor cannot be independently verified or reproduced.
- The experimental protocol compares pretrained models against from-scratch baselines. It is unclear whether hyper-parameter search budgets and ranges were identical for both; if the pretrained models received additional tuning on the source domain while baselines did not, the reported outperformance could be partly attributable to unequal optimization rather than transfer alone.
minor comments (2)
- The abstract refers to 'ATLAS-like' and 'CMS-like' fast simulation without a concise summary of the key differences (e.g., detector response modeling or pile-up treatment); a short paragraph in the introduction would aid readers outside the ATLAS and CMS collaborations.
- Learning curves are central to the factor-of-two claim; ensure each panel includes error bands (statistical or bootstrap) and that the x-axis scale (number of target events) is identical across compared curves.
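The error bands the referee asks for are standard practice. A minimal bootstrap band for one point on a learning curve (the ROC AUC at a fixed number of target events) might look like the following; the rank-based AUC estimator, the 68% band, and the toy score distributions are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def rank_auc(scores, labels):
    """Mann-Whitney estimate of the ROC AUC (ties broken arbitrarily)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_band(scores, labels, n_boot=200):
    """Central value and 68% bootstrap band for the AUC."""
    vals = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(scores), len(scores))   # resample events
        vals.append(rank_auc(scores[idx], labels[idx]))
    return np.percentile(vals, [16, 50, 84])

# Toy classifier output: signal scores shifted up relative to background.
labels = np.repeat([0, 1], 500)
scores = rng.normal(labels.astype(float), 1.0)   # signal mean 1, background 0
lo, mid, hi = bootstrap_band(scores, labels)
print(f"AUC = {mid:.3f} (+{hi - mid:.3f} / -{mid - lo:.3f})")
```

Repeating this at each training-set size, with a shared x-axis, yields exactly the banded learning curves the referee requests.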
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback on our manuscript. The two major comments highlight important points for clarity and reproducibility, which we have addressed through targeted revisions.
Point-by-point responses
Referee: The factor-of-two reduction in required target statistics is the most load-bearing quantitative claim. The abstract and learning-curve results state that this factor is obtained by comparing data volumes needed for equivalent performance, yet the precise definition of equivalence (e.g., a fixed AUC threshold, a relative performance delta, or interpolation method) is not stated; without it the numerical factor cannot be independently verified or reproduced.
Authors: We agree that an explicit definition is required for independent verification. In the revised manuscript we now state that equivalence is defined as the target-domain sample size at which the pretrained model reaches 99% of the asymptotic performance (AUC or equivalent metric) attained by the from-scratch baseline on the full target dataset; this threshold is obtained by linear interpolation on the log-scale learning curves. The interpolation procedure, the precise performance metric used for each task, and the bootstrap-based uncertainty on the interpolated point have been added to the abstract, the learning-curve figures, and a dedicated paragraph in Section 4.2.
Revision: yes
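The equivalence rule described here can be made concrete in a few lines. The learning-curve numbers below are invented for illustration; only the procedure (99% of the baseline's asymptotic AUC, linear interpolation in log sample size) follows the definition given in the response.

```python
import numpy as np

def events_to_reach(n_events, auc, target_auc):
    """Sample size at which a learning curve reaches `target_auc`,
    interpolating linearly in log10(number of events)."""
    logn = np.log10(np.asarray(n_events, dtype=float))
    return 10 ** np.interp(target_auc, np.asarray(auc, dtype=float), logn)

# Hypothetical learning curves: AUC vs. target-domain training events.
n = [1_000, 3_000, 10_000, 30_000, 100_000]
scratch = [0.70, 0.76, 0.82, 0.86, 0.88]       # trained from scratch
pretrained = [0.76, 0.81, 0.85, 0.875, 0.883]  # pretrained, then adapted

target = 0.99 * scratch[-1]                    # 99% of baseline's asymptote
factor = (events_to_reach(n, scratch, target)
          / events_to_reach(n, pretrained, target))
print(f"data-reduction factor ≈ {factor:.1f}")
```

On these invented curves the factor happens to come out near two; the real value depends entirely on the measured curves and their uncertainties.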
Referee: The experimental protocol compares pretrained models against from-scratch baselines. It is unclear whether hyper-parameter search budgets and ranges were identical for both; if the pretrained models received additional tuning on the source domain while baselines did not, the reported outperformance could be partly attributable to unequal optimization rather than transfer alone.
Authors: We confirm that the hyper-parameter search budget and ranges were identical for the from-scratch baselines and for the fine-tuning stage of the pretrained models. Pretraining on the source domain used a separate but comparably sized search; this does not confer an unfair advantage on the target-domain comparison because the baselines receive the same optimization effort on the target data. We have added an explicit statement of this protocol, including the search space and number of trials, to Section 3.2 (Experimental Setup) and to the captions of Tables 2–4.
Revision: yes
Circularity Check
No significant circularity in empirical transfer learning study
Full rationale
The paper reports purely empirical results from pretraining ML models (dense NN, GNN, transformer) on ATLAS-like fast simulation and adapting them to CMS-like fast simulation and fully simulated ATLAS Open Data across three tasks. Performance gains and data-efficiency claims (factor-of-two reduction) are measured via direct comparisons of learning curves on held-out target-domain data. No equations, first-principles derivations, or predictions are present that could reduce to fitted inputs by construction. The work contains no self-citation load-bearing steps, uniqueness theorems, or ansatzes; all claims rest on reproducible experimental protocol and external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Representations learned on fast simulation remain useful after adaptation to full simulation despite differences in realism.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Paper passage: "Models are pretrained on ATLAS-like fast simulation and adapted to CMS-like fast simulation and to fully simulated ATLAS Open Data. Across all tasks, pretrained models consistently outperform independently trained baselines and require significantly less target-domain training data, typically reducing the needed statistics by about a factor of two."
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat_induction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Paper passage: "The network architecture is intentionally kept simple and is composed of a sequence of linear layers with rectified linear unit (ReLU) activations and dropout regularization."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Kim Albertsson et al. Machine Learning in High Energy Physics Community White Paper. J. Phys. Conf. Ser., 1085(2):022008, 2018.
- [2] Alexander Radovic, Mike Williams, David Rousseau, Michael Kagan, Daniele Bonacorsi, Alexander Himmel, Adam Aurisano, Kazuhiro Terao, and Taritree Wongjirad. Machine learning at the energy and intensity frontiers of particle physics. Nature, 560(7716):41–48, 2018.
- [3] Dan Guest, Kyle Cranmer, and Daniel Whiteson. Deep Learning and its Application to LHC Physics. Ann. Rev. Nucl. Part. Sci., 68:161–181, 2018.
- [4] Tilman Plehn, Anja Butter, Barry Dillon, Theo Heimel, Claudius Krause, and Ramon Winterhalder. Modern Machine Learning for LHC Physicists. 11 2022.
- [5] Matthew Feickert and Benjamin Nachman. A Living Review of Machine Learning for Particle Physics. 2 2021.
- [6] S. Agostinelli et al. GEANT4 - A Simulation Toolkit. Nucl. Instrum. Meth. A, 506:250–303, 2003.
- [7] J. de Favereau, C. Delaere, P. Demin, A. Giammanco, V. Lemaître, A. Mertens, and M. Selvaggi. DELPHES 3, A modular framework for fast simulation of a generic collider experiment. JHEP, 02:057, 2014.
- [8] Ebru Simsek, Bora Isildak, Anil Dogru, Reyhan Aydogan, Aydogan Burak Bayrak, and Seyda Ertekin. CALPAGAN: Calorimetry for Particles Using Generative Adversarial Networks. PTEP, 2024(8):083C01, 2024.
- [9] Samuel Bein, Patrick Connor, Kevin Pedro, Peter Schleper, and Moritz Wolf. Refining fast simulation using machine learning. EPJ Web Conf., 295:09032, 2024.
- [10] Aashish Tripathee, Wei Xue, Andrew Larkoski, Simone Marzani, and Jesse Thaler. Jet Substructure Studies with CMS Open Data. Phys. Rev. D, 96(7):074003, 2017.
- [11] Marouen Baalouch, Maxime Defurne, Jean-Philippe Poli, and Noëlie Cherrier. Sim-to-Real Domain Adaptation For High Energy Physics. In 33rd Annual Conference on Neural Information Processing Systems, 12 2019.
- [12] Huilin Qu, Congqiao Li, and Sitian Qian. Particle Transformer for Jet Tagging. 2 2022.
- [13] Wahid Bhimji, Chris Harris, Vinicius Mikuni, and Benjamin Nachman. Foundation model framework for all tasks involving jet physics. Phys. Rev. D, 113(3):032020, 2026.
- [14] Chase Shimmin, Peter Sadowski, Pierre Baldi, Edison Weik, Daniel Whiteson, Edward Goul, and Andreas Søgaard. Decorrelated Jet Substructure Tagging using Adversarial Neural Networks. Phys. Rev. D, 96(7):074034, 2017.
- [15] Anja Butter et al. The Machine Learning landscape of top taggers. SciPost Phys., 7:014, 2019.
- [16] Farouk Mokhtar, Joosep Pata, Dolores Garcia, Eric Wulff, Mengke Zhang, Michael Kagan, and Javier Duarte. Fine-tuning machine-learned particle-flow reconstruction for new detector geometries in future colliders. Phys. Rev. D, 111(9):092015, 2025.
- [17] Tomoe Kishimoto, Masahiro Morinaga, Masahiko Saito, and Junichi Tanaka. Application of transfer learning to event classification in collider physics. PoS, ISGC2022:016, 2022.
- [18] Peter W. Battaglia et al. Relational inductive biases, deep learning, and graph networks. 6 2018.
- [19] Huilin Qu and Loukas Gouskos. ParticleNet: Jet Tagging via Particle Clouds. Phys. Rev. D, 101(5):056019, 2020.
- [20] Vinicius Mikuni and Florencia Canelli. Point cloud transformers applied to collider physics. Mach. Learn. Sci. Tech., 2(3):035027, 2021.
- [21] The ATLAS Experiment at the CERN Large Hadron Collider. JINST, 3:S08003, 2008.
- [22] S. Chatrchyan et al. The CMS Experiment at the CERN LHC. JINST, 3:S08004, 2008.
- [23] ATLAS DAOD-PHYSLITE format MC simulation electroweak boson nominal samples. CERN Open Data Portal, 2020.
- [24] ATLAS DAOD-PHYSLITE format MC simulation top nominal samples. CERN Open Data Portal, 2020.
- [25] Mariana Vivas Albornoz. The First Release of ATLAS Open Data for Research. PoS, ICHEP2024:1172, 2025.
- [26] Timo Saala and Matthias Schott. Introduction to the usage of open data from the Large Hadron Collider for computer scientists in the context of machine learning. SciPost Phys. Lect. Notes, 96:1, 2025.
- [27] Torbjorn Sjostrand, Stephen Mrenna, and Peter Z. Skands. A Brief Introduction to PYTHIA 8.1. Comput. Phys. Commun., 178:852–867, 2008.
- [28] Georges Aad et al. Measurement of off-shell Higgs boson production in the H* → ZZ → 4ℓ decay channel using a neural simulation-based inference technique in 13 TeV pp collisions with the ATLAS detector. Rept. Prog. Phys., 88(5):057803, 2025.
- [29] Vladimir Chekhovsky et al. Combination and interpretation of differential Higgs boson production cross sections in proton-proton collisions at √s = 13 TeV. 4 2025.
- [30] Aram Hayrapetyan et al. Observation of a pseudoscalar excess at the top quark pair production threshold. Rept. Prog. Phys., 88(8):087801, 2025.
- [31] Georges Aad et al. Search for same-charge top-quark pair production in pp collisions at √s = 13 TeV with the ATLAS detector. JHEP, 02:084, 2025.
- [32] Michael Andrews et al. End-to-end jet classification of boosted top quarks with the CMS open data. EPJ Web Conf., 251:04030, 2021.
- [33] Aram Hayrapetyan et al. Search for pair production of heavy particles decaying to a top quark and a gluon in the lepton+jets final state in proton-proton collisions at √s = 13 TeV. Eur. Phys. J. C, 85(3):342, 2025.
- [34] Georges Aad et al. Search for short- and long-lived axion-like particles in H → aa → 4γ decays with the ATLAS experiment at the LHC. Eur. Phys. J. C, 84(7):742, 2024.
- [35] Tom Cornelis. Quark-gluon Jet Discrimination At CMS. In 2nd Large Hadron Collider Physics Conference, 9 2014.
- [36] M. Andrews, J. Alison, S. An, Patrick Bryant, B. Burkle, S. Gleyzer, M. Narain, M. Paulini, B. Poczos, and E. Usai. End-to-end jet classification of quarks and gluons with the CMS Open Data. Nucl. Instrum. Meth. A, 977:164304, 2020.
- [37] Aram Hayrapetyan et al. DeepMET: Improving missing transverse momentum estimation with a deep neural network. 9 2025.
- [38] Georges Aad et al. The performance of missing transverse momentum reconstruction and its significance with the ATLAS detector using 140 fb⁻¹ of √s = 13 TeV pp collisions. Eur. Phys. J. C, 85(6):606, 2025.
- [39] Benedikt Maier, Siddharth M. Narayanan, Gianfranco de Castro, Maxim Goncharov, Christoph Paus, and Matthias Schott. Pile-up mitigation using attention. Mach. Learn. Sci. Tech., 3(2):025012, 2022.
- [40] Matthias Vigl, Nicole Hartman, Michael Kagan, and Lukas Heinrich. Neural Scaling Laws for Boosted Jet Tagging. 2 2026.