One Generator, Any Process: LLM-Conditioning for the LHC

Daniel Schiller; Henning Bahl; Thanush Sivagnanalingam; Tilman Plehn

arxiv: 2606.23791 · v1 · pith:MH7DT7B5new · submitted 2026-06-22 · ✦ hep-ph

One Generator, Any Process: LLM-Conditioning for the LHC

Henning Bahl , Tilman Plehn , Daniel Schiller , Thanush Sivagnanalingam This is my paper

Pith reviewed 2026-06-26 07:52 UTC · model grok-4.3

classification ✦ hep-ph

keywords LHC event generationLLM conditioningautoregressive transformergenerative networksphysics simulationFeynman diagramsmulti-process generation

0 comments

The pith

Pre-trained LLMs supply conditioning embeddings that let one autoregressive network generate events for any LHC process.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether embeddings taken from general-purpose pre-trained language models can serve as conditioning inputs for a transformer that generates LHC collision events. The embeddings encode process labels, continuous parameters, and Feynman diagrams so that high-level patterns shared across processes become available to the generator without extra fine-tuning. If the approach holds, a single network reaches usable performance after fewer training steps, produces events closer to reference distributions, and works on processes never shown during training. Readers care because standard generators are built and trained separately for each new process, which multiplies the computational cost of exploring new physics scenarios at the LHC.

Core claim

Conditioning an autoregressive transformer with embeddings from pre-trained LLMs for continuous parameters, process labels, and Feynman diagrams makes the generative network converge faster, match target distributions more closely, and produce valid events for processes absent from its training data.

What carries the argument

Descriptive embeddings from pre-trained LLMs that encode physics details and are fed as conditioning signals to the autoregressive transformer generator.

If this is right

Training time drops because shared high-level patterns are supplied by the embeddings rather than learned from scratch.
Event samples achieve higher fidelity to reference distributions across multiple processes.
A trained model can be applied directly to new processes without retraining or architecture changes.
The same conditioning pipeline works for parameter scans, label switches, and diagram inputs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could reduce the need to maintain separate generator instances for each analysis channel at the LHC.
If the embeddings capture process structure, the same scheme might transfer to other simulation domains that use diagram or label inputs.

Load-bearing premise

Embeddings produced by general-purpose pre-trained LLMs already contain enough transferable physics structure to improve convergence and allow generalization without any domain-specific retraining of the language model.

What would settle it

Train identical generators with and without LLM embeddings on the same set of processes, then evaluate both on a process completely withheld from training; if the LLM-conditioned version shows no gain in sample quality or training speed, the central claim is false.

Figures

Figures reproduced from arXiv: 2606.23791 by Daniel Schiller, Henning Bahl, Thanush Sivagnanalingam, Tilman Plehn.

**Figure 2.** Figure 2: Left: Drell-Yan invariant mass distribution for [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: Predicted Z mass, extracted from the generated invariant mass peak, as a function of the true Z mass. The shaded regions indicate the in-training masses. The uncertainty bars indicate the standard deviation over five independent runs. case of text input, we test the Qwen3 model with 2B parameters in addition. Each conditioning scheme is trained independently five times. Both LLMs are partially fine-tuned, … view at source ↗

**Figure 4.** Figure 4: Loss comparison over 20 epochs for different conditioning mechanisms. [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗

**Figure 5.** Figure 5: AUCs for different conditioning schemes. The dashed line indicates the [PITH_FULL_IMAGE:figures/full_fig_p010_5.png] view at source ↗

**Figure 6.** Figure 6: Kinematics for uu¯ → t¯tH (in-training) and g g → t¯tH (hold-out) production. We show the rapidity of the t¯tH system (left) and mt¯t (right). The histograms depict the mean and standard deviation of 5 independent runs. In the lower panels we also show the uu¯ → t¯t distributions for comparison. of [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗

**Figure 7.** Figure 7: AUCs for different conditioning schemes trained for 120 epochs. [PITH_FULL_IMAGE:figures/full_fig_p012_7.png] view at source ↗

**Figure 8.** Figure 8: AUCs for different conditioning schemes for a non-LLM transformer back [PITH_FULL_IMAGE:figures/full_fig_p013_8.png] view at source ↗

**Figure 9.** Figure 9: Classifier AUC for g g → t¯tH as a function of the number of finetuning events, for the one-hot and edge-list conditioning compared to training from scratch. The left panel shows the zero-shot AUC of the pretrained networks. Lines and bands are the mean and standard deviation over five independent runs. batch size and the number of epochs fixed and use a cosine-annealing learning-rate schedule, such that a… view at source ↗

**Figure 10.** Figure 10: Rapidity of the t¯tH system (left) and mt¯t (right) for g g → t¯tH, comparing the edge-list network finetuned on 2048 events to the network trained from scratch on 2048 and 10000 events. Histograms show the mean and standard deviation over five independent runs, with the ratio to the truth in the lower panels. 14 [PITH_FULL_IMAGE:figures/full_fig_p014_10.png] view at source ↗

**Figure 11.** Figure 11: Classifier AUC versus number of finetuning events for high-multiplicity [PITH_FULL_IMAGE:figures/full_fig_p015_11.png] view at source ↗

**Figure 12.** Figure 12: Classifier AUC versus number of finetuning events for [PITH_FULL_IMAGE:figures/full_fig_p015_12.png] view at source ↗

**Figure 13.** Figure 13: Intermediate mbW+ for in-training uu¯ → b¯tW+ (left) and hold-out g g → b¯tW+ (right), the latter finetuned on 2048 events. Green and orange lines denote short and long pretraining, black is the truth. Bands show the run-to-run standard deviation, or Poisson uncertainties for the single long-pretraining run. but the network has only learned to generate the multiplicities seen during pretraining. After fin… view at source ↗

**Figure 14.** Figure 14: Exemplary Feynman diagram image used as input. [PITH_FULL_IMAGE:figures/full_fig_p021_14.png] view at source ↗

**Figure 15.** Figure 15: Left: Drell-Yan invariant mass distribution. Right: same for the positron [PITH_FULL_IMAGE:figures/full_fig_p023_15.png] view at source ↗

**Figure 16.** Figure 16: Drell-Yan invariant mass (left) and positron transverse momentum (right) [PITH_FULL_IMAGE:figures/full_fig_p024_16.png] view at source ↗

**Figure 17.** Figure 17: Comparison of the AUCs of different conditioning schemes for the 2 [PITH_FULL_IMAGE:figures/full_fig_p025_17.png] view at source ↗

**Figure 18.** Figure 18: Comparison of the AUCs of different conditioning schemes for the 2 [PITH_FULL_IMAGE:figures/full_fig_p026_18.png] view at source ↗

**Figure 19.** Figure 19: AUCs for the different input conditioning schemes using the Qwen3 back [PITH_FULL_IMAGE:figures/full_fig_p027_19.png] view at source ↗

**Figure 20.** Figure 20: Overview of individual train process AUCs for the different input repre [PITH_FULL_IMAGE:figures/full_fig_p028_20.png] view at source ↗

**Figure 21.** Figure 21: Overview of individual hold-out process AUCs for the different input [PITH_FULL_IMAGE:figures/full_fig_p029_21.png] view at source ↗

**Figure 22.** Figure 22: Validation loss over 120 epochs for the in-interm-out and edge-list condi [PITH_FULL_IMAGE:figures/full_fig_p030_22.png] view at source ↗

**Figure 23.** Figure 23: Invariant mass mbW+ reconstructing the intermediate top resonance, for the in-training uu¯ → b¯tW+ (left) and the hold-out g g → b¯tW+ (right), the latter shown pretrained (zero-shot) and finetuned on 2048 and 10000 events. Line colour denotes the pretraining length (short/long) and the finetuning dataset size, as in the legend, black denotes the truth. Bands show the run-to-run standard deviation, or Poi… view at source ↗

read the original abstract

Neural network training for LHC event generation should, ideally, benefit from common high-level patterns in different processes. We propose novel conditioning schemes for continuous parameters, process labels, and Feynman diagrams. We employ pre-trained LLMs as multi-modal foundation models to provide descriptive embeddings for an autoregressive transformer. With such high-level physics-inductive bias the generative networks converge faster, provide better result, and generalize to unseen processes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper sketches LLM embeddings to condition a single autoregressive generator across LHC processes, but the abstract supplies no results or implementation details to check whether the claimed gains actually appear.

read the letter

The core idea is to embed continuous parameters, process labels, and Feynman diagrams via a pre-trained LLM and feed those into an autoregressive transformer so one network can handle many processes. The authors argue this high-level conditioning will speed convergence, improve samples, and let the model generalize to unseen processes.

What is new is the specific multi-modal conditioning setup that mixes LLM embeddings with the generator architecture for multi-process LHC simulation. The motivation to exploit shared structure across processes is sensible and aligns with existing work on conditional generation in HEP.

The paper states the approach clearly in the abstract. Beyond that, there is nothing to evaluate. No equations, no training procedure, no loss functions, no ablation studies, and no quantitative metrics on convergence speed or sample quality are given. The central claim therefore rests entirely on the untested assumption that unmodified general-purpose LLM embeddings carry transferable physics structure rather than mainly linguistic patterns. If that assumption fails, any observed improvements would likely trace to the conditioning architecture itself, not the claimed inductive bias.

This is for readers already working on machine-learning event generators who want to see new conditioning directions. It could prompt useful discussion in a reading group, but the absence of any supporting data makes it premature to cite or to treat as a working method. A serious editor should send it to review only after the authors supply the full methods, training details, and results that let referees judge whether the LLM embeddings deliver the promised physics bias.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes using pre-trained LLMs as multi-modal foundation models to generate descriptive embeddings that condition an autoregressive transformer on continuous parameters, process labels, and Feynman diagrams for LHC event generation. The central claim is that this high-level physics-inductive bias enables faster convergence, better results, and generalization to unseen processes.

Significance. If the claims are validated with quantitative evidence, the work could be significant for LHC phenomenology by enabling a single generative model to handle arbitrary processes, potentially streamlining Monte Carlo simulations and reducing the need for process-specific training. Leveraging unmodified general-purpose LLMs for physics conditioning would represent an innovative transfer-learning direction if the inductive bias proves transferable.

major comments (1)

[Abstract] Abstract: The claims that 'the generative networks converge faster, provide better result, and generalize to unseen processes' with 'high-level physics-inductive bias' from LLM embeddings are presented without any quantitative metrics, ablation studies, training details, or performance comparisons. This absence is load-bearing, as it prevents assessment of whether improvements arise from the claimed physics bias or from the conditioning architecture itself.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their review. We address the single major comment point-by-point below.

read point-by-point responses

Referee: [Abstract] Abstract: The claims that 'the generative networks converge faster, provide better result, and generalize to unseen processes' with 'high-level physics-inductive bias' from LLM embeddings are presented without any quantitative metrics, ablation studies, training details, or performance comparisons. This absence is load-bearing, as it prevents assessment of whether improvements arise from the claimed physics bias or from the conditioning architecture itself.

Authors: We agree that the abstract states the central claims at a high level without quantitative support. The body of the manuscript contains the relevant metrics, ablation studies, training details, and comparisons. To address the concern directly, we will revise the abstract to incorporate key quantitative results (e.g., convergence speed, quality metrics, and generalization performance on held-out processes) so that the claims are anchored by evidence already present in the paper. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical claims rest on external benchmarks

full rationale

The paper proposes conditioning an autoregressive transformer with embeddings from unmodified pre-trained LLMs for LHC event generation. No equations, fitted parameters, or self-referential definitions appear in the abstract or described claims. The asserted gains in convergence, sample quality, and zero-shot generalization are presented as empirical outcomes of experiments rather than quantities defined by construction from the inputs. No self-citation chains, uniqueness theorems, or ansatzes imported from prior author work are invoked as load-bearing steps. The derivation chain is therefore self-contained against external validation and receives the default non-finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The central claim rests on the unstated assumption that LLM embeddings carry useful physics structure.

pith-pipeline@v0.9.1-grok · 5593 in / 983 out tokens · 17557 ms · 2026-06-26T07:52:06.153298+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

296 extracted references · 136 canonical work pages · 33 internal anchors

[1]

Point cloud transformers applied to collider physics

Mikuni, Vinicius and Canelli, Florencia. Point cloud transformers applied to collider physics. Mach. Learn. Sci. Tech. 2021. doi:10.1088/2632-2153/ac07f6. arXiv:2102.05073

work page doi:10.1088/2632-2153/ac07f6 2021
[2]

Reconstructing particles in jets using set transformer and hypergraph prediction networks

Di Bello, Francesco Armando and others. Reconstructing particles in jets using set transformer and hypergraph prediction networks. Eur. Phys. J. C. 2023. doi:10.1140/epjc/s10052-023-11677-7. arXiv:2212.01328

work page doi:10.1140/epjc/s10052-023-11677-7 2023
[3]

Learning the language of QCD jets with transformers

Finke, Thorben and Kr. Learning the language of QCD jets with transformers. JHEP. 2023. doi:10.1007/JHEP06(2023)184. arXiv:2303.07364

work page doi:10.1007/jhep06(2023)184 2023
[4]

and Metodiev, Eric M

Komiske, Patrick T. and Metodiev, Eric M. and Thaler, Jesse. Energy Flow Networks: Deep Sets for Particle Jets. JHEP. 2019. doi:10.1007/JHEP01(2019)121. arXiv:1810.05165

work page doi:10.1007/jhep01(2019)121 2019
[5]

ParticleNet: Jet Tagging via Particle Clouds

Qu, Huilin and Gouskos, Loukas. ParticleNet: Jet Tagging via Particle Clouds. Phys. Rev. D. 2020. doi:10.1103/PhysRevD.101.056019. arXiv:1902.08570

work page doi:10.1103/physrevd.101.056019 2020
[6]

Hydrogen reionization ends by z = 5.3: Lyman-alpha optical depth measured by the XQR-30 sample , volume=

Bosman, Sarah E I and Davies, Frederick B and Becker, George D and Keating, Laura C and Davies, Rebecca L and Zhu, Yongda and Eilers, Anna-Christina and D’Odorico, Valentina and Bian, Fuyan and Bischetti, Manuela and Cristiani, Stefano V and Fan, Xiaohui and Farina, Emanuele P and Haehnelt, Martin G and Hennawi, Joseph F and Kulkarni, Girish and Mesinger,...

work page doi:10.1093/mnras/stac1046
[7]

Spina, Benedetta and Bosman, Sarah E. I. and Davies, Frederick B. and Gaikwad, Prakash and Zhu, Yongda , year=. Damping wings in the Lyman- forest: A model-independent measurement of the neutral fraction at 5.4 < z < 6.1 , volume=. doi:10.1051/0004-6361/202450798 , journal=

work page doi:10.1051/0004-6361/202450798
[8]

and Spannowsky, Michael

Bhardwaj, Akanksha and Englert, Christoph and Naskar, Wrishik and Ngairangbam, Vishal S. and Spannowsky, Michael. Equivariant, safe and sensitive graph networks for new physics. JHEP. 2024. doi:10.1007/JHEP07(2024)245. arXiv:2402.12449

work page doi:10.1007/jhep07(2024)245 2024
[9]

uggen, Marcus , title =

Neutsch, Steffen and Heneka, Caroline and Br\"uggen, Marcus , title = ". Mon. Not. Roy. Astron. Soc. 2022. doi:10.1093/mnras/stac218. arXiv:2201.07587

work page doi:10.1093/mnras/stac218 2022
[10]

The frontier of simulation-based inference , volume =

Cranmer, Kyle and Brehmer, Johann and Louppe, Gilles. The frontier of simulation-based inference. Proc. Nat. Acad. Sci. 2020. doi:10.1073/pnas.1912789117. arXiv:1911.01429

work page doi:10.1073/pnas.1912789117 2020
[11]

Machine Learning and Cosmology

Dvorkin, Cora and others. Machine Learning and Cosmology. Snowmass 2021. 2022. arXiv:2203.08056

arXiv 2021
[12]

Enhancing Gravitational-Wave Science with Machine Learning

Cuoco, Elena and others. Enhancing Gravitational-Wave Science with Machine Learning. Mach. Learn. Sci. Tech. 2021. doi:10.1088/2632-2153/abb93a. arXiv:2005.03745

work page doi:10.1088/2632-2153/abb93a 2021
[13]

RAS Techniques and Instruments , volume =

Slijepcevic, Inigo V and Scaife, Anna M M and Walmsley, Mike and Bowles, Micah and Wong, O Ivy and Shabala, Stanislav S and White, Sarah V , title =. RAS Techniques and Instruments , volume =. 2023 , month =. doi:10.1093/rasti/rzad055 , url =

work page doi:10.1093/rasti/rzad055 2023
[14]

Monthly Notices of the Royal Astronomical Society , volume =

Parker, Liam and Lanusse, Francois and Golkar, Siavash and Sarra, Leopoldo and Cranmer, Miles and Bietti, Alberto and Eickenberg, Michael and Krawezik, Geraud and McCabe, Michael and Morel, Rudy and Ohana, Ruben and Pettee, Mariel and Régaldo-Saint Blancard, Bruno and Cho, Kyunghyun and Ho, Shirley and The Polymathic AI Collaboration , title =. Monthly No...

work page doi:10.1093/mnras/stae1450 2024
[15]

and Mishra-Sharma, Siddharth and Villar, V

Zhang, Gemma and Helfer, Thomas and Gagliano, Alexander T. and Mishra-Sharma, Siddharth and Villar, V. Ashley. Maven: a multimodal foundation model for supernova science. Mach. Learn. Sci. Tech. 2024. doi:10.1088/2632-2153/ad990d. arXiv:2408.16829

work page doi:10.1088/2632-2153/ad990d 2024
[16]

and Roussi, Marwah and Miller, David W

Bogatskiy, Alexander and Anderson, Brandon and Offermann, Jan T. and Roussi, Marwah and Miller, David W. and Kondor, Risi. Lorentz Group Equivariant Neural Network for Particle Physics. 2020. arXiv:2006.04780

arXiv 2020
[17]

uller, David I. and Schuh, Daniel , title =

Favoni, Matteo and Ipp, Andreas and M\"uller, David I. and Schuh, Daniel , title = ". Phys. Rev. Lett. 2022. doi:10.1103/PhysRevLett.128.032003. arXiv:2012.12901

work page doi:10.1103/physrevlett.128.032003 2022
[18]

uller, David I. and Schuh, Daniel , title =

Bulusu, Srinath and Favoni, Matteo and Ipp, Andreas and M\"uller, David I. and Schuh, Daniel , title = ". EPJ Web Conf. 2022. doi:10.1051/epjconf/202225809001. arXiv:2112.12493

work page doi:10.1051/epjconf/202225809001 2022
[19]

An efficient Lorentz equivariant graph neural network for jet tagging

Gong, Shiqi and Meng, Qi and Zhang, Jue and Qu, Huilin and Li, Congqiao and Qian, Sitian and Du, Weitao and Ma, Zhi-Ming and Liu, Tie-Yan. An efficient Lorentz equivariant graph neural network for jet tagging. JHEP. 2022. doi:10.1007/JHEP07(2022)030. arXiv:2201.08187

work page doi:10.1007/jhep07(2022)030 2022
[20]

uller, David I. , title =

Favoni, Matteo and Ipp, Andreas and M\"uller, David I. , title = ". EPJ Web Conf. 2022. doi:10.1051/epjconf/202227409001. arXiv:2212.00832

work page doi:10.1051/epjconf/202227409001 2022
[21]

Symmetry Group Equivariant Architectures for Physics

Bogatskiy, Alexander and others. Symmetry Group Equivariant Architectures for Physics. Snowmass 2021. 2022. arXiv:2203.06153

arXiv 2021
[22]

and Offermann, Jan T

Bogatskiy, Alexander and Hoffman, Timothy and Miller, David W. and Offermann, Jan T. PELICAN: Permutation Equivariant and Lorentz Invariant or Covariant Aggregator Network for Particle Physics. 2022. arXiv:2211.00454

arXiv 2022
[23]

Equivariant Graph Neural Networks for Charged Particle Tracking

Murnane, Daniel and Thais, Savannah and Thete, Ameya. Equivariant Graph Neural Networks for Charged Particle Tracking. 21th International Workshop on Advanced Computing and Analysis Techniques in Physics Research : AI meets Reality. 2023. arXiv:2304.05293

arXiv 2023
[24]

Learning broken symmetries with approximate invariance

Nabat, Seth and Ghosh, Aishik and Witkowski, Edmund and Kasieczka, Gregor and Whiteson, Daniel. Learning broken symmetries with approximate invariance. Phys. Rev. D. 2025. doi:10.1103/PhysRevD.111.072002. arXiv:2412.18773

work page doi:10.1103/physrevd.111.072002 2025
[25]

and Offermann, Jan T

Bogatskiy, Alexander and Hoffman, Timothy and Miller, David W. and Offermann, Jan T. and Liu, Xiaoyang. Explainable equivariant neural networks for particle physics: PELICAN. JHEP. 2024. doi:10.1007/JHEP03(2024)113. arXiv:2307.16506

work page doi:10.1007/jhep03(2024)113 2024
[26]

and Spannowsky, Michael

Ma\^ tre, Daniel and Ngairangbam, Vishal S. and Spannowsky, Michael. Optimal equivariant architectures from the symmetries of matrix-element likelihoods. Mach. Learn. Sci. Tech. 2025. doi:10.1088/2632-2153/adbab1. arXiv:2410.18553

work page doi:10.1088/2632-2153/adbab1 2025
[27]

and Hallin, Anna and Kasieczka, Gregor and Kr

Amram, Oz and Anzalone, Luca and Birk, Joschka and Faroughy, Darius A. and Hallin, Anna and Kasieczka, Gregor and Kr. Aspen Open Jets: unlocking LHC data for foundation models in particle physics. Mach. Learn. Sci. Tech. 2025. doi:10.1088/2632-2153/ade58f. arXiv:2412.10504

work page doi:10.1088/2632-2153/ade58f 2025
[28]

2023 , eprint=

LIMA: Less Is More for Alignment , author=. 2023 , eprint=

2023
[29]

RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback , author=

RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback , author=. 2024 , eprint=

2024
[30]

2021 , eprint=

LoRA: Low-Rank Adaptation of Large Language Models , author=. 2021 , eprint=

2021
[31]

2024 , eprint=

RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs , author=. 2024 , eprint=

2024
[32]

Terry , journal =

Ralph Allan Bradley and Milton E. Terry , journal =. Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons , urldate =
[33]

2024 , eprint=

Direct Preference Optimization: Your Language Model is Secretly a Reward Model , author=. 2024 , eprint=

2024
[34]

2024 , eprint=

Investigating the Synergistic Effects of Dropout and Residual Connections on Language Model Training , author=. 2024 , eprint=

2024
[35]

2020 , eprint=

Scaling Laws for Neural Language Models , author=. 2020 , eprint=

2020
[36]

2022 , eprint=

Emergent Abilities of Large Language Models , author=. 2022 , eprint=

2022
[37]

Permutationless many-jet event reconstruction with symmetry preserving attention networks

Fenton, Michael James and Shmakov, Alexander and Ho, Ta-Wei and Hsu, Shih-Chieh and Whiteson, Daniel and Baldi, Pierre. Permutationless many-jet event reconstruction with symmetry preserving attention networks. Phys. Rev. D. 2022. doi:10.1103/PhysRevD.105.112008. arXiv:2010.09206

work page doi:10.1103/physrevd.105.112008 2022
[38]

ABCNet: An attention-based method for particle tagging

Mikuni, Vinicius and Canelli, Florencia. ABCNet: An attention-based method for particle tagging. Eur. Phys. J. Plus. 2020. doi:10.1140/epjp/s13360-020-00497-3. arXiv:2001.05311

work page doi:10.1140/epjp/s13360-020-00497-3 2020
[39]

Particle Transformer for Jet Tagging

Qu, Huilin and Li, Congqiao and Qian, Sitian. Particle Transformer for Jet Tagging. 2022. arXiv:2202.03772

arXiv 2022
[40]

Automated Approach to Accurate, Precise, and Fast Detector Simulation and Reconstruction

Dreyer, Etienne and Gross, Eilam and Kobylianskii, Dmitrii and Mikuni, Vinicius and Nachman, Benjamin and Soybelman, Nathalie. Automated Approach to Accurate, Precise, and Fast Detector Simulation and Reconstruction. Phys. Rev. Lett. 2024. doi:10.1103/PhysRevLett.133.211902. arXiv:2406.01620

work page doi:10.1103/physrevlett.133.211902 2024
[41]

Generating variable length full events from partons

Qu\'etant, Guillaume and Raine, John Andrew and Leigh, Matthew and Sengupta, Debajyoti and Golling, Tobias. Generating variable length full events from partons. Phys. Rev. D. 2024. doi:10.1103/PhysRevD.110.076023. arXiv:2406.13074

work page doi:10.1103/physrevd.110.076023 2024
[42]

Generating particle physics Lagrangians with transformers

Koay, Yong Sheng and Enberg, Rikard and Moretti, Stefano and Camargo-Molina, Eliel. Generating particle physics Lagrangians with transformers. 2025. arXiv:2501.09729

arXiv 2025
[43]

and Zhang, Xiaoyuan

Dersy, Aur\'elien and Schwartz, Matthew D. and Zhang, Xiaoyuan. Simplifying Polylogarithms with Machine Learning. Int. J. Data Sci. Math. Sci. 2024. doi:10.1142/S2810939223500028. arXiv:2206.04115

work page doi:10.1142/s2810939223500028 2024
[44]

Learning the simplicity of scattering amplitudes

Cheung, Clifford and Dersy, Aur\'elien and Schwartz, Matthew D. Learning the simplicity of scattering amplitudes. SciPost Phys. 2025. doi:10.21468/SciPostPhys.18.2.040. arXiv:2408.04720

work page doi:10.21468/scipostphys.18.2.040 2025
[45]

OmniJet- _ C : Learning point cloud calorimeter simulations using generative transformers

Birk, Joschka and Gaede, Frank and Hallin, Anna and Kasieczka, Gregor and Mozzanica, Martina and Rose, Henning. OmniJet- _ C : Learning point cloud calorimeter simulations using generative transformers. 2025. arXiv:2501.05534

arXiv 2025
[46]

HEP-JEPA: A foundation model for collider physics using joint embedding predictive architecture

Bardhan, Jai and Agrawal, Radhikesh and Tilak, Abhiram and Neeraj, Cyrin and Mitra, Subhadip. HEP-JEPA: A foundation model for collider physics using joint embedding predictive architecture. 2025. arXiv:2502.03933

arXiv 2025
[47]

and Rodgers, Jack P

Wildridge, Andrew J. and Rodgers, Jack P. and Colbert, Ethan M. and yao, Yao and Jung, Andreas W. and Liu, Miaoyuan. Bumblebee: Foundation Model for Particle Physics Discovery. 38th conference on Neural Information Processing Systems. 2024. arXiv:2412.07867

arXiv 2024
[48]

Solving key challenges in collider physics with foundation models

Mikuni, Vinicius and Nachman, Benjamin. Solving key challenges in collider physics with foundation models. Phys. Rev. D. 2025. doi:10.1103/PhysRevD.111.L051504. arXiv:2404.16091

work page doi:10.1103/physrevd.111.l051504 2025
[49]

Resimulation-based self-supervised learning for pretraining physics foundation models

Harris, Philip and Krupa, Jeffrey and Kagan, Michael and Maier, Benedikt and Woodward, Nathaniel. Resimulation-based self-supervised learning for pretraining physics foundation models. Phys. Rev. D. 2025. doi:10.1103/PhysRevD.111.032010. arXiv:2403.07066

work page doi:10.1103/physrevd.111.032010 2025
[50]

OmniJet- : the first cross-task foundation model for particle physics

Birk, Joschka and Hallin, Anna and Kasieczka, Gregor. OmniJet- : the first cross-task foundation model for particle physics. Mach. Learn. Sci. Tech. 2024. doi:10.1088/2632-2153/ad66ad. arXiv:2403.05618

work page doi:10.1088/2632-2153/ad66ad 2024
[51]

Physics event classification using Large Language Models

Fanelli, Cristiano and Giroux, James and Moran, Patrick and Nayak, Hemalata and Suresh, Karthik and Walter, Eric. Physics event classification using Large Language Models. JINST. 2024. doi:10.1088/1748-0221/19/07/C07011. arXiv:2404.05752

work page doi:10.1088/1748-0221/19/07/c07011 2024
[52]

Masked particle modeling on sets: towards self-supervised high energy physics foundation models

Golling, Tobias and Heinrich, Lukas and Kagan, Michael and Klein, Samuel and Leigh, Matthew and Osadchy, Margarita and Raine, John Andrew. Masked particle modeling on sets: towards self-supervised high energy physics foundation models. Mach. Learn. Sci. Tech. 2024. doi:10.1088/2632-2153/ad64a8. arXiv:2401.13537

work page doi:10.1088/2632-2153/ad64a8 2024
[53]

Is Tokenization Needed for Masked Particle Modelling?

Leigh, Matthew and Klein, Samuel and Charton, Fran c ois and Golling, Tobias and Heinrich, Lukas and Kagan, Michael and Ochoa, In\^es and Osadchy, Margarita. Is Tokenization Needed for Masked Particle Modelling?. Mach. Learn. Sci. Tech. 2025. doi:10.1088/2632-2153/addb98. arXiv:2409.12589

work page doi:10.1088/2632-2153/addb98 2025
[54]

Finetuning foundation models for joint analysis optimization in High Energy Physics

Vigl, Matthias and Hartman, Nicole and Heinrich, Lukas. Finetuning foundation models for joint analysis optimization in High Energy Physics. Mach. Learn. Sci. Tech. 2024. doi:10.1088/2632-2153/ad55a3. arXiv:2401.13536

work page doi:10.1088/2632-2153/ad55a3 2024
[55]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , month =

Assran, Mahmoud and Duval, Quentin and Misra, Ishan and Bojanowski, Piotr and Vincent, Pascal and Rabbat, Michael and LeCun, Yann and Ballas, Nicolas , title =. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , month =. 2023 , pages =

2023
[56]

International Conference on Learning Representations , year=

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , author=. International Conference on Learning Representations , year=
[57]

2024 , eprint=

Can Large Language Models Learn the Physics of Metamaterials? An Empirical Study with ChatGPT , author=. 2024 , eprint=

2024
[58]

2023 , eprint=

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models , author=. 2023 , eprint=

2023
[59]

2020 , eprint=

Language Models are Few-Shot Learners , author=. 2020 , eprint=

2020
[60]

2023 , eprint=

Large Language Models are Zero-Shot Reasoners , author=. 2023 , eprint=

2023
[61]

The Impact of AI in Physics Education: A Comprehensive Review from GCSE to University Levels The Impact of AI in Physics Education , doi =

Yeadon, Will and Hardy, Tom , year =. The Impact of AI in Physics Education: A Comprehensive Review from GCSE to University Levels The Impact of AI in Physics Education , doi =
[62]

2023 , eprint=

Physics simulation capabilities of LLMs , author=. 2023 , eprint=

2023
[63]

2024 , eprint=

Xiwu: A Basis Flexible and Learnable LLM for High Energy Physics , author=. 2024 , eprint=

2024
[64]

doi:10.48550/arXiv.2309.04533 , archivePrefix =

Unveiling Dark Matter free-streaming at the smallest scales with high redshift Lyman-alpha forest. doi:10.48550/arXiv.2309.04533 , archivePrefix =. 2309.04533 , primaryClass =

work page doi:10.48550/arxiv.2309.04533
[65]

, keywords =

New constraints on warm dark matter from the Lyman- forest power spectrum. , keywords =. doi:10.1103/PhysRevD.108.023502 , archivePrefix =. 2209.14220 , primaryClass =

work page doi:10.1103/physrevd.108.023502
[66]

High Mass X-ray Binaries and the Cosmic 21-cm Signal: Impact of Host Galaxy Absorption

High-mass X-ray binaries and the cosmic 21-cm signal: impact of host galaxy absorption. , keywords =. doi:10.1093/mnras/stx943 , archivePrefix =. 1702.00409 , primaryClass =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1093/mnras/stx943
[67]

, keywords =

Cosmology with One Galaxy?. , keywords =. doi:10.3847/1538-4357/ac5d3f , archivePrefix =. 2201.02202 , primaryClass =

work page doi:10.3847/1538-4357/ac5d3f
[68]

Radio Galaxy Zoo: Compact and extended radio source classification with deep learning

Radio Galaxy Zoo: compact and extended radio source classification with deep learning. , keywords =. doi:10.1093/mnras/sty163 , archivePrefix =. 1801.04861 , primaryClass =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1093/mnras/sty163
[69]

ML4Astro International Conference , pages=

Deep Learning 21 cm Lightcones in 3D , author=. ML4Astro International Conference , pages=. 2022 , organization=. doi:10.48550/arXiv.2311.17553 , archivePrefix =. 2311.17553 , primaryClass =

work page doi:10.48550/arxiv.2311.17553 2022
[70]

, keywords =

Quantifying uncertainty in deep learning approaches to radio galaxy classification. , keywords =. doi:10.1093/mnras/stac223 , archivePrefix =. 2201.01203 , primaryClass =

work page doi:10.1093/mnras/stac223
[71]

Galaxy Spectra neural Network (GaSNet). II. Using Deep Learning for Spectral Classification and Redshift Predictions. doi:10.48550/arXiv.2311.04146 , archivePrefix =. 2311.04146 , primaryClass =

work page doi:10.48550/arxiv.2311.04146
[72]

doi:10.48550/arXiv.2310.02684 , archivePrefix =

The LoReLi database: 21 cm signal inference with 3D radiative hydrodynamics simulations. doi:10.48550/arXiv.2310.02684 , archivePrefix =. 2310.02684 , primaryClass =

work page doi:10.48550/arxiv.2310.02684
[73]

Reports on Progress in Physics , keywords =

Machine learning for observational cosmology. Reports on Progress in Physics , keywords =. doi:10.1088/1361-6633/acd2ea , archivePrefix =. 2303.15794 , primaryClass =

work page doi:10.1088/1361-6633/acd2ea
[74]

doi:10.48550/arXiv.2203.08056 , archivePrefix =

Machine Learning and Cosmology. doi:10.48550/arXiv.2203.08056 , archivePrefix =. 2203.08056 , primaryClass =

work page doi:10.48550/arxiv.2203.08056
[75]

Simulating the 21-cm signal from reionisation including non-linear ionisations and inhomogeneous recombinations

Simulating the 21 cm signal from reionization including non-linear ionizations and inhomogeneous recombinations. , keywords =. doi:10.1093/mnras/stv3001 , archivePrefix =. 1510.04280 , primaryClass =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1093/mnras/stv3001
[76]

, keywords =

Astraeus I: the interplay between galaxy formation and reionization. , keywords =. doi:10.1093/mnras/stab602 , archivePrefix =. 2004.08401 , primaryClass =

work page doi:10.1093/mnras/stab602 2004
[77]

doi:10.48550/arXiv.2310.17602 , archivePrefix =

Simulation-based Inference of Reionization Parameters from 3D Tomographic 21 cm Light-cone Images -- II: Application of Solid Harmonic Wavelet Scattering Transform. doi:10.48550/arXiv.2310.17602 , archivePrefix =. 2310.17602 , primaryClass =

work page doi:10.48550/arxiv.2310.17602
[78]

The DAWES review 10: The impact of deep learning for the analysis of galaxy surveys

Huertas-Company, Marc and Lanusse, Fran c ois. The DAWES review 10: The impact of deep learning for the analysis of galaxy surveys. Publ. Astron. Soc. Austral. 2023. doi:10.1017/pasa.2022.55. arXiv:2210.01813

work page doi:10.1017/pasa.2022.55 2023
[79]

2020 , eprint=

BayesFlow: Learning complex stochastic models with invertible neural networks , author=. 2020 , eprint=

2020
[80]

2023 , eprint=

BayesFlow: Amortized Bayesian Workflows With Neural Networks , author=. 2023 , eprint=

2023

Showing first 80 references.

[1] [1]

Point cloud transformers applied to collider physics

Mikuni, Vinicius and Canelli, Florencia. Point cloud transformers applied to collider physics. Mach. Learn. Sci. Tech. 2021. doi:10.1088/2632-2153/ac07f6. arXiv:2102.05073

work page doi:10.1088/2632-2153/ac07f6 2021

[2] [2]

Reconstructing particles in jets using set transformer and hypergraph prediction networks

Di Bello, Francesco Armando and others. Reconstructing particles in jets using set transformer and hypergraph prediction networks. Eur. Phys. J. C. 2023. doi:10.1140/epjc/s10052-023-11677-7. arXiv:2212.01328

work page doi:10.1140/epjc/s10052-023-11677-7 2023

[3] [3]

Learning the language of QCD jets with transformers

Finke, Thorben and Kr. Learning the language of QCD jets with transformers. JHEP. 2023. doi:10.1007/JHEP06(2023)184. arXiv:2303.07364

work page doi:10.1007/jhep06(2023)184 2023

[4] [4]

and Metodiev, Eric M

Komiske, Patrick T. and Metodiev, Eric M. and Thaler, Jesse. Energy Flow Networks: Deep Sets for Particle Jets. JHEP. 2019. doi:10.1007/JHEP01(2019)121. arXiv:1810.05165

work page doi:10.1007/jhep01(2019)121 2019

[5] [5]

ParticleNet: Jet Tagging via Particle Clouds

Qu, Huilin and Gouskos, Loukas. ParticleNet: Jet Tagging via Particle Clouds. Phys. Rev. D. 2020. doi:10.1103/PhysRevD.101.056019. arXiv:1902.08570

work page doi:10.1103/physrevd.101.056019 2020

[6] [6]

Hydrogen reionization ends by z = 5.3: Lyman-alpha optical depth measured by the XQR-30 sample , volume=

Bosman, Sarah E I and Davies, Frederick B and Becker, George D and Keating, Laura C and Davies, Rebecca L and Zhu, Yongda and Eilers, Anna-Christina and D’Odorico, Valentina and Bian, Fuyan and Bischetti, Manuela and Cristiani, Stefano V and Fan, Xiaohui and Farina, Emanuele P and Haehnelt, Martin G and Hennawi, Joseph F and Kulkarni, Girish and Mesinger,...

work page doi:10.1093/mnras/stac1046

[7] [7]

Spina, Benedetta and Bosman, Sarah E. I. and Davies, Frederick B. and Gaikwad, Prakash and Zhu, Yongda , year=. Damping wings in the Lyman- forest: A model-independent measurement of the neutral fraction at 5.4 < z < 6.1 , volume=. doi:10.1051/0004-6361/202450798 , journal=

work page doi:10.1051/0004-6361/202450798

[8] [8]

and Spannowsky, Michael

Bhardwaj, Akanksha and Englert, Christoph and Naskar, Wrishik and Ngairangbam, Vishal S. and Spannowsky, Michael. Equivariant, safe and sensitive graph networks for new physics. JHEP. 2024. doi:10.1007/JHEP07(2024)245. arXiv:2402.12449

work page doi:10.1007/jhep07(2024)245 2024

[9] [9]

uggen, Marcus , title =

Neutsch, Steffen and Heneka, Caroline and Br\"uggen, Marcus , title = ". Mon. Not. Roy. Astron. Soc. 2022. doi:10.1093/mnras/stac218. arXiv:2201.07587

work page doi:10.1093/mnras/stac218 2022

[10] [10]

The frontier of simulation-based inference , volume =

Cranmer, Kyle and Brehmer, Johann and Louppe, Gilles. The frontier of simulation-based inference. Proc. Nat. Acad. Sci. 2020. doi:10.1073/pnas.1912789117. arXiv:1911.01429

work page doi:10.1073/pnas.1912789117 2020

[11] [11]

Machine Learning and Cosmology

Dvorkin, Cora and others. Machine Learning and Cosmology. Snowmass 2021. 2022. arXiv:2203.08056

arXiv 2021

[12] [12]

Enhancing Gravitational-Wave Science with Machine Learning

Cuoco, Elena and others. Enhancing Gravitational-Wave Science with Machine Learning. Mach. Learn. Sci. Tech. 2021. doi:10.1088/2632-2153/abb93a. arXiv:2005.03745

work page doi:10.1088/2632-2153/abb93a 2021

[13] [13]

RAS Techniques and Instruments , volume =

Slijepcevic, Inigo V and Scaife, Anna M M and Walmsley, Mike and Bowles, Micah and Wong, O Ivy and Shabala, Stanislav S and White, Sarah V , title =. RAS Techniques and Instruments , volume =. 2023 , month =. doi:10.1093/rasti/rzad055 , url =

work page doi:10.1093/rasti/rzad055 2023

[14] [14]

Monthly Notices of the Royal Astronomical Society , volume =

Parker, Liam and Lanusse, Francois and Golkar, Siavash and Sarra, Leopoldo and Cranmer, Miles and Bietti, Alberto and Eickenberg, Michael and Krawezik, Geraud and McCabe, Michael and Morel, Rudy and Ohana, Ruben and Pettee, Mariel and Régaldo-Saint Blancard, Bruno and Cho, Kyunghyun and Ho, Shirley and The Polymathic AI Collaboration , title =. Monthly No...

work page doi:10.1093/mnras/stae1450 2024

[15] [15]

and Mishra-Sharma, Siddharth and Villar, V

Zhang, Gemma and Helfer, Thomas and Gagliano, Alexander T. and Mishra-Sharma, Siddharth and Villar, V. Ashley. Maven: a multimodal foundation model for supernova science. Mach. Learn. Sci. Tech. 2024. doi:10.1088/2632-2153/ad990d. arXiv:2408.16829

work page doi:10.1088/2632-2153/ad990d 2024

[16] [16]

and Roussi, Marwah and Miller, David W

Bogatskiy, Alexander and Anderson, Brandon and Offermann, Jan T. and Roussi, Marwah and Miller, David W. and Kondor, Risi. Lorentz Group Equivariant Neural Network for Particle Physics. 2020. arXiv:2006.04780

arXiv 2020

[17] [17]

uller, David I. and Schuh, Daniel , title =

Favoni, Matteo and Ipp, Andreas and M\"uller, David I. and Schuh, Daniel , title = ". Phys. Rev. Lett. 2022. doi:10.1103/PhysRevLett.128.032003. arXiv:2012.12901

work page doi:10.1103/physrevlett.128.032003 2022

[18] [18]

uller, David I. and Schuh, Daniel , title =

Bulusu, Srinath and Favoni, Matteo and Ipp, Andreas and M\"uller, David I. and Schuh, Daniel , title = ". EPJ Web Conf. 2022. doi:10.1051/epjconf/202225809001. arXiv:2112.12493

work page doi:10.1051/epjconf/202225809001 2022

[19] [19]

An efficient Lorentz equivariant graph neural network for jet tagging

Gong, Shiqi and Meng, Qi and Zhang, Jue and Qu, Huilin and Li, Congqiao and Qian, Sitian and Du, Weitao and Ma, Zhi-Ming and Liu, Tie-Yan. An efficient Lorentz equivariant graph neural network for jet tagging. JHEP. 2022. doi:10.1007/JHEP07(2022)030. arXiv:2201.08187

work page doi:10.1007/jhep07(2022)030 2022

[20] [20]

uller, David I. , title =

Favoni, Matteo and Ipp, Andreas and M\"uller, David I. , title = ". EPJ Web Conf. 2022. doi:10.1051/epjconf/202227409001. arXiv:2212.00832

work page doi:10.1051/epjconf/202227409001 2022

[21] [21]

Symmetry Group Equivariant Architectures for Physics

Bogatskiy, Alexander and others. Symmetry Group Equivariant Architectures for Physics. Snowmass 2021. 2022. arXiv:2203.06153

arXiv 2021

[22] [22]

and Offermann, Jan T

Bogatskiy, Alexander and Hoffman, Timothy and Miller, David W. and Offermann, Jan T. PELICAN: Permutation Equivariant and Lorentz Invariant or Covariant Aggregator Network for Particle Physics. 2022. arXiv:2211.00454

arXiv 2022

[23] [23]

Equivariant Graph Neural Networks for Charged Particle Tracking

Murnane, Daniel and Thais, Savannah and Thete, Ameya. Equivariant Graph Neural Networks for Charged Particle Tracking. 21th International Workshop on Advanced Computing and Analysis Techniques in Physics Research : AI meets Reality. 2023. arXiv:2304.05293

arXiv 2023

[24] [24]

Learning broken symmetries with approximate invariance

Nabat, Seth and Ghosh, Aishik and Witkowski, Edmund and Kasieczka, Gregor and Whiteson, Daniel. Learning broken symmetries with approximate invariance. Phys. Rev. D. 2025. doi:10.1103/PhysRevD.111.072002. arXiv:2412.18773

work page doi:10.1103/physrevd.111.072002 2025

[25] [25]

and Offermann, Jan T

Bogatskiy, Alexander and Hoffman, Timothy and Miller, David W. and Offermann, Jan T. and Liu, Xiaoyang. Explainable equivariant neural networks for particle physics: PELICAN. JHEP. 2024. doi:10.1007/JHEP03(2024)113. arXiv:2307.16506

work page doi:10.1007/jhep03(2024)113 2024

[26] [26]

and Spannowsky, Michael

Ma\^ tre, Daniel and Ngairangbam, Vishal S. and Spannowsky, Michael. Optimal equivariant architectures from the symmetries of matrix-element likelihoods. Mach. Learn. Sci. Tech. 2025. doi:10.1088/2632-2153/adbab1. arXiv:2410.18553

work page doi:10.1088/2632-2153/adbab1 2025

[27] [27]

and Hallin, Anna and Kasieczka, Gregor and Kr

Amram, Oz and Anzalone, Luca and Birk, Joschka and Faroughy, Darius A. and Hallin, Anna and Kasieczka, Gregor and Kr. Aspen Open Jets: unlocking LHC data for foundation models in particle physics. Mach. Learn. Sci. Tech. 2025. doi:10.1088/2632-2153/ade58f. arXiv:2412.10504

work page doi:10.1088/2632-2153/ade58f 2025

[28] [28]

2023 , eprint=

LIMA: Less Is More for Alignment , author=. 2023 , eprint=

2023

[29] [29]

RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback , author=

RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback , author=. 2024 , eprint=

2024

[30] [30]

2021 , eprint=

LoRA: Low-Rank Adaptation of Large Language Models , author=. 2021 , eprint=

2021

[31] [31]

2024 , eprint=

RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs , author=. 2024 , eprint=

2024

[32] [32]

Terry , journal =

Ralph Allan Bradley and Milton E. Terry , journal =. Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons , urldate =

[33] [33]

2024 , eprint=

Direct Preference Optimization: Your Language Model is Secretly a Reward Model , author=. 2024 , eprint=

2024

[34] [34]

2024 , eprint=

Investigating the Synergistic Effects of Dropout and Residual Connections on Language Model Training , author=. 2024 , eprint=

2024

[35] [35]

2020 , eprint=

Scaling Laws for Neural Language Models , author=. 2020 , eprint=

2020

[36] [36]

2022 , eprint=

Emergent Abilities of Large Language Models , author=. 2022 , eprint=

2022

[37] [37]

Permutationless many-jet event reconstruction with symmetry preserving attention networks

Fenton, Michael James and Shmakov, Alexander and Ho, Ta-Wei and Hsu, Shih-Chieh and Whiteson, Daniel and Baldi, Pierre. Permutationless many-jet event reconstruction with symmetry preserving attention networks. Phys. Rev. D. 2022. doi:10.1103/PhysRevD.105.112008. arXiv:2010.09206

work page doi:10.1103/physrevd.105.112008 2022

[38] [38]

ABCNet: An attention-based method for particle tagging

Mikuni, Vinicius and Canelli, Florencia. ABCNet: An attention-based method for particle tagging. Eur. Phys. J. Plus. 2020. doi:10.1140/epjp/s13360-020-00497-3. arXiv:2001.05311

work page doi:10.1140/epjp/s13360-020-00497-3 2020

[39] [39]

Particle Transformer for Jet Tagging

Qu, Huilin and Li, Congqiao and Qian, Sitian. Particle Transformer for Jet Tagging. 2022. arXiv:2202.03772

arXiv 2022

[40] [40]

Automated Approach to Accurate, Precise, and Fast Detector Simulation and Reconstruction

Dreyer, Etienne and Gross, Eilam and Kobylianskii, Dmitrii and Mikuni, Vinicius and Nachman, Benjamin and Soybelman, Nathalie. Automated Approach to Accurate, Precise, and Fast Detector Simulation and Reconstruction. Phys. Rev. Lett. 2024. doi:10.1103/PhysRevLett.133.211902. arXiv:2406.01620

work page doi:10.1103/physrevlett.133.211902 2024

[41] [41]

Generating variable length full events from partons

Qu\'etant, Guillaume and Raine, John Andrew and Leigh, Matthew and Sengupta, Debajyoti and Golling, Tobias. Generating variable length full events from partons. Phys. Rev. D. 2024. doi:10.1103/PhysRevD.110.076023. arXiv:2406.13074

work page doi:10.1103/physrevd.110.076023 2024

[42] [42]

Generating particle physics Lagrangians with transformers

Koay, Yong Sheng and Enberg, Rikard and Moretti, Stefano and Camargo-Molina, Eliel. Generating particle physics Lagrangians with transformers. 2025. arXiv:2501.09729

arXiv 2025

[43] [43]

and Zhang, Xiaoyuan

Dersy, Aur\'elien and Schwartz, Matthew D. and Zhang, Xiaoyuan. Simplifying Polylogarithms with Machine Learning. Int. J. Data Sci. Math. Sci. 2024. doi:10.1142/S2810939223500028. arXiv:2206.04115

work page doi:10.1142/s2810939223500028 2024

[44] [44]

Learning the simplicity of scattering amplitudes

Cheung, Clifford and Dersy, Aur\'elien and Schwartz, Matthew D. Learning the simplicity of scattering amplitudes. SciPost Phys. 2025. doi:10.21468/SciPostPhys.18.2.040. arXiv:2408.04720

work page doi:10.21468/scipostphys.18.2.040 2025

[45] [45]

OmniJet- _ C : Learning point cloud calorimeter simulations using generative transformers

Birk, Joschka and Gaede, Frank and Hallin, Anna and Kasieczka, Gregor and Mozzanica, Martina and Rose, Henning. OmniJet- _ C : Learning point cloud calorimeter simulations using generative transformers. 2025. arXiv:2501.05534

arXiv 2025

[46] [46]

HEP-JEPA: A foundation model for collider physics using joint embedding predictive architecture

Bardhan, Jai and Agrawal, Radhikesh and Tilak, Abhiram and Neeraj, Cyrin and Mitra, Subhadip. HEP-JEPA: A foundation model for collider physics using joint embedding predictive architecture. 2025. arXiv:2502.03933

arXiv 2025

[47] [47]

and Rodgers, Jack P

Wildridge, Andrew J. and Rodgers, Jack P. and Colbert, Ethan M. and yao, Yao and Jung, Andreas W. and Liu, Miaoyuan. Bumblebee: Foundation Model for Particle Physics Discovery. 38th conference on Neural Information Processing Systems. 2024. arXiv:2412.07867

arXiv 2024

[48] [48]

Solving key challenges in collider physics with foundation models

Mikuni, Vinicius and Nachman, Benjamin. Solving key challenges in collider physics with foundation models. Phys. Rev. D. 2025. doi:10.1103/PhysRevD.111.L051504. arXiv:2404.16091

work page doi:10.1103/physrevd.111.l051504 2025

[49] [49]

Resimulation-based self-supervised learning for pretraining physics foundation models

Harris, Philip and Krupa, Jeffrey and Kagan, Michael and Maier, Benedikt and Woodward, Nathaniel. Resimulation-based self-supervised learning for pretraining physics foundation models. Phys. Rev. D. 2025. doi:10.1103/PhysRevD.111.032010. arXiv:2403.07066

work page doi:10.1103/physrevd.111.032010 2025

[50] [50]

OmniJet- : the first cross-task foundation model for particle physics

Birk, Joschka and Hallin, Anna and Kasieczka, Gregor. OmniJet- : the first cross-task foundation model for particle physics. Mach. Learn. Sci. Tech. 2024. doi:10.1088/2632-2153/ad66ad. arXiv:2403.05618

work page doi:10.1088/2632-2153/ad66ad 2024

[51] [51]

Physics event classification using Large Language Models

Fanelli, Cristiano and Giroux, James and Moran, Patrick and Nayak, Hemalata and Suresh, Karthik and Walter, Eric. Physics event classification using Large Language Models. JINST. 2024. doi:10.1088/1748-0221/19/07/C07011. arXiv:2404.05752

work page doi:10.1088/1748-0221/19/07/c07011 2024

[52] [52]

Masked particle modeling on sets: towards self-supervised high energy physics foundation models

Golling, Tobias and Heinrich, Lukas and Kagan, Michael and Klein, Samuel and Leigh, Matthew and Osadchy, Margarita and Raine, John Andrew. Masked particle modeling on sets: towards self-supervised high energy physics foundation models. Mach. Learn. Sci. Tech. 2024. doi:10.1088/2632-2153/ad64a8. arXiv:2401.13537

work page doi:10.1088/2632-2153/ad64a8 2024

[53] [53]

Is Tokenization Needed for Masked Particle Modelling?

Leigh, Matthew and Klein, Samuel and Charton, Fran c ois and Golling, Tobias and Heinrich, Lukas and Kagan, Michael and Ochoa, In\^es and Osadchy, Margarita. Is Tokenization Needed for Masked Particle Modelling?. Mach. Learn. Sci. Tech. 2025. doi:10.1088/2632-2153/addb98. arXiv:2409.12589

work page doi:10.1088/2632-2153/addb98 2025

[54] [54]

Finetuning foundation models for joint analysis optimization in High Energy Physics

Vigl, Matthias and Hartman, Nicole and Heinrich, Lukas. Finetuning foundation models for joint analysis optimization in High Energy Physics. Mach. Learn. Sci. Tech. 2024. doi:10.1088/2632-2153/ad55a3. arXiv:2401.13536

work page doi:10.1088/2632-2153/ad55a3 2024

[55] [55]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , month =

Assran, Mahmoud and Duval, Quentin and Misra, Ishan and Bojanowski, Piotr and Vincent, Pascal and Rabbat, Michael and LeCun, Yann and Ballas, Nicolas , title =. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , month =. 2023 , pages =

2023

[56] [56]

International Conference on Learning Representations , year=

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , author=. International Conference on Learning Representations , year=

[57] [57]

2024 , eprint=

Can Large Language Models Learn the Physics of Metamaterials? An Empirical Study with ChatGPT , author=. 2024 , eprint=

2024

[58] [58]

2023 , eprint=

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models , author=. 2023 , eprint=

2023

[59] [59]

2020 , eprint=

Language Models are Few-Shot Learners , author=. 2020 , eprint=

2020

[60] [60]

2023 , eprint=

Large Language Models are Zero-Shot Reasoners , author=. 2023 , eprint=

2023

[61] [61]

The Impact of AI in Physics Education: A Comprehensive Review from GCSE to University Levels The Impact of AI in Physics Education , doi =

Yeadon, Will and Hardy, Tom , year =. The Impact of AI in Physics Education: A Comprehensive Review from GCSE to University Levels The Impact of AI in Physics Education , doi =

[62] [62]

2023 , eprint=

Physics simulation capabilities of LLMs , author=. 2023 , eprint=

2023

[63] [63]

2024 , eprint=

Xiwu: A Basis Flexible and Learnable LLM for High Energy Physics , author=. 2024 , eprint=

2024

[64] [64]

doi:10.48550/arXiv.2309.04533 , archivePrefix =

Unveiling Dark Matter free-streaming at the smallest scales with high redshift Lyman-alpha forest. doi:10.48550/arXiv.2309.04533 , archivePrefix =. 2309.04533 , primaryClass =

work page doi:10.48550/arxiv.2309.04533

[65] [65]

, keywords =

New constraints on warm dark matter from the Lyman- forest power spectrum. , keywords =. doi:10.1103/PhysRevD.108.023502 , archivePrefix =. 2209.14220 , primaryClass =

work page doi:10.1103/physrevd.108.023502

[66] [66]

High Mass X-ray Binaries and the Cosmic 21-cm Signal: Impact of Host Galaxy Absorption

High-mass X-ray binaries and the cosmic 21-cm signal: impact of host galaxy absorption. , keywords =. doi:10.1093/mnras/stx943 , archivePrefix =. 1702.00409 , primaryClass =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1093/mnras/stx943

[67] [67]

, keywords =

Cosmology with One Galaxy?. , keywords =. doi:10.3847/1538-4357/ac5d3f , archivePrefix =. 2201.02202 , primaryClass =

work page doi:10.3847/1538-4357/ac5d3f

[68] [68]

Radio Galaxy Zoo: Compact and extended radio source classification with deep learning

Radio Galaxy Zoo: compact and extended radio source classification with deep learning. , keywords =. doi:10.1093/mnras/sty163 , archivePrefix =. 1801.04861 , primaryClass =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1093/mnras/sty163

[69] [69]

ML4Astro International Conference , pages=

Deep Learning 21 cm Lightcones in 3D , author=. ML4Astro International Conference , pages=. 2022 , organization=. doi:10.48550/arXiv.2311.17553 , archivePrefix =. 2311.17553 , primaryClass =

work page doi:10.48550/arxiv.2311.17553 2022

[70] [70]

, keywords =

Quantifying uncertainty in deep learning approaches to radio galaxy classification. , keywords =. doi:10.1093/mnras/stac223 , archivePrefix =. 2201.01203 , primaryClass =

work page doi:10.1093/mnras/stac223

[71] [71]

Galaxy Spectra neural Network (GaSNet). II. Using Deep Learning for Spectral Classification and Redshift Predictions. doi:10.48550/arXiv.2311.04146 , archivePrefix =. 2311.04146 , primaryClass =

work page doi:10.48550/arxiv.2311.04146

[72] [72]

doi:10.48550/arXiv.2310.02684 , archivePrefix =

The LoReLi database: 21 cm signal inference with 3D radiative hydrodynamics simulations. doi:10.48550/arXiv.2310.02684 , archivePrefix =. 2310.02684 , primaryClass =

work page doi:10.48550/arxiv.2310.02684

[73] [73]

Reports on Progress in Physics , keywords =

Machine learning for observational cosmology. Reports on Progress in Physics , keywords =. doi:10.1088/1361-6633/acd2ea , archivePrefix =. 2303.15794 , primaryClass =

work page doi:10.1088/1361-6633/acd2ea

[74] [74]

doi:10.48550/arXiv.2203.08056 , archivePrefix =

Machine Learning and Cosmology. doi:10.48550/arXiv.2203.08056 , archivePrefix =. 2203.08056 , primaryClass =

work page doi:10.48550/arxiv.2203.08056

[75] [75]

Simulating the 21-cm signal from reionisation including non-linear ionisations and inhomogeneous recombinations

Simulating the 21 cm signal from reionization including non-linear ionizations and inhomogeneous recombinations. , keywords =. doi:10.1093/mnras/stv3001 , archivePrefix =. 1510.04280 , primaryClass =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1093/mnras/stv3001

[76] [76]

, keywords =

Astraeus I: the interplay between galaxy formation and reionization. , keywords =. doi:10.1093/mnras/stab602 , archivePrefix =. 2004.08401 , primaryClass =

work page doi:10.1093/mnras/stab602 2004

[77] [77]

doi:10.48550/arXiv.2310.17602 , archivePrefix =

Simulation-based Inference of Reionization Parameters from 3D Tomographic 21 cm Light-cone Images -- II: Application of Solid Harmonic Wavelet Scattering Transform. doi:10.48550/arXiv.2310.17602 , archivePrefix =. 2310.17602 , primaryClass =

work page doi:10.48550/arxiv.2310.17602

[78] [78]

The DAWES review 10: The impact of deep learning for the analysis of galaxy surveys

Huertas-Company, Marc and Lanusse, Fran c ois. The DAWES review 10: The impact of deep learning for the analysis of galaxy surveys. Publ. Astron. Soc. Austral. 2023. doi:10.1017/pasa.2022.55. arXiv:2210.01813

work page doi:10.1017/pasa.2022.55 2023

[79] [79]

2020 , eprint=

BayesFlow: Learning complex stochastic models with invertible neural networks , author=. 2020 , eprint=

2020

[80] [80]

2023 , eprint=

BayesFlow: Amortized Bayesian Workflows With Neural Networks , author=. 2023 , eprint=

2023