Nested-GPT for variable-multiplicity parton showers: A case study in the resummation of non-global logarithms

Ding Yu Shao; Hao-Zhe Shi; Wanchen Li; Yu-Xuan Sun

arXiv: 2605.18360 · v2 · pith:5P4WRR26new · submitted 2026-05-18 · ✦ hep-ph


Pith reviewed 2026-05-21 08:02 UTC · model grok-4.3

classification ✦ hep-ph

keywords: parton showers · non-global logarithms · autoregressive transformer · variable multiplicity · resummation · machine learning · dipole shower · high-energy physics


The pith

Nested-GPT generates variable-multiplicity parton showers by sequentially predicting emissions and learning a termination condition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces Nested-GPT, a hierarchical autoregressive Transformer for simulating parton-shower histories whose multiplicity is not fixed in advance. The architecture predicts each emission in sequence while enforcing the ordered Markovian branching structure of a dipole shower and uses a learned condition to decide when the sequence ends. The authors test this approach on the leading-logarithmic resummation of non-global logarithms in the large-Nc limit, training on data from a stochastic Monte Carlo dipole shower and comparing gap-fraction observables under two training regimes. Generated samples from Nested-GPT agree with the reference shower within statistical uncertainties, establishing it as a consistent autoregressive surrogate for traditional shower generators.
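The sequential generation with a learned stop decision described above can be sketched as a short sampling loop. This is a minimal illustration, not the authors' implementation: the two callables `stop_probability` and `sample_emission`, the `(t, y, phi)` emission layout, and the toy stand-ins for the trained network are all assumptions made for this example.

```python
import math
import random
from typing import Callable, List, Tuple

# Hypothetical emission record: (evolution time, rapidity, azimuth).
Emission = Tuple[float, float, float]


def generate_history(
    stop_probability: Callable[[List[Emission]], float],
    sample_emission: Callable[[List[Emission], random.Random], Emission],
    rng: random.Random,
    max_emissions: int = 256,
) -> List[Emission]:
    """Sample one variable-length history, one emission at a time."""
    history: List[Emission] = []
    while len(history) < max_emissions:
        # The decision to terminate is conditioned on the history so far.
        if rng.random() < stop_probability(history):
            break
        history.append(sample_emission(history, rng))
    return history


# Toy stand-ins for the trained model heads (assumptions, not the paper's model).
def toy_stop_probability(history: List[Emission]) -> float:
    return 0.25


def toy_sample_emission(history: List[Emission], rng: random.Random) -> Emission:
    t_prev = history[-1][0] if history else 0.0
    t = t_prev + rng.expovariate(1.0)  # keeps the evolution time ordered
    return (t, rng.uniform(-2.0, 2.0), rng.uniform(0.0, 2.0 * math.pi))


rng = random.Random(0)
events = [
    generate_history(toy_stop_probability, toy_sample_emission, rng)
    for _ in range(2000)
]
mean_multiplicity = sum(len(e) for e in events) / len(events)
```

With a constant stop probability the multiplicity is geometric, which makes the toy easy to check; a trained model would instead return a history-dependent probability.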

Core claim

Nested-GPT strictly enforces the ordered Markovian branching structure, predicting emissions sequentially and dynamically evaluating a learned sequence-termination condition; the resulting generated samples agree with the reference shower within statistical uncertainties for the observables considered.

What carries the argument

Nested-GPT, the hierarchical autoregressive Transformer that models sequential emission prediction together with a learned termination condition to produce variable-length shower histories.
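One way to write down what such a model represents is the probability of a complete history of $n$ emissions. The notation below is ours, not the paper's: $x_k$ denotes the $k$-th emission, $x_{<k}$ the emissions before it, and $s_k$ the learned probability of stopping before the $k$-th emission. It assumes the stop decision is a Bernoulli draw at each step.

```latex
p(x_1, \dots, x_n, \text{stop})
  = \left[ \prod_{k=1}^{n} \bigl(1 - s_k(x_{<k})\bigr)\, p(x_k \mid x_{<k}) \right]
    s_{n+1}(x_{\le n})
```

For a strictly Markovian shower, the conditioning on $x_{<k}$ would reduce to conditioning on the current shower state.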

If this is right

Nested-GPT supplies a physically consistent surrogate for variable-multiplicity parton-shower generators.
The same architecture supports both direct training on vetoed histories and inclusive training followed by an analysis-level veto.
The method provides a foundation for extending the resummation treatment to subleading logarithms.
The results motivate further development toward finite-Nc color evolution inside the same autoregressive framework.
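The second training regime, inclusive generation followed by an analysis-level veto, can be illustrated with a few lines of post-processing. This sketch assumes a hypothetical `(t, y)` layout for each emission, a gap region defined by `|y| < y_gap`, and a gap fraction defined as the fraction of events with no in-gap emission up to evolution time `t`; the paper's exact conventions may differ.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical emission record: (evolution time t, rapidity y).
Emission = Tuple[float, float]


def first_gap_time(history: Sequence[Emission], y_gap: float) -> Optional[float]:
    """Evolution time of the first emission with |y| < y_gap, or None."""
    for t, y in history:  # history is assumed to be ordered in t
        if abs(y) < y_gap:
            return t
    return None


def apply_analysis_veto(history: Sequence[Emission], y_gap: float) -> List[Emission]:
    """Keep only the emissions that precede the first in-gap emission."""
    kept: List[Emission] = []
    for t, y in history:
        if abs(y) < y_gap:
            break
        kept.append((t, y))
    return kept


def gap_fraction(
    histories: Sequence[Sequence[Emission]], y_gap: float, t_values: Sequence[float]
) -> List[float]:
    """Fraction of events with no in-gap emission at evolution time <= t."""
    firsts = [first_gap_time(h, y_gap) for h in histories]
    n = len(firsts)
    return [sum(1 for f in firsts if f is None or f > t) / n for t in t_values]


# Four hand-written inclusive histories, for illustration only.
inclusive = [
    [(0.1, 1.5), (0.3, 0.2), (0.6, -1.8)],
    [(0.2, -1.2), (0.9, 1.1)],
    [(0.05, 0.4)],
    [],
]
fractions = gap_fraction(inclusive, y_gap=1.0, t_values=[0.0, 0.1, 0.5])
vetoed = [apply_analysis_veto(h, 1.0) for h in inclusive]
```

In the direct regime the veto would instead be applied to the training histories before the model sees them.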

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Enforcing the physical branching order inside the model may reduce the need for post-generation corrections in other sequential Monte Carlo simulations.
The dynamic termination mechanism could transfer to other variable-length processes such as hadronization or decay chains.
Combining the architecture with higher-order matrix elements might improve the accuracy of full event generation without manual multiplicity specification.

Load-bearing premise

The stochastic Monte Carlo dipole shower used to generate the training data correctly captures the leading-logarithmic resummation of non-global logarithms in the large-Nc limit.

What would settle it

A statistically significant mismatch between the gap-fraction distributions produced by Nested-GPT samples and those from the reference dipole shower, beyond the reported statistical uncertainties, would falsify the agreement claim.
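A simple way to quantify such a mismatch is a per-bin pull between the two gap-fraction estimates. The sketch below assumes each estimate is a count of surviving events out of a fixed sample size and uses plain binomial uncertainties; the counts are invented for illustration and are not results from the paper.

```python
import math
from typing import List, Sequence


def binomial_error(k: int, n: int) -> float:
    """Standard error of the fraction k/n under a binomial model."""
    p = k / n
    return math.sqrt(p * (1.0 - p) / n)


def pulls(
    k_model: Sequence[int], n_model: int, k_ref: Sequence[int], n_ref: int
) -> List[float]:
    """Per-bin (model - reference) / combined uncertainty."""
    out: List[float] = []
    for km, kr in zip(k_model, k_ref):
        diff = km / n_model - kr / n_ref
        sigma = math.hypot(binomial_error(km, n_model), binomial_error(kr, n_ref))
        if sigma > 0.0:
            out.append(diff / sigma)
        else:
            # Both fractions are exactly 0 or 1: any difference is infinitely significant.
            out.append(0.0 if diff == 0.0 else math.copysign(math.inf, diff))
    return out


# Invented counts of events passing the gap requirement at three evolution times.
example_pulls = pulls([9000, 6100, 3950], 10000, [9020, 6000, 4000], 10000)
largest_pull = max(abs(p) for p in example_pulls)
```

Because the gap fraction at different evolution times is computed from the same events, the bins are correlated, so the pulls should be read individually rather than summed into a naive chi-square.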

Figures

Figures reproduced from arXiv: 2605.18360 by Ding Yu Shao, Hao-Zhe Shi, Wanchen Li, Yu-Xuan Sun.

**Figure 2.** Comparison of the gap fraction […]

**Figure 3.** Training history of the Nested-GPT model on the […]

**Figure 4.** Comparison of the inclusive shower samples gen[…]

**Figure 5.** Events from both models are terminated after genera[…]

Original abstract

We introduce Nested-GPT, a hierarchical autoregressive Transformer architecture for simulating variable-multiplicity parton-shower histories. As a controlled benchmark, we study the leading-logarithmic resummation of non-global logarithms in the large-$N_c$ limit, utilizing a stochastic Monte Carlo dipole shower to generate reference training data. We systematically evaluate Nested-GPT against a Transformer flow-matching baseline. The flow-matching framework successfully parameterizes the joint distribution of emission kinematics at fixed multiplicity. Its phase-space representation, however, requires the final number of emissions to be specified externally rather than generated dynamically. Conversely, Nested-GPT strictly enforces the ordered Markovian branching structure, predicting emissions sequentially and dynamically evaluating a learned sequence-termination condition. We benchmark both approaches using gap fraction observables under two complementary training regimes: direct training on vetoed histories and inclusive training followed by an analysis-level veto. The resulting generated samples agree with the reference shower within statistical uncertainties for the observables considered. These results establish Nested-GPT as a physically consistent autoregressive surrogate for variable-multiplicity shower generators and motivate extensions to subleading-logarithmic resummation and finite-$N_c$ color evolution.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Nested-GPT gives a clean autoregressive handle on variable-multiplicity showers by learning the stop condition, and it matches the reference on gap fractions, though multiplicity fidelity outside training needs checking.


The main point is that Nested-GPT uses a hierarchical autoregressive Transformer to build parton-shower histories emission by emission while learning when to terminate the sequence. In their benchmark of leading-log non-global logarithms in the large-Nc limit, the generated samples line up with the stochastic dipole-shower reference on gap-fraction observables under both direct-veto and inclusive-plus-veto training setups. That is the concrete result the paper delivers.

Referee Report

1 major / 2 minor

Summary. The paper introduces Nested-GPT, a hierarchical autoregressive Transformer for generating variable-multiplicity parton-shower histories. Using a stochastic Monte Carlo dipole shower as reference in the large-N_c limit, it benchmarks the model on leading-logarithmic resummation of non-global logarithms. Nested-GPT is compared to a flow-matching Transformer baseline; the former enforces ordered Markovian branching with a learned termination condition while the latter requires external multiplicity specification. Generated samples are reported to agree with the reference within statistical uncertainties for gap-fraction observables under direct-veto and inclusive-plus-analysis-veto training regimes.

Significance. If the central results hold, the work provides a concrete demonstration that autoregressive Transformers can serve as physically consistent surrogates for variable-multiplicity showers by dynamically learning both emission kinematics and sequence termination. The explicit comparison to flow-matching and the use of two complementary training regimes are strengths that clarify the advantages of the Markovian structure for non-global logarithm resummation.

major comments (1)

[Abstract] Abstract and benchmark setup: the central claim that Nested-GPT reproduces the reference shower via a learned sequence-termination condition is supported only by agreement on gap-fraction observables. No explicit validation of the multiplicity distribution or of the termination probability as a function of gap configuration is described, leaving open whether the joint distribution over emission number and kinematics remains faithful at multiplicities or phase-space points sparsely sampled in training.
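One model-independent check in the spirit of this comment is to compare the empirical stopping probability per step between generated and reference samples. The sketch below estimates P(stop with exactly n emissions | at least n emissions) from multiplicities alone. It is a coarser quantity than the termination probability as a function of gap configuration, and the two multiplicity lists are invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def stop_hazard(multiplicities: Iterable[int]) -> Dict[int, float]:
    """Estimate P(exactly n emissions | at least n emissions) for each n."""
    counts = Counter(multiplicities)
    if not counts:
        return {}
    hazard: Dict[int, float] = {}
    reached = sum(counts.values())  # events with at least n emissions
    for n in range(max(counts) + 1):
        if reached == 0:
            break
        stopped = counts.get(n, 0)
        hazard[n] = stopped / reached
        reached -= stopped
    return hazard


# Invented multiplicities standing in for reference and generated samples.
reference_hazard = stop_hazard([0, 0, 1, 1, 1, 2, 2, 3])
generated_hazard = stop_hazard([0, 1, 1, 1, 1, 2, 2, 3])
hazard_difference = {
    n: generated_hazard[n] - reference_hazard[n]
    for n in reference_hazard
    if n in generated_hazard
}
```

The last entry of each hazard is always 1 by construction, so only the earlier steps carry information, and the high-multiplicity tail is where limited statistics will dominate.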

minor comments (2)

Notation for the hierarchical autoregressive structure and the precise definition of the learned termination condition should be clarified with an explicit equation or pseudocode block.
Figure captions for the gap-fraction plots should state the statistical uncertainty bands and the number of generated events used for each curve.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading of the manuscript and for the constructive feedback. We address the major comment below and indicate the revisions made to strengthen the validation of the model.


Referee: [Abstract] Abstract and benchmark setup: the central claim that Nested-GPT reproduces the reference shower via a learned sequence-termination condition is supported only by agreement on gap-fraction observables. No explicit validation of the multiplicity distribution or of the termination probability as a function of gap configuration is described, leaving open whether the joint distribution over emission number and kinematics remains faithful at multiplicities or phase-space points sparsely sampled in training.

Authors: We thank the referee for this observation. The gap-fraction observables directly probe the leading-logarithmic resummation of non-global logarithms and are sensitive to the interplay between emission kinematics and the number of branchings. Nevertheless, we agree that explicit checks on the multiplicity distribution and the learned termination probability provide valuable additional evidence. In the revised manuscript we have added new figures comparing the generated multiplicity distributions to the reference Monte Carlo shower for both the direct-veto and inclusive-plus-analysis-veto training regimes; these distributions agree within statistical uncertainties over the full range of multiplicities populated by the reference. We have also included a plot of the termination probability conditioned on gap configuration, which reproduces the expected dependence arising from the ordered Markovian branching structure. These additions confirm that the joint distribution over emission number and kinematics is reproduced faithfully, including in regions with lower training statistics.

Revision: yes

Circularity Check

0 steps flagged

No circularity: validation against external Monte Carlo reference shower


The paper generates training data from an independent stochastic Monte Carlo dipole shower and shows that Nested-GPT samples agree with this reference within statistical uncertainties on gap-fraction observables. The architecture enforces ordered Markovian branching by construction and learns a termination condition from the external data; neither step reduces to a self-referential fit or self-citation. A separate flow-matching baseline is used for comparison, providing an independent check. No load-bearing claim relies on prior author work or renames a known result as a new derivation. The central result is empirical reproduction of an external generator, which is self-contained and falsifiable against the reference.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 1 invented entity

The approach rests on the accuracy of the reference Monte Carlo dipole shower and the validity of the large-Nc leading-log approximation; no new physical entities are postulated.

free parameters (1)

Transformer hyperparameters and training schedule
Model depth, width, learning rate, and termination threshold are chosen or optimized during training on the reference data.

axioms (2)

Domain assumption: parton-shower evolution can be represented as an ordered Markovian branching process
Invoked in the design of the autoregressive prediction and sequence-termination condition.
Domain assumption: the large-Nc limit plus leading-log resummation is adequately captured by the stochastic dipole shower used for training data
Basis for the benchmark observables and reference samples.
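The first assumption can be made concrete with a toy ordered Markov branching process. This is a deliberately simplified sketch, not the reference dipole shower: it assumes every dipole emits at unit rate, whereas a real shower uses rates that depend on the dipole geometry and a collinear cutoff. It does use the standard large-Nc picture in which each emission splits one dipole into two.

```python
import math
import random
from typing import List


def toy_ordered_branching(t_max: float, rng: random.Random) -> List[float]:
    """Return the ordered emission times of one toy shower up to t_max."""
    times: List[float] = []
    t = 0.0
    n_dipoles = 1  # start from a single dipole
    while True:
        # Markov step: the waiting time depends only on the current state.
        # Toy assumption: total emission rate equals the number of dipoles.
        t += rng.expovariate(n_dipoles)
        if t > t_max:
            break
        times.append(t)
        n_dipoles += 1  # each emission splits one dipole into two
    return times


rng = random.Random(12345)
showers = [toy_ordered_branching(1.0, rng) for _ in range(5000)]
mean_emissions = sum(len(s) for s in showers) / len(showers)
expected_emissions = math.e - 1.0  # pure-birth (Yule) process at t_max = 1
```

The multiplicity is not fixed in advance: it emerges from when the evolution variable crosses `t_max`, which is the feature a variable-length surrogate has to learn.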

invented entities (1)

Nested-GPT hierarchical autoregressive Transformer (no independent evidence)
purpose: To generate variable-multiplicity parton-shower histories while enforcing ordered branching
New model architecture introduced for this task.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "Nested-GPT strictly enforces the ordered Markovian branching structure, predicting emissions sequentially and dynamically evaluating a learned sequence-termination condition"

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "We benchmark both approaches using gap fraction observables under two complementary training regimes"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

Butteret al., SciPost Phys.14, 079 (2023), arXiv:2203.07460 [hep-ph]

A. Butteret al., SciPost Phys.14, 079 (2023), arXiv:2203.07460 [hep-ph]

work page arXiv 2023

[2] [2]

Ubiali, in2nd European AI for Fundamental Physics Conference(2026) arXiv:2602.03728 [hep-ph]

M. Ubiali, in2nd European AI for Fundamental Physics Conference(2026) arXiv:2602.03728 [hep-ph]

work page arXiv 2026

[3] [3]

T. Cai, K. Li, and T. Li, (2026), arXiv:2605.03474 [hep- ph]

work page internal anchor Pith review Pith/arXiv arXiv 2026

[4] [4]

I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, inAdvances in Neural Information Processing Systems, Vol. 27 (Curran Associates, Inc., 2014) pp. 2672–2680, arXiv:1406.2661 [stat.ML]

work page internal anchor Pith review Pith/arXiv arXiv 2014

[5] [5]

CaloGAN: Simulating 3D High Energy Particle Showers in Multi-Layer Electromagnetic Calorimeters with Generative Adversarial Networks

M. Paganini, L. de Oliveira, and B. Nachman, Phys. Rev. D97, 014021 (2018), arXiv:1712.10321 [hep-ex]

work page internal anchor Pith review Pith/arXiv arXiv 2018

[6] [6]

Learning Particle Physics by Example: Location-Aware Generative Adversarial Networks for Physics Synthesis

L. de Oliveira, M. Paganini, and B. Nachman, Comput. Softw. Big Sci.1, 4 (2017), arXiv:1701.05927 [stat.ML]

work page internal anchor Pith review Pith/arXiv arXiv 2017

[7] [7]

Butter, T

A. Butter, T. Plehn, and R. Winterhalder, SciPost Phys. 7, 075 (2019), arXiv:1907.03764 [hep-ph]

work page arXiv 2019

[8] [8]

Butter, T

A. Butter, T. Plehn, and R. Winterhalder, SciPost Phys. Core3, 009 (2020), arXiv:1912.08824 [hep-ph]

work page arXiv 2020

[9] [9]

Butter, S

A. Butter, S. Diefenbacher, G. Kasieczka, B. Nach- man, and T. Plehn, SciPost Phys.10, 139 (2021), arXiv:2008.06545 [hep-ph]

work page arXiv 2021

[10] [10]

Papamakarios, E

G. Papamakarios, E. Nalisnick, D. J. Rezende, S. Mo- hamed, and B. Lakshminarayanan, J. Mach. Learn. Res. 22, 1 (2021), arXiv:1912.02762 [stat.ML]

work page arXiv 2021

[11] [11]

Butter, T

A. Butter, T. Heimel, S. Hummerich, T. Krebs, T. Plehn, A. Rousselot, and S. Vent, SciPost Phys.14, 078 (2023), arXiv:2110.13632 [hep-ph]

work page arXiv 2023

[12] [12]

Heimel, R

T. Heimel, R. Winterhalder, A. Butter, J. Isaacson, C. Krause, F. Maltoni, O. Mattelaer, and T. Plehn, Sci- Post Phys.15, 141 (2023), arXiv:2212.06172 [hep-ph]

work page arXiv 2023

[13] [13]

C. Gao, S. H¨ oche, J. Isaacson, C. Krause, and H. Schulz, Phys. Rev. D101, 076002 (2020), arXiv:2001.10028 [hep- ph]

work page arXiv 2020

[14] [14]

Bothmann, T

E. Bothmann, T. Janßen, M. Knobbe, T. Schmale, and S. Schumann, SciPost Phys.8, 069 (2020), arXiv:2001.05478 [hep-ph]

work page arXiv 2020

[15] [15]

J. Ho, A. Jain, and P. Abbeel, inAdvances in Neu- ral Information Processing Systems 33 (NeurIPS)(2020) arXiv:2006.11239 [cs.LG]

work page internal anchor Pith review Pith/arXiv arXiv 2020

[16] [16]

Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Ku- mar, S. Ermon, and B. Poole, inInternational Con- ference on Learning Representations (ICLR)(2021) arXiv:2011.13456 [cs.LG]

work page internal anchor Pith review Pith/arXiv arXiv 2021

[17] Y. Lipman, R. T. Q. Chen, H. Ben-Hamu, M. Nickel, and M. Le, “Flow Matching for Generative Modeling,” in International Conference on Learning Representations (ICLR) (2023), arXiv:2210.02747 [cs.LG]

[18] M. S. Albergo and E. Vanden-Eijnden, in International Conference on Learning Representations (ICLR) (2023), arXiv:2209.15571 [cs.LG]

[19] V. Mikuni and B. Nachman, Phys. Rev. D 106, 092009 (2022), arXiv:2206.11898 [hep-ph]

[20] E. Buhmann, C. Ewen, D. A. Faroughy, T. Golling, G. Kasieczka, M. Leigh, G. Quétant, J. A. Raine, D. Sengupta, and D. Shih, “EPiC-ly fast particle cloud generation with flow-matching and diffusion,” (2023), arXiv:2310.00049 [hep-ph]

[21] A. Butter, N. Huetsch, S. Palacios Schweitzer, T. Plehn, P. Sorrenson, and J. Spinner, SciPost Phys. Core 8, 026 (2025), arXiv:2305.10475 [hep-ph]

[22] T. Plehn, A. Butter, B. Dillon, T. Heimel, C. Krause, and R. Winterhalder, “Modern Machine Learning for LHC Physicists,” (2022), arXiv:2211.01421 [hep-ph]

[23] HEP ML Community, “A Living Review of Machine Learning for Particle Physics,” https://iml-wg.github.io/HEPML-LivingReview/ (2022)

[24] F. Gross, E. Klempt, S. J. Brodsky, et al., Eur. Phys. J. C 83, 1125 (2023), arXiv:2212.11107 [hep-ph]

[25] A. D. Martin, W. J. Stirling, R. S. Thorne, and G. Watt, Eur. Phys. J. C 63, 189 (2009), arXiv:0901.0002 [hep-ph]

[26] R. Kogler et al., Rev. Mod. Phys. 91, 045003 (2019), arXiv:1803.06991 [hep-ex]

[27] A. Buckley et al., “General-purpose event generators for LHC physics,” Phys. Rept. 504, 145 (2011), arXiv:1101.2599 [hep-ph]

[28] E. Bothmann and L. Del Debbio, “Reweighting a parton shower using a neural network: the final-state case,” JHEP 01, 033 (2019), arXiv:1808.07802 [hep-ph]

[29] J. W. Monk, JHEP 12, 021 (2018), arXiv:1807.03685 [hep-ph]

[30] A. Butter, F. Charton, J. M. Villadamigo, A. Ore, T. Plehn, and J. Spinner, SciPost Phys. 20, 004 (2026), arXiv:2412.12074 [hep-ph]

[31] K. Danziger, T. Janßen, S. Schumann, and F. Siegert, SciPost Phys. 12, 164 (2022), arXiv:2109.11964 [hep-ph]

[32] G. De Crescenzo, J. M. Villadamigo, N. Elmer, T. Heimel, T. Plehn, R. Winterhalder, and M. Zaro, “MadNIS at NLO,” (2026), arXiv:2603.22407 [hep-ph]

[33] T. Heimel, O. Mattelaer, and R. Winterhalder, “MadSpace – Event Generation for the Era of GPUs and ML,” (2026), arXiv:2602.06895 [hep-ph]

[34] J. M. Villadamigo, R. Frederix, T. Plehn, T. Vitos, and R. Winterhalder, “FASTColor – Full-color Amplitude Surrogate Toolkit for QCD,” (2025), arXiv:2509.07068 [hep-ph]

[35] C. Bierlich et al., “A comprehensive guide to the physics and usage of PYTHIA 8.3,” SciPost Phys. Codebases, 8 (2022), arXiv:2203.11601 [hep-ph]

[36] E. Bothmann et al. (Sherpa), JHEP 12, 156 (2024), arXiv:2410.22148 [hep-ph]

[37] J. Bellm et al., “Herwig 7.0 / Herwig++ 3.0 Release Note,” Eur. Phys. J. C 76, 196 (2016), arXiv:1512.01178 [hep-ph]

[38] Z. Nagy and D. E. Soper, Phys. Rev. D 104, 054049 (2021), arXiv:2011.04773 [hep-ph]

[39] J. R. Forshaw, J. Holguin, and S. Plätzer, JHEP 09, 014 (2020), arXiv:2003.06400 [hep-ph]

[40] M. van Beekveld et al., Phys. Rev. Lett. 134, 011901 (2025), arXiv:2406.02661 [hep-ph]

[41] S. Ferrario Ravasio, in 59th Rencontres de Moriond on QCD and High Energy Interactions: Moriond QCD 2025 (2025), arXiv:2505.13395 [hep-ph]

[42] M. Dasgupta and G. P. Salam, “Resummation of non-global QCD observables,” Phys. Lett. B 512, 323 (2001), arXiv:hep-ph/0104277 [hep-ph]

[43] A. Banfi, G. Marchesini, and G. Smye, “Away-from-jet energy flow,” JHEP 08, 006 (2002), arXiv:hep-ph/0206076 [hep-ph]

[44] I. Balitsky, Phys. Rev. D 60, 014020 (1999), arXiv:hep-ph/9812311

[45] Y. V. Kovchegov, Phys. Rev. D 61, 074018 (2000), arXiv:hep-ph/9905214

[46] Y. V. Kovchegov, Phys. Rev. D 60, 034008 (1999), arXiv:hep-ph/9901281

[47] S. Caron-Huot, “Resummation of non-global logarithms and the BFKL equation,” JHEP 03, 036 (2018), arXiv:1501.03754 [hep-ph]

[48] G. Brunello, S. Caron-Huot, G. Crisanti, M. Giroux, and S. Smith, JHEP 11, 055 (2025), arXiv:2508.03794 [hep-ph]

[49] J. R. Forshaw, A. Kyrieleis, and M. H. Seymour, JHEP 08, 059 (2006), arXiv:hep-ph/0604094

[50] Y. Hatta and T. Ueda, “Resummation of non-global logarithms at finite $N_c$,” Nucl. Phys. B 874, 808 (2013), arXiv:1304.6930 [hep-ph]

[51] M. De Angelis, J. R. Forshaw, and S. Plätzer, Phys. Rev. Lett. 126, 112001 (2021), arXiv:2007.09648 [hep-ph]

[52] A. Banfi, F. A. Dreyer, and P. F. Monni, JHEP 10, 006 (2021), arXiv:2104.06416 [hep-ph]

[53] A. Banfi, F. A. Dreyer, and P. F. Monni, JHEP 03, 135 (2022), arXiv:2111.02413 [hep-ph]

[54] T. Becher, T. Rauh, and X. Xu, JHEP 08, 134 (2022), arXiv:2112.02108 [hep-ph]

[55] T. Becher, N. Schalch, and X. Xu, Phys. Rev. Lett. 132, 081602 (2024), arXiv:2307.02283 [hep-ph]

[56] S. Ferrario Ravasio, K. Hamilton, A. Karlberg, G. P. Salam, L. Scyboz, and G. Soyez, Phys. Rev. Lett. 131, 161906 (2023), arXiv:2307.11142 [hep-ph]

[57] M. Leigh, D. Sengupta, G. Quétant, J. A. Raine, K. Zoch, and T. Golling, SciPost Phys. 16, 018 (2024), arXiv:2303.05376 [hep-ph]

[58] Y. S. Lai, D. Neill, M. Płoskoń, and F. Ringer, Phys. Lett. B 829, 137055 (2022), arXiv:2012.06582 [hep-ph]

[59] M. Balsiger, T. Becher, and D. Y. Shao, “Non-global logarithms in jet and isolation cone cross sections,” JHEP 08, 104 (2018), arXiv:1803.07045 [hep-ph]

[60] M. Balsiger, T. Becher, and A. Ferroglia, JHEP 09, 029 (2020), arXiv:2006.00014 [hep-ph]

[61] T. Becher, M. Neubert, L. Rothen, and D. Y. Shao, “An Effective Field Theory for Jet Processes,” Phys. Rev. Lett. 116, 192001 (2016), arXiv:1508.06645 [hep-ph]

[62] T. Becher, M. Neubert, L. Rothen, and D. Y. Shao, “Factorization and Resummation for Jet Processes,” JHEP 11, 019 (2016), [Erratum: JHEP 05, 154 (2017)], arXiv:1605.02737 [hep-ph]

[63] A. J. Larkoski, I. Moult, and D. Neill, JHEP 09, 143 (2015), arXiv:1501.04596 [hep-ph]

[64] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud, in Advances in Neural Information Processing Systems 31 (NeurIPS) (2018), arXiv:1806.07366 [cs.LG]

[65] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention Is All You Need,” in Advances in Neural Information Processing Systems 30 (NeurIPS) (2017), arXiv:1706.03762 [cs.CL]

[66] I. Loshchilov and F. Hutter, “Decoupled Weight Decay Regularization,” in International Conference on Learning Representations (ICLR) (2019), arXiv:1711.05101 [cs.LG]

[67] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving language understanding by generative pre-training,” (2018), OpenAI preprint

[68] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, “Language models are unsupervised multitask learners,” (2019), OpenAI preprint

[69] T. Finke, M. Krämer, A. Mück, and J. Tönshoff, JHEP 06, 184 (2023), arXiv:2303.07364 [hep-ph]

[70] D. Hendrycks and K. Gimpel, “Gaussian error linear units (GELUs),” (2016), arXiv:1606.08415 [cs.LG]

[71] J. L. Ba, J. R. Kiros, and G. E. Hinton, “Layer normalization,” (2016), arXiv:1607.06450 [stat.ML]

[72] K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (2014) pp. 1724–1734, arXiv:1406.1078 [cs.CL]

[73] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” (2014), arXiv:1412.3555 [cs.NE]

[74] R. Diaz and A. Marathe, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019) pp. 4738–4747