Nested-GPT for variable-multiplicity parton showers: A case study in the resummation of non-global logarithms
Pith reviewed 2026-05-21 08:02 UTC · model grok-4.3
The pith
Nested-GPT generates variable-multiplicity parton showers by sequentially predicting emissions and learning a termination condition.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Nested-GPT strictly enforces the ordered Markovian branching structure, predicting emissions sequentially and dynamically evaluating a learned sequence-termination condition; the resulting generated samples agree with the reference shower within statistical uncertainties for the observables considered.
What carries the argument
Nested-GPT, the hierarchical autoregressive Transformer that models sequential emission prediction together with a learned termination condition to produce variable-length shower histories.
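Below is a minimal Python sketch of the generation loop described here. The summary does not say how Nested-GPT parameterizes termination, so this sketch assumes a stop probability is queried on the current history before every emission. `sample_history`, `toy_stop`, and `toy_next` are hypothetical names. The two toy heads stand in for trained Transformer outputs.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical emission record: (evolution variable t, rapidity y).
Emission = Tuple[float, float]


def sample_history(
    stop_prob: Callable[[Sequence[Emission]], float],
    next_emission: Callable[[Sequence[Emission], random.Random], Emission],
    rng: random.Random,
    max_len: int = 64,
) -> List[Emission]:
    """Roll out one variable-length history autoregressively.

    Before each step the termination head is queried on the history so far.
    Only if it does not fire is the next emission drawn, conditioned on all
    previous emissions, so the multiplicity is an output of the rollout.
    """
    history: List[Emission] = []
    while len(history) < max_len:  # hard cap, only a safety net
        if rng.random() < stop_prob(history):
            break
        history.append(next_emission(history, rng))
    return history


# Toy stand-ins for the two network heads (assumptions, not trained models).
def toy_stop(history: Sequence[Emission]) -> float:
    return 0.3


def toy_next(history: Sequence[Emission], rng: random.Random) -> Emission:
    t_prev = history[-1][0] if history else 0.0
    # Ordering: each new emission comes later in evolution time than the last.
    return (t_prev + rng.expovariate(1.0), rng.uniform(-3.0, 3.0))


if __name__ == "__main__":
    print(sample_history(toy_stop, toy_next, random.Random(1)))
```

Termination is sampled inside the loop, so the number of emissions comes out of the rollout rather than being fixed in advance. The `max_len` cap only guards against runaway sequences and should rarely bind for a well-trained stop head.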
If this is right
- Nested-GPT supplies a physically consistent surrogate for variable-multiplicity parton-shower generators.
- The same architecture supports both direct training on vetoed histories and inclusive training followed by an analysis-level veto.
- The method provides a foundation for extending the resummation treatment to subleading logarithms.
- The results motivate further development toward finite-Nc color evolution inside the same autoregressive framework.
Where Pith is reading between the lines
- Enforcing the physical branching order inside the model may reduce the need for post-generation corrections in other sequential Monte Carlo simulations.
- The dynamic termination mechanism could transfer to other variable-length processes such as hadronization or decay chains.
- Combining the architecture with higher-order matrix elements might improve the accuracy of full event generation without manual multiplicity specification.
Load-bearing premise
The stochastic Monte Carlo dipole shower used to generate the training data correctly captures the leading-logarithmic resummation of non-global logarithms in the large-Nc limit.
What would settle it
A statistically significant mismatch between the gap-fraction distributions produced by Nested-GPT samples and those from the reference dipole shower, beyond the reported statistical uncertainties, would falsify the agreement claim.
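The falsification test above can be made concrete with a short sketch. It assumes each history is an ordered list of `(t, y)` emissions. It also models the gap as the rapidity slice |y| < delta, probed up to an evolution scale `t_cut`; the real observable may use a different gap geometry and energy threshold. `passes_veto`, `gap_fraction`, and `pull` are hypothetical helpers, and the uncertainty is the simple binomial one.

```python
import math
from typing import Sequence, Tuple

# Hypothetical history format: ordered (t, y) emissions, t = evolution variable.
History = Sequence[Tuple[float, float]]


def passes_veto(history: History, delta: float, t_cut: float) -> bool:
    """True if no emission with t <= t_cut lands in the gap |y| < delta."""
    return not any(t <= t_cut and abs(y) < delta for t, y in history)


def gap_fraction(
    histories: Sequence[History], delta: float, t_cut: float
) -> Tuple[float, float]:
    """Fraction of histories passing the veto, with its binomial standard error."""
    n = len(histories)
    f = sum(passes_veto(h, delta, t_cut) for h in histories) / n
    return f, math.sqrt(f * (1.0 - f) / n)


def pull(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Difference of two independent estimates in units of their combined error."""
    (fa, ea), (fb, eb) = a, b
    return (fa - fb) / math.hypot(ea, eb)
```

On histories generated without a veto, `passes_veto` acts as an analysis-level veto. On the reference data before training, it yields a vetoed training set. Persistently large pulls (say, well beyond 2 to 3 in magnitude) across many gap settings would signal the kind of mismatch described above. The binomial error vanishes when the fraction is exactly 0 or 1, so `pull` is undefined there.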
Abstract
We introduce Nested-GPT, a hierarchical autoregressive Transformer architecture for simulating variable-multiplicity parton-shower histories. As a controlled benchmark, we study the leading-logarithmic resummation of non-global logarithms in the large-$N_c$ limit, using a stochastic Monte Carlo dipole shower to generate reference training data. We systematically evaluate Nested-GPT against a Transformer flow-matching baseline. The flow-matching framework successfully parameterizes the joint distribution of emission kinematics at fixed multiplicity. Its phase-space representation, however, requires the final number of emissions to be specified externally rather than generated dynamically. Conversely, Nested-GPT strictly enforces the ordered Markovian branching structure, predicting emissions sequentially and dynamically evaluating a learned sequence-termination condition. We benchmark both approaches using gap fraction observables under two complementary training regimes: direct training on vetoed histories and inclusive training followed by an analysis-level veto. The resulting generated samples agree with the reference shower within statistical uncertainties for the observables considered. These results establish Nested-GPT as a physically consistent autoregressive surrogate for variable-multiplicity shower generators and motivate extensions to subleading-logarithmic resummation and finite-$N_c$ color evolution.
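The following sketch illustrates why the flow-matching baseline needs the multiplicity supplied from outside. It uses conditional flow matching on the linear interpolation path x_t = (1 - t) x0 + t x1 with target velocity x1 - x0, one standard choice in the flow-matching literature. Whether the paper's baseline uses exactly this path is an assumption. `cfm_example` and `euler_sample` are hypothetical names, and `velocity` stands in for the Transformer.

```python
import random
from typing import Callable, List, Tuple

Event = List[List[float]]  # fixed shape: n_emissions x dim


def cfm_example(x1: Event, rng: random.Random) -> Tuple[float, Event, Event]:
    """One conditional flow-matching training example on the linear path
    x_t = (1 - t) x0 + t x1, with regression target u = x1 - x0."""
    t = rng.random()
    x0 = [[rng.gauss(0.0, 1.0) for _ in row] for row in x1]
    xt = [[(1.0 - t) * a + t * b for a, b in zip(r0, r1)] for r0, r1 in zip(x0, x1)]
    u = [[b - a for a, b in zip(r0, r1)] for r0, r1 in zip(x0, x1)]
    return t, xt, u


def euler_sample(
    velocity: Callable[[float, Event], Event],
    n_emissions: int,
    dim: int,
    steps: int,
    rng: random.Random,
) -> Event:
    """Integrate dx/dt = v(t, x) from Gaussian noise at t = 0 to t = 1.

    n_emissions must be chosen by the caller before integration starts."""
    x = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_emissions)]
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity(k * dt, x)
        x = [[xi + dt * vi for xi, vi in zip(rx, rv)] for rx, rv in zip(x, v)]
    return x
```

The event is a fixed n_emissions x dim array from the first noise draw onward, so the sampler cannot change the multiplicity. n_emissions has to be drawn separately, for example from the reference multiplicity distribution.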
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Nested-GPT, a hierarchical autoregressive Transformer for generating variable-multiplicity parton-shower histories. Using a stochastic Monte Carlo dipole shower as reference in the large-N_c limit, it benchmarks the model on leading-logarithmic resummation of non-global logarithms. Nested-GPT is compared to a flow-matching Transformer baseline; the former enforces ordered Markovian branching with a learned termination condition while the latter requires external multiplicity specification. Generated samples are reported to agree with the reference within statistical uncertainties for gap-fraction observables under direct-veto and inclusive-plus-analysis-veto training regimes.
Significance. If the central results hold, the work provides a concrete demonstration that autoregressive Transformers can serve as physically consistent surrogates for variable-multiplicity showers by dynamically learning both emission kinematics and sequence termination. The explicit comparison to flow-matching and the use of two complementary training regimes are strengths that clarify the advantages of the Markovian structure for non-global logarithm resummation.
major comments (1)
- [Abstract] Abstract and benchmark setup: the central claim that Nested-GPT reproduces the reference shower via a learned sequence-termination condition is supported only by agreement on gap-fraction observables. No explicit validation of the multiplicity distribution or of the termination probability as a function of gap configuration is described, leaving open whether the joint distribution over emission number and kinematics remains faithful at multiplicities or phase-space points sparsely sampled in training.
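The multiplicity check the referee asks for could look like the following sketch. It compares normalized histograms of the number of emissions between generated and reference samples, using a per-bin Poisson variance. The function name and the chi2 convention are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from typing import Sequence, Tuple


def multiplicity_chi2(gen: Sequence[Sequence], ref: Sequence[Sequence]) -> Tuple[float, int]:
    """Compare normalized multiplicity histograms bin by bin.

    Each bin uses the Poisson variance of both samples. Returns the chi2 sum
    and the number of populated bins (not a formal ndof: the normalization
    constraint removes one)."""
    hg = Counter(len(h) for h in gen)
    hr = Counter(len(h) for h in ref)
    ng, nr = len(gen), len(ref)
    bins = sorted(set(hg) | set(hr))
    chi2 = 0.0
    for k in bins:
        # k is populated in at least one sample, so var > 0.
        var = hg[k] / ng**2 + hr[k] / nr**2
        chi2 += (hg[k] / ng - hr[k] / nr) ** 2 / var
    return chi2, len(bins)
```

Sparsely populated high-multiplicity bins make the Gaussian approximation unreliable. Merging the tail into one overflow bin before computing chi2 is a sensible precaution.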
minor comments (2)
- Notation for the hierarchical autoregressive structure and the precise definition of the learned termination condition should be clarified with an explicit equation or pseudocode block.
- Figure captions for the gap-fraction plots should state the statistical uncertainty bands and the number of generated events used for each curve.
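One possible explicit definition is the factorized likelihood below, in our notation rather than the paper's. It assumes a stop probability s_θ is evaluated before each emission and that p_θ is the next-emission density, whose support enforces t_i > t_{i-1}; x_{<1} denotes the empty history. The last line gives the exact termination probability of a Markov shower with state-dependent emission rate R, which is what s_θ would have to learn.

```latex
% h = (x_1, ..., x_n): ordered emissions with t_1 < t_2 < ... < t_n
P_\theta(h) = \Bigg[\prod_{i=1}^{n} \big(1 - s_\theta(x_{<i})\big)\,
              p_\theta(x_i \mid x_{<i})\Bigg]\, s_\theta(x_{\le n}),
\qquad
\mathcal{L}(\theta) = -\,\mathbb{E}_{h \sim \mathrm{ref}}\big[\log P_\theta(h)\big]

% Reference termination probability (Sudakov factor: no further emission)
s_{\mathrm{ref}}(x_{\le n}) = \exp\!\Big(-\int_{t_n}^{t_{\max}} R(t;\, x_{\le n})\,\mathrm{d}t\Big)
```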
Simulated Author's Rebuttal
We thank the referee for their careful reading of the manuscript and for the constructive feedback. We address the major comment below and indicate the revisions made to strengthen the validation of the model.
Point-by-point responses
- Referee: [Abstract] Abstract and benchmark setup: the central claim that Nested-GPT reproduces the reference shower via a learned sequence-termination condition is supported only by agreement on gap-fraction observables. No explicit validation of the multiplicity distribution or of the termination probability as a function of gap configuration is described, leaving open whether the joint distribution over emission number and kinematics remains faithful at multiplicities or phase-space points sparsely sampled in training.
Authors: We thank the referee for this observation. The gap-fraction observables directly probe the leading-logarithmic resummation of non-global logarithms and are sensitive to the interplay between emission kinematics and the number of branchings. Nevertheless, we agree that explicit checks on the multiplicity distribution and the learned termination probability provide valuable additional evidence. In the revised manuscript we have added new figures comparing the generated multiplicity distributions to the reference Monte Carlo shower for both the direct-veto and inclusive-plus-analysis-veto training regimes; these distributions agree within statistical uncertainties over the full range of multiplicities populated by the reference. We have also included a plot of the termination probability conditioned on gap configuration, which reproduces the expected dependence arising from the ordered Markovian branching structure. These additions confirm that the joint distribution over emission number and kinematics is reproduced faithfully, including in regions with lower training statistics. revision: yes
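The termination-probability check described in this response can be sketched as follows. The sketch estimates the empirical termination probability from complete histories, keyed by the number of emissions so far and by whether the gap has already been hit. That estimate can then be compared with the model's stop head on the same prefixes. The single-bit gap state and the `(t, y)` history format are simplifying assumptions, and `empirical_stop_prob` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple

History = Sequence[Tuple[float, float]]  # hypothetical (t, y) emissions


def empirical_stop_prob(
    histories: Sequence[History], delta: float
) -> Dict[Tuple[int, bool], float]:
    """Estimate P(stop | n emissions so far, gap already hit) from complete histories.

    For every prefix length n of every history, record whether the history
    ended there. The key also tracks whether any emission so far landed in
    the gap |y| < delta."""
    reached: Dict[Tuple[int, bool], int] = defaultdict(int)
    stopped: Dict[Tuple[int, bool], int] = defaultdict(int)
    for h in histories:
        gap_hit = False
        for n in range(len(h) + 1):
            if n > 0 and abs(h[n - 1][1]) < delta:
                gap_hit = True
            reached[(n, gap_hit)] += 1
            if n == len(h):
                stopped[(n, gap_hit)] += 1
    return {k: stopped[k] / reached[k] for k in reached}
```

Keys reached by only a handful of histories correspond to the sparsely sampled regions the referee raises. Reporting those denominators alongside the estimates shows where agreement is actually constrained.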
Circularity Check
No circularity: validation against external Monte Carlo reference shower
full rationale
The paper generates training data from an independent stochastic Monte Carlo dipole shower and shows that Nested-GPT samples agree with this reference within statistical uncertainties on gap-fraction observables. The architecture enforces ordered Markovian branching by construction and learns a termination condition from the external data; neither step reduces to a self-referential fit or self-citation. A separate flow-matching baseline is used for comparison, providing an independent check. No load-bearing claim relies on prior author work or renames a known result as a new derivation. The central result is empirical reproduction of an external generator, which is self-contained and falsifiable against the reference.
Axiom & Free-Parameter Ledger
free parameters (1)
- Transformer hyperparameters and training schedule
axioms (2)
- domain assumption: Parton-shower evolution can be represented as an ordered Markovian branching process
- domain assumption: The large-Nc limit plus leading-log resummation is adequately captured by the stochastic dipole shower used for training data
invented entities (1)
- Nested-GPT hierarchical autoregressive Transformer: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Passage: "Nested-GPT strictly enforces the ordered Markovian branching structure, predicting emissions sequentially and dynamically evaluating a learned sequence-termination condition"
- IndisputableMonolith/Foundation/AlexanderDuality.lean: alexander_duality_circle_linking (unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Passage: "We benchmark both approaches using gap fraction observables under two complementary training regimes"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.