Nested-GPT for variable-multiplicity parton showers: A case study in the resummation of non-global logarithms

Ding Yu Shao; Hao-Zhe Shi; Wanchen Li; Yu-Xuan Sun

arXiv: 2605.18360 · v2 · pith:5P4WRR26 · submitted 2026-05-18 · ✦ hep-ph


Pith reviewed 2026-05-20 09:38 UTC · model grok-4.3

classification ✦ hep-ph

keywords: parton showers · non-global logarithms · resummation · transformer · autoregressive model · variable multiplicity · large-Nc limit · dipole shower


The pith

Nested-GPT generates variable-length parton shower histories that match Monte Carlo dipole references for non-global logarithm resummation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a hierarchical autoregressive transformer called Nested-GPT to simulate parton-shower histories in which the number of branchings is not fixed in advance. It trains the model on data from a stochastic Monte Carlo dipole shower that implements leading-logarithmic resummation of non-global logarithms in the large-Nc limit. The architecture predicts each emission sequentially while learning its own termination condition, thereby respecting the ordered Markovian structure of the shower. Benchmarks on gap-fraction observables show that samples produced by Nested-GPT agree with the reference shower within statistical uncertainties, whether the model is trained directly on vetoed histories or inclusively with an analysis-level veto. This establishes an autoregressive surrogate that can handle the variable multiplicity required by realistic QCD cascades.

Core claim

Nested-GPT is a hierarchical autoregressive Transformer that simulates variable-multiplicity parton-shower histories by predicting emissions sequentially and dynamically evaluating a learned sequence-termination condition. When trained on reference data from a stochastic Monte Carlo dipole shower, it reproduces the leading-logarithmic resummation of non-global logarithms in the large-Nc limit, as measured by gap-fraction observables under both direct vetoed-history training and inclusive training followed by a veto. The generated samples agree with the reference within statistical uncertainties, in contrast to a flow-matching baseline that requires the final multiplicity to be supplied externally rather than generated dynamically.
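By contrast, a generator that models the joint kinematics only at fixed multiplicity needs the number of emissions supplied from outside, for example from a histogram of the reference shower. The two-stage sketch below is illustrative only: `sample_fixed_multiplicity`, `toy_kinematics` and the multiplicity probabilities are hypothetical, and the toy kinematics sampler is not a flow-matching model.

```python
import random


def sample_fixed_multiplicity(multiplicity_probs, sample_kinematics, rng):
    """Two-stage generation for a model that cannot decide n by itself.

    multiplicity_probs: dict n -> probability, supplied externally
    (e.g. histogrammed from the reference shower; hypothetical input).
    sample_kinematics: callable (n, rng) -> list of n emissions, standing
    in for a model of the joint kinematics at fixed n.
    """
    ns = list(multiplicity_probs)
    weights = [multiplicity_probs[n] for n in ns]
    n = rng.choices(ns, weights=weights, k=1)[0]  # n comes from outside the model
    return sample_kinematics(n, rng)


def toy_kinematics(n, rng):
    # Hypothetical stand-in: n ordered evolution times in [0, 1).
    return sorted(rng.random() for _ in range(n))


rng = random.Random(0)
event = sample_fixed_multiplicity({0: 0.2, 1: 0.5, 2: 0.3}, toy_kinematics, rng)
```

The multiplicity here is a separate input rather than an outcome of the generation, which is the structural limitation the pith attributes to the fixed-multiplicity baseline.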

What carries the argument

Hierarchical autoregressive Transformer that enforces the ordered Markovian branching structure by sequential emission prediction and a learned termination condition.
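The sequential generation with a learned stop decision can be sketched as a sampling loop. This is a minimal toy, not the authors' code: `ToyEmissionModel`, `generate_history`, the emission tuple `(t, eta, phi)` and the logistic stop probability are all hypothetical stand-ins for the trained Transformer, which would condition both outputs on the full history rather than on its length alone.

```python
import math
import random


class ToyEmissionModel:
    """Hypothetical stand-in for a trained network.

    Given the history so far, it returns a termination probability and
    samples the next emission. The stop probability is a fixed logistic
    function of the current multiplicity, purely for illustration.
    """

    def stop_probability(self, history):
        n = len(history)
        return 1.0 / (1.0 + math.exp(-(n - 3.0)))

    def sample_next(self, history, rng):
        t_prev = history[-1][0] if history else 0.0
        # Ordered evolution variable: each new emission comes no earlier than the last.
        t = t_prev + rng.expovariate(1.0)
        eta = rng.uniform(-3.0, 3.0)         # rapidity
        phi = rng.uniform(0.0, 2 * math.pi)  # azimuth
        return (t, eta, phi)


def generate_history(model, rng, max_emissions=64):
    """Emit sequentially until the model's stop branch fires (or a safety cap)."""
    history = []
    while len(history) < max_emissions:
        if rng.random() < model.stop_probability(history):
            break  # learned termination: the sequence ends here
        history.append(model.sample_next(history, rng))
    return history


rng = random.Random(1)
events = [generate_history(ToyEmissionModel(), rng) for _ in range(1000)]
```

The multiplicity is never an input: it emerges from when the stop branch fires, and appending each emission at a later evolution time keeps the ordered structure intact.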

If this is right

Generated samples agree with the reference shower within statistical uncertainties for the gap-fraction observables considered.
The model works under both direct training on vetoed histories and inclusive training followed by analysis-level veto.
The approach supplies a physically consistent autoregressive surrogate for variable-multiplicity shower generators.
The results motivate extensions to subleading-logarithmic resummation and finite-Nc color evolution.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same sequential architecture could be applied to other QCD processes that require dynamic multiplicity, such as hadronization modeling.
Integration with existing event generators might reduce the computational cost of producing high-multiplicity samples while preserving ordering constraints.
Testing the termination condition on analytically known resummed distributions beyond leading log would provide an independent check of generalization.

Load-bearing premise

The Monte Carlo dipole shower that supplies the training data accurately captures the essential leading-logarithmic non-global logarithm physics in the large-Nc limit, and the learned termination condition generalizes without systematic bias.

What would settle it

Generating a large ensemble of showers with Nested-GPT and finding statistically significant deviations in gap fractions or other infrared-safe observables from the Monte Carlo reference would falsify the claim of physical consistency.
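One way to make that test concrete is to compare gap fractions point by point with binomial errors. The sketch assumes independent reference and generated samples and counts `k` surviving events out of `n` at each evolution time; the function names are hypothetical. Because S(t) at different evolution times is built from the same events, neighboring points are correlated, so the sketch reports the largest per-point pull instead of summing squared pulls into a χ².

```python
import math


def gap_fraction_pull(k_ref, n_ref, k_gen, n_gen):
    """Pull between reference and generated gap fractions S = k / n.

    k: events with no in-gap emission up to the chosen evolution time,
    n: total events; independent binomial errors are assumed for each sample.
    """
    s_ref, s_gen = k_ref / n_ref, k_gen / n_gen
    var = s_ref * (1 - s_ref) / n_ref + s_gen * (1 - s_gen) / n_gen
    if var == 0.0:
        # Both fractions are exactly 0 or 1: identical means no tension.
        return 0.0 if s_ref == s_gen else math.inf
    return (s_gen - s_ref) / math.sqrt(var)


def max_abs_pull(ref_counts, gen_counts):
    """Largest |pull| over lists of (k, n) pairs, one pair per evolution time."""
    return max(abs(gap_fraction_pull(kr, nr, kg, ng))
               for (kr, nr), (kg, ng) in zip(ref_counts, gen_counts))
```

A large maximum pull would signal tension, keeping in mind that scanning many points raises the chance of a large fluctuation somewhere.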

Figures

Figures reproduced from arXiv: 2605.18360 by Ding Yu Shao, Hao-Zhe Shi, Wanchen Li, Yu-Xuan Sun.

**Figure 2.** Comparison of the gap fraction …

**Figure 3.** Training history of the Nested-GPT model on the …

**Figure 4.** Comparison of the inclusive shower samples gen…

**Figure 5.** Events from both models are terminated after genera…

Original abstract

We introduce Nested-GPT, a hierarchical autoregressive Transformer architecture for simulating variable-multiplicity parton-shower histories. As a controlled benchmark, we study the leading-logarithmic resummation of non-global logarithms in the large-$N_c$ limit, utilizing a stochastic Monte Carlo dipole shower to generate reference training data. We systematically evaluate Nested-GPT against a Transformer flow-matching baseline. The flow-matching framework successfully parameterizes the joint distribution of emission kinematics at fixed multiplicity. Its phase-space representation, however, requires the final number of emissions to be specified externally rather than generated dynamically. Conversely, Nested-GPT strictly enforces the ordered Markovian branching structure, predicting emissions sequentially and dynamically evaluating a learned sequence-termination condition. We benchmark both approaches using gap fraction observables under two complementary training regimes: direct training on vetoed histories and inclusive training followed by an analysis-level veto. The resulting generated samples agree with the reference shower within statistical uncertainties for the observables considered. These results establish Nested-GPT as a physically consistent autoregressive surrogate for variable-multiplicity shower generators and motivate extensions to subleading-logarithmic resummation and finite-$N_c$ color evolution.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Nested-GPT adds a learned termination step to autoregressive transformers so they can output variable-length shower histories, and the generated gap fractions match the reference MC within errors.


The main point is that this paper builds a hierarchical autoregressive transformer, Nested-GPT, that generates parton-shower histories with a variable number of emissions. It predicts each branching sequentially, enforces the ordered Markovian structure, and uses a learned condition to decide when to stop. The authors test it on leading-log non-global logarithms in the large-Nc limit, using a stochastic dipole shower for training data, and compare it with a fixed-multiplicity flow-matching baseline. The generated samples agree with the reference on gap fractions under both direct vetoed-history training and inclusive training plus an analysis-level veto. That agreement is the concrete result they report. The architecture is new in this context and does a clean job of keeping the physical branching order while allowing dynamic multiplicity, and the two training regimes are a useful cross-check.

The soft spots are straightforward. Gap fractions are inclusive observables, so matching them does not by itself confirm that the multiplicity distribution or the kinematics at fixed multiplicity are correct: a small shift in the learned termination probability could distort those without showing up in the gap fractions. Because the model is trained directly on the reference Monte Carlo output, the agreement is partly by construction, and the abstract gives no numbers on validation splits, overfitting tests, or error propagation. The concern that the learned termination step biases the multiplicity distribution therefore looks plausible from the description.

This work is for people already working at the intersection of machine learning and precision QCD simulation, especially those building surrogates for parton showers; readers focused on non-global resummation or on ML replacements for Monte Carlo generators will find the benchmarks relevant. It deserves serious refereeing because it delivers a working implementation with direct comparisons, even though the validation needs to be expanded.
I would send it to peer review and ask for multiplicity histograms and conditional distributions at fixed multiplicity.
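A toy calculation shows why an inclusive gap fraction alone cannot pin down the multiplicity. Assume, purely for illustration, that at a fixed evolution scale each of N emissions lands in the gap independently with probability p; then S = E[(1-p)^N], the probability-generating function of N evaluated at 1 - p. A Poisson multiplicity and a two-point multiplicity (N = 0 or N = k) can be tuned to give the same S while having different means. The helper names are hypothetical, and the real S(t), measured over a range of scales with correlated emissions, constrains more than one number; the degeneracy is still the reason to ask for multiplicity histograms.

```python
import math


def gap_fraction_poisson(lam, p):
    # E[(1-p)^N] for N ~ Poisson(lam): the generating function at 1 - p.
    return math.exp(-lam * p)


def two_point_weight(target, p, k):
    # Weight w on N = k (remaining weight on N = 0) that reproduces
    # E[(1-p)^N] = target, i.e. (1 - w) + w * (1 - p)**k = target.
    q_k = (1.0 - p) ** k
    return (1.0 - target) / (1.0 - q_k)


lam, p, k = 4.0, 0.1, 10
s_poisson = gap_fraction_poisson(lam, p)  # exp(-0.4), about 0.670
w = two_point_weight(s_poisson, p, k)
s_two_point = (1.0 - w) + w * (1.0 - p) ** k
mean_poisson, mean_two_point = lam, w * k  # about 4.0 versus 5.06
```

Both multiplicity models give the same gap fraction, yet their mean multiplicities differ by roughly one emission.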

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Nested-GPT, a hierarchical autoregressive Transformer architecture for generating variable-multiplicity parton-shower histories. As a controlled benchmark in the leading-logarithmic resummation of non-global logarithms in the large-N_c limit, it trains on reference data from a stochastic Monte Carlo dipole shower and compares against a Transformer flow-matching baseline. Two training regimes are considered: direct training on vetoed histories and inclusive training with analysis-level veto. Generated samples are reported to agree with the reference shower within statistical uncertainties on gap-fraction observables, establishing Nested-GPT as a physically consistent autoregressive surrogate.

Significance. If the central results hold under more detailed validation, the work provides a concrete demonstration that an autoregressive model can enforce the ordered Markovian branching structure of a parton shower while dynamically determining multiplicity via a learned termination condition. This is a non-trivial advance over fixed-multiplicity approaches such as flow matching and supplies a practical template for extending ML-based resummation to subleading logarithms and finite-N_c color evolution.

major comments (2)

[Abstract and results section] The central claim that Nested-GPT reproduces the reference shower rests on statistical agreement for gap fractions (Abstract). Because gap fractions integrate over multiplicity, agreement at this level does not automatically guarantee that the learned sequence-termination probability reproduces the reference multiplicity distribution or the conditional kinematics at fixed multiplicity. An explicit comparison of multiplicity histograms and emission-angle distributions at fixed multiplicity is required to confirm the absence of systematic bias.
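The requested comparison amounts to histogramming the number of emissions per event and then slicing kinematic distributions by multiplicity. A minimal sketch, assuming each event is a list of emission tuples such as `(t, eta, phi)`; this layout and the helper names are hypothetical.

```python
from collections import Counter, defaultdict


def multiplicity_histogram(events):
    """Normalized distribution of the number of emissions per event."""
    counts = Counter(len(ev) for ev in events)
    total = len(events)
    return {n: c / total for n, c in sorted(counts.items())}


def conditional_values(events, index, component):
    """One kinematic component of the index-th emission, grouped by multiplicity.

    Events with fewer than index + 1 emissions are skipped.
    """
    grouped = defaultdict(list)
    for ev in events:
        if len(ev) > index:
            grouped[len(ev)].append(ev[index][component])
    return dict(grouped)


events = [
    [(0.1, 0.5, 1.0)],
    [(0.2, -1.0, 2.0), (0.4, 0.3, 0.5)],
    [],
]
hist = multiplicity_histogram(events)
first_eta_by_n = conditional_values(events, index=0, component=1)
```

Applying the same two functions to reference and generated samples gives the multiplicity histograms and fixed-multiplicity slices that can then be compared bin by bin.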
[Abstract] The manuscript states that samples agree 'within statistical uncertainties' but provides no information on validation splits, overfitting diagnostics, or how uncertainties are propagated from the reference Monte Carlo to the generated samples. Without these controls it is difficult to assess whether the agreement is robust or merely a consequence of training directly on the reference distribution.

minor comments (2)

[Methodology] The distinction between the two training regimes (direct vetoed-history training versus inclusive training plus analysis-level veto) should be illustrated with a schematic diagram or explicit pseudocode to make the difference in how the veto is applied fully transparent.
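One possible version of the requested pseudocode, written as runnable Python under stated assumptions: a vetoed history is taken to end at its first emission inside the gap (which fixes the veto time), while an inclusive history continues and the veto is applied only when the gap fraction is computed. The gap definition |eta| < 1, the emission layout `(t, eta, phi)` and all function names are hypothetical, not taken from the paper.

```python
def in_gap(eta, phi):
    # Hypothetical gap: a central rapidity slice, independent of azimuth.
    return abs(eta) < 1.0


def first_gap_time(history, gap):
    """Evolution time of the first in-gap emission, or None if there is none."""
    for t, eta, phi in history:
        if gap(eta, phi):
            return t
    return None


def vetoed_history(history, gap):
    """Direct regime (assumed): the history stops at its first in-gap emission."""
    out = []
    for emission in history:
        out.append(emission)
        if gap(emission[1], emission[2]):
            break
    return out


def gap_fraction(histories, gap, t):
    """S(t): fraction of events with no in-gap emission at evolution time <= t."""
    survived = 0
    for h in histories:
        t_veto = first_gap_time(h, gap)
        if t_veto is None or t_veto > t:
            survived += 1
    return survived / len(histories)


inclusive = [
    [(0.1, 2.0, 0.0), (0.3, 0.5, 1.0), (0.6, -0.2, 2.0)],
    [(0.2, -2.5, 0.0), (0.9, 1.5, 3.0)],
]
vetoed = [vetoed_history(h, in_gap) for h in inclusive]
```

Because emissions up to the first in-gap one are identical in both cases, the two regimes yield the same S(t) on the same events, which is what makes them a useful cross-check of a trained model.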
[Model architecture] Notation for the hierarchical nesting of the Transformer (e.g., how the outer and inner models interact) is introduced without a compact summary table; adding such a table would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading of the manuscript and the constructive comments. We address each major point below and have revised the manuscript to incorporate additional validation material.


Referee: [Abstract and results section] The central claim that Nested-GPT reproduces the reference shower rests on statistical agreement for gap fractions (Abstract). Because gap fractions integrate over multiplicity, agreement at this level does not automatically guarantee that the learned sequence-termination probability reproduces the reference multiplicity distribution or the conditional kinematics at fixed multiplicity. An explicit comparison of multiplicity histograms and emission-angle distributions at fixed multiplicity is required to confirm the absence of systematic bias.

Authors: We agree that agreement on gap-fraction observables, which integrate over multiplicity, does not by itself establish that the learned termination probability and conditional kinematics match the reference at fixed multiplicity. In the revised manuscript we have added explicit multiplicity histograms and emission-angle distributions at fixed multiplicity in the results section. These comparisons show agreement with the reference Monte Carlo within statistical uncertainties, confirming that the autoregressive termination condition does not introduce systematic bias. revision: yes
Referee: [Abstract] The manuscript states that samples agree 'within statistical uncertainties' but provides no information on validation splits, overfitting diagnostics, or how uncertainties are propagated from the reference Monte Carlo to the generated samples. Without these controls it is difficult to assess whether the agreement is robust or merely a consequence of training directly on the reference distribution.

Authors: We acknowledge that the original manuscript did not provide sufficient detail on validation procedures. The revised version includes a dedicated subsection describing the training/validation split (with 20 % of the reference data held out), overfitting diagnostics via training and validation loss curves, and the uncertainty estimation method, which combines multiple independent reference Monte Carlo runs with bootstrap resampling to propagate statistical uncertainties to the generated samples. revision: yes
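The bootstrap part of that procedure can be sketched for a single gap-fraction point: resample events with replacement, recompute the surviving fraction, and take the spread of the replicas as the statistical error. The function name, the 0/1 encoding of survival and the number of replicas are assumptions; combining several independent reference runs, as the response describes, is not shown.

```python
import random
import statistics


def bootstrap_gap_fraction(survived, n_boot=1000, seed=0):
    """Central value and bootstrap standard error of one gap-fraction point.

    survived: list of 0/1 per event (1 = no in-gap emission up to the chosen t).
    """
    rng = random.Random(seed)
    n = len(survived)
    point = sum(survived) / n
    # Each replica resamples all n events with replacement.
    replicas = [sum(rng.choices(survived, k=n)) / n for _ in range(n_boot)]
    return point, statistics.stdev(replicas)


flags = [1] * 500 + [0] * 500
s, se = bootstrap_gap_fraction(flags)
```

For this sample the bootstrap error should land close to the binomial value sqrt(0.25/1000), roughly 0.016.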

Circularity Check

1 step flagged

Nested-GPT surrogate agreement with reference shower reduces to training-data reproduction by construction

specific steps

fitted input called prediction [Abstract]
"utilizing a stochastic Monte Carlo dipole shower to generate reference training data. [...] The resulting generated samples agree with the reference shower within statistical uncertainties for the observables considered. These results establish Nested-GPT as a physically consistent autoregressive surrogate for variable-multiplicity shower generator"

The model is fitted to histories produced by the reference MC shower; the subsequent claim of physical consistency therefore rests on the model reproducing its own training distribution on inclusive observables, which is exactly what the supervised learning procedure is designed to achieve.

full rationale

The paper trains Nested-GPT directly on event histories generated by the stochastic Monte Carlo dipole shower and then validates the model by showing that its generated samples agree with the same reference on gap-fraction observables. Because the central claim of physical consistency rests on this agreement for a model whose parameters are fitted to the reference distribution, the reported success is statistically forced rather than an independent test of the architecture. The Markovian ordering and learned termination condition add structural constraints, yet the observable-level match remains a direct consequence of the supervised training on the identical source. This constitutes partial circularity of the fitted-input-called-prediction type without requiring external self-citations or ansatzes.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Review based on abstract only; full details on model parameters and assumptions unavailable.

free parameters (1)

Transformer model hyperparameters
Typical for neural-network training but not specified in the abstract.

axioms (1)

domain assumption: Parton-shower emissions follow an ordered Markovian branching process
This structure is strictly enforced by the Nested-GPT design as described in the abstract.

pith-pipeline@v0.9.0 · 5751 in / 1314 out tokens · 44325 ms · 2026-05-20T09:38:13.333968+00:00 · methodology
