A B-factory continuum retune of PYTHIA 8 hadronization parameters using BELLE and BABAR identified-hadron data
Pith reviewed 2026-07-01 05:04 UTC · model grok-4.3
The pith
Refining five hadronization parameters in PYTHIA 8 yields a bin-weighted score of 73.42 on BELLE and BABAR data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Starting from the five-parameter hadronization subset of a pp tune, a staged extension guided by remaining differences in the data produces a refined tune that scores 73.42 on the common set of 20,803 bins from BELLE and BABAR, outperforming the Skands e+e- reference (76.49) and Monash 2013 (79.22). The refined tune remains best for the BELLE charged-hadron, baryon, and single- and dihadron measurements, while Skands is still better for the BABAR charged-hadron sample and the BELLE meson sample.
What carries the argument
The five-parameter hadronization subset, extended in stages and scored on a fixed common set of 20,803 bins with one-million-event samples per tune point.
If this is right
- The refined tune remains the best-scoring option for BELLE charged-hadron, baryon, single- and dihadron measurements.
- Skands continues to perform better on the BABAR charged-hadron sample and the BELLE meson sample.
- The bin-weighted ordering across all measurements is driven primarily by the BELLE 2020 sample.
- The refined tune produces a small but stable improvement across the selected BELLE and BABAR measurements despite remaining dataset differences.
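To make the role of bin weighting concrete, here is a minimal sketch of how a bin-weighted aggregate can be dominated by one large sample. The paper's exact score definition is not given in this summary, so the formula (per-analysis scores averaged with bin counts as weights, lower is better), the `AnalysisScore` type, and all numbers are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class AnalysisScore:
    name: str      # hypothetical analysis label
    n_bins: int    # number of bins contributed by this analysis
    score: float   # per-analysis score (assumed: lower is better)


def bin_weighted_score(analyses):
    """Average the per-analysis scores, weighting each by its bin count."""
    total_bins = sum(a.n_bins for a in analyses)
    if total_bins == 0:
        raise ValueError("no bins to score")
    return sum(a.n_bins * a.score for a in analyses) / total_bins


def unweighted_score(analyses):
    """Plain mean over analyses, ignoring how many bins each one has."""
    return sum(a.score for a in analyses) / len(analyses)


# Made-up numbers, NOT taken from the paper.
tune_a = [AnalysisScore("large_sample", 9000, 10.0),
          AnalysisScore("small_sample", 1000, 30.0)]
tune_b = [AnalysisScore("large_sample", 9000, 12.0),
          AnalysisScore("small_sample", 1000, 14.0)]

for label, tune in (("A", tune_a), ("B", tune_b)):
    print(label, bin_weighted_score(tune), unweighted_score(tune))
```

With these illustrative inputs, tune A wins under bin weighting but loses under the unweighted mean, which is the kind of sensitivity the statement about the BELLE 2020 sample points to.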
Where Pith is reading between the lines
- The same staged-extension method could be applied to other e+e- data sets to further constrain hadronization parameters shared with pp modeling.
- Persistent differences between BELLE and BABAR samples on meson observables point to possible additional parameters worth isolating in future tuning steps.
- The fixed-bin, fixed-event-count scoring protocol offers a reproducible way to rank tunes across multiple experiments without reweighting artifacts.
Load-bearing premise
The staged extension of the five-parameter subset produces a stable global improvement when evaluated on the fixed set of bins and event samples.
What would settle it
A new measurement set on the same observables that reverses the score ordering so the refined tune scores worse than both Skands and Monash on the full collection of bins.
Original abstract
We refine the hadronization sector of a pp PYTHIA~8 tune in $e^+e^-\to q\bar q$ production using selected BELLE and BABAR measurements near the $\Upsilon(4S)$ region. The study is performed with PYTHIA~8.316 and Rivet~4.1.1 and is restricted to parameters used in this $e^+e^-$ setup. Starting from the five-parameter hadronization subset of the pp tune, we carry out a staged extension guided by the remaining differences in the BELLE and BABAR data. The final comparison uses a fixed common set of 20,803 bins and samples of 1,000,000 generated events per analysis and tune point. On this basis, the selected refined tune gives a bin-weighted score of 73.42, compared with 76.49 for the Skands $e^+e^-$ reference and 79.22 for Monash~2013 $e^+e^-$. It remains the best-scoring tune for the BELLE charged-hadron, baryon, and single- and dihadron measurements, while Skands still performs better for the BABAR charged-hadron sample and the BELLE meson sample. The bin-weighted ordering is driven primarily by the BELLE 2020 sample. The refined tune gives a small but stable improvement for the selected BELLE and BABAR measurements, although clear differences between the individual datasets remain.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript refines the hadronization parameters of PYTHIA 8.316 in the e+e- to q qbar continuum using identified-hadron spectra from BELLE and BABAR near the Υ(4S). Starting from the five-parameter hadronization subset of an existing pp tune, the authors perform a staged extension of the parameter set, with each step chosen by inspecting residuals against the data. The final comparison employs a fixed set of 20,803 bins and 1,000,000-event samples per tune, yielding a bin-weighted score of 73.42 for the refined tune versus 76.49 (Skands e+e- reference) and 79.22 (Monash 2013). The refined tune performs best on most BELLE samples but not on the BABAR charged-hadron or BELLE meson samples; the ordering is driven primarily by the BELLE 2020 data.
Significance. If the reported improvement generalizes, the work supplies a modestly better e+e- starting point for hadronization modeling that could benefit both B-factory and LHC analyses. The use of large fixed event samples and a common bin set enables reproducible comparisons. However, because parameter selection and final scoring occur on identical data without held-out validation or cross-validation, the result primarily quantifies fit quality to these specific measurements rather than independent predictive improvement.
major comments (2)
- [Abstract and final comparison setup] The staged extension of the five-parameter subset is guided by residuals in the BELLE and BABAR data, after which the bin-weighted score (73.42) is computed on the identical fixed set of 20,803 bins and 1M-event samples. This procedure makes the reported ordering a measure of fit quality rather than an independent test, directly affecting the claim of a 'small but stable improvement' for the selected measurements.
- [Description of staged extension] No held-out subset, cross-validation, or independent dataset is used to validate the staged parameter choices. The assumption that inspection-guided extension produces a stable global improvement therefore rests on the same data used for both selection and scoring, which is load-bearing for the headline result.
minor comments (1)
- [Comparison setup] Clarify whether the 20,803 bins are exactly the union of all analyses or whether any bin selection/weighting choices were made after initial tuning steps.
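One way to make the bin-set question checkable is to build the common set explicitly and freeze it before any tuning step. The sketch below assumes a common set means the intersection of bins available for every tune; the bin-key layout `(analysis, histogram, bin_index)` and all names are hypothetical and not taken from the manuscript.

```python
def common_bins(bins_per_tune):
    """Return the bins present for every tune (set intersection).

    bins_per_tune maps a tune name to an iterable of hashable bin keys,
    here assumed to be (analysis, histogram, bin_index) tuples.
    """
    key_sets = [set(keys) for keys in bins_per_tune.values()]
    if not key_sets:
        return set()
    return set.intersection(*key_sets)


def is_same_bin_set(frozen, current):
    """Check that a later comparison still uses exactly the frozen set."""
    return set(frozen) == set(current)


# Hypothetical bin keys, for illustration only.
bins_per_tune = {
    "refined": [("ANALYSIS_A", "h1", 0), ("ANALYSIS_A", "h1", 1), ("ANALYSIS_B", "h1", 0)],
    "skands":  [("ANALYSIS_A", "h1", 0), ("ANALYSIS_A", "h1", 1)],
    "monash":  [("ANALYSIS_A", "h1", 0), ("ANALYSIS_A", "h1", 1), ("ANALYSIS_B", "h1", 0)],
}

frozen = common_bins(bins_per_tune)
print(sorted(frozen))
```

Recording the frozen set alongside the scores would let a reader confirm that no bins were added or dropped after the first tuning stage.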
Simulated Author's Rebuttal
We thank the referee for the thoughtful review and for highlighting the distinction between fit quality and independent validation. We agree that our procedure quantifies performance on the data used for both parameter selection and scoring. We will revise the manuscript to make this scope explicit and to moderate the language around 'improvement' accordingly.
Point-by-point responses
- Referee: [Abstract and final comparison setup] The staged extension of the five-parameter subset is guided by residuals in the BELLE and BABAR data, after which the bin-weighted score (73.42) is computed on the identical fixed set of 20,803 bins and 1M-event samples. This procedure makes the reported ordering a measure of fit quality rather than an independent test, directly affecting the claim of a 'small but stable improvement' for the selected measurements.
  Authors: We agree that the reported score measures fit quality on the same data used for staged selection. The fixed bin set and event samples enable reproducible comparisons among tunes, but do not constitute an independent test. We will revise the abstract to replace 'small but stable improvement for the selected BELLE and BABAR measurements' with language that explicitly states the result is an improved description of these specific datasets, and we will add a clarifying sentence in the methods section.
  Revision: yes
- Referee: [Description of staged extension] No held-out subset, cross-validation, or independent dataset is used to validate the staged parameter choices. The assumption that inspection-guided extension produces a stable global improvement therefore rests on the same data used for both selection and scoring, which is load-bearing for the headline result.
  Authors: The manuscript presents the staged extension as an exploratory procedure guided by residuals, not as a validated global optimum. We will add an explicit statement in the text acknowledging the absence of held-out validation or cross-validation and noting that the result should be interpreted as the best-performing tune within the explored set on these measurements. No independent datasets beyond the BELLE and BABAR samples employed are available for this study.
  Revision: yes
Circularity Check
Staged tuning on BELLE/BABAR residuals, followed by scoring on the identical 20,803-bin set, reports fit quality as improvement.
specific steps
- fitted input called prediction [Abstract]
  "Starting from the five-parameter hadronization subset of the pp tune, we carry out a staged extension guided by the remaining differences in the BELLE and BABAR data. The final comparison uses a fixed common set of 20,803 bins and samples of 1,000,000 generated events per analysis and tune point. On this basis, the selected refined tune gives a bin-weighted score of 73.42, compared with 76.49 for the Skands e+e- reference and 79.22 for Monash 2013 e+e-."
  The staged extension is chosen by inspecting residuals in the BELLE/BABAR data; the final score is then evaluated on precisely the same bin set and event samples. The reported improvement is therefore the result of the fitting/selection procedure itself rather than a prediction on independent data.
full rationale
The paper selects the refined tune via staged extension explicitly guided by residuals in the same BELLE and BABAR measurements, then computes the bin-weighted score on the exact same fixed set of 20,803 bins and 1M-event samples. This matches the fitted-input-called-prediction pattern: the reported ordering (73.42 vs 76.49/79.22) is the direct output of the selection process rather than an independent test. No held-out data or cross-validation is described. The central claim therefore reduces to a measure of fit quality on the tuning data.
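The separation of selection from scoring that this check calls for can be sketched as follows. This is not the authors' procedure: the split by analysis, the bin-weighted aggregate (lower is better), the function names, and every number are assumptions chosen only to illustrate the idea.

```python
def weighted(scores, analyses, n_bins):
    """Bin-weighted mean of per-analysis scores over the given analyses."""
    total = sum(n_bins[a] for a in analyses)
    return sum(n_bins[a] * scores[a] for a in analyses) / total


def select_then_validate(per_tune_scores, n_bins, selection, held_out):
    """Pick the best tune on `selection` only, then report every tune
    on `held_out`, which played no part in the choice."""
    best = min(
        per_tune_scores,
        key=lambda t: weighted(per_tune_scores[t], selection, n_bins),
    )
    report = {
        t: weighted(s, held_out, n_bins) for t, s in per_tune_scores.items()
    }
    return best, report


# Made-up inputs, NOT taken from the paper.
n_bins = {"A": 100, "B": 300, "C": 200}
per_tune_scores = {
    "candidate_1": {"A": 5.0, "B": 4.0, "C": 9.0},
    "candidate_2": {"A": 6.0, "B": 5.0, "C": 6.0},
}

best, report = select_then_validate(
    per_tune_scores, n_bins, selection=["A", "B"], held_out=["C"]
)
print(best, report)
```

In this toy case the candidate chosen on the selection analyses is not the better one on the held-out analysis, which is exactly the situation a same-data score cannot reveal.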
Axiom & Free-Parameter Ledger
free parameters (1)
- hadronization parameters (5-parameter subset extended in stages)
axioms (1)
- domain assumption: The PYTHIA 8.316 hadronization model is appropriate for e+e- to q qbar production near the Upsilon(4S) region.
Reference graph
Works this paper leans on
- [1] H. I. Alrebdi and M. Ajaz, “Sequential retuning of PYTHIA 8.316 to a global soft-QCD basis in pp collisions at √s = 0.9–13 TeV,” arXiv:2603.21364
- [2] M. Ajaz and H. I. Alrebdi, “Underlying-event and azimuthal-observable validation of a PYTHIA 8.316 soft-QCD retune in pp collisions at √s = 0.9 and 7 TeV,” Eur. Phys. J. Plus (2026). doi:10.1140/epjp/s13360-026-07938-5
- [3] T. Sjöstrand, S. Ask, J. R. Christiansen, R. Corke, N. Desai, P. Ilten, S. Mrenna, S. Prestel, C. O. Rasmussen and P. Z. Skands, “An Introduction to PYTHIA 8.2,” Comput. Phys. Commun. 191 (2015) 159–177. doi:10.1016/j.cpc.2015.01.024
- [4] C. Bierlich et al., “A comprehensive guide to the physics and usage of PYTHIA 8.3,” SciPost Phys. Codeb. 8 (2022)
- [5] M. Leitgab et al. [Belle Collaboration], “Precision Measurement of Charged Pion and Kaon Differential Cross Sections in e+e− Annihilation at √s = 10.52 GeV,” Phys. Rev. Lett. 111 (2013) 062002
- [6] J. P. Lees et al. [BaBar Collaboration], “Production of charged pions, kaons, and protons in e+e− annihilations into hadrons at √s = 10.54 GeV,” Phys. Rev. D 88 (2013) 032011
- [7] M. Niiyama et al. [Belle Collaboration], “Production cross sections of hyperons and charmed baryons from e+e− annihilation near √s = 10.52 GeV,” Phys. Rev. D 97 (2018) 072005
- [8] R. Seidl et al. [Belle Collaboration], “Update of inclusive cross sections of single and pairs of identified light charged hadrons,” Phys. Rev. D 101 (2020) 092004
- [9] R. Seidl et al. [Belle Collaboration], “Production cross sections of light and charmed mesons in e+e− annihilation near 10.58 GeV,” Phys. Rev. D 111 (2025) 052003
- [10] B. Andersson, G. Gustafson, G. Ingelman and T. Sjöstrand, “Parton Fragmentation and String Dynamics,” Phys. Rept. 97 (1983) 31–145. doi:10.1016/0370-1573(83)90080-7
- [11] P. Skands, S. Carrazza and J. Rojo, “Tuning PYTHIA 8.1: the Monash 2013 Tune,” Eur. Phys. J. C 74 (2014) 3024
- [12] A. Buckley, H. Hoeth, H. Lacker, H. Schulz and J. E. von Seggern, “Systematic event generator tuning for the LHC,” Eur. Phys. J. C 65 (2010) 331–357
- [13] C. Bierlich et al., “Robust Independent Validation of Experiment and Theory: Rivet version 3,” SciPost Phys. 8 (2020) 026
- [14] R. Barlow and C. Beeston, “Fitting using finite Monte Carlo samples,” Comput. Phys. Commun. 77 (1993) 219–228
- [15] M. D. McKay, R. J. Beckman and W. J. Conover, “A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code,” Technometrics 21 (1979) 239–245
- [16] P. Eden and G. Gustafson, “Baryon Production in the String Fragmentation Picture,” Z. Phys. C 75 (1997) 41–49
- [17] M. Waqar, H. I. Alrebdi, M. Waqas, K. S. Al-Mugren and M. Ajaz, “Comparative Analysis of Jet and Underlying Event Properties Across Various Models as a Function of Charged Particle Multiplicity at 7 TeV,” Chin. Phys. C 48 (2024) 093109
- [18] H. I. Alrebdi, M. Ajaz, M. Waqas, M. A. Ahmad, M. Waqar, A. M. Quraishi, J. H. Baker, S. Jagnandan and A. Jagnandan, “Comparative analysis of charged particle distributions and model predictions for underlying events with track-based selection in 13 TeV pp collisions,” Eur. Phys. J. Plus 140 (2025) 371
- [19] M. Waqar, H. I. Alrebdi, M. Waqas, M. A. Ahmad and M. Ajaz, “Hadron production models’ prediction for pT distribution of charged hadrons in pp interactions at LHC energies,” Eur. Phys. J. Plus 140 (2025) 523