A B-factory continuum retune of PYTHIA 8 hadronization parameters using BELLE and BABAR identified-hadron data

Haifa I. Alrebdi; Muhammad Ajaz

arxiv: 2606.31346 · v1 · pith:ZISPVCIRnew · submitted 2026-06-30 · ✦ hep-ph


Pith reviewed 2026-07-01 05:04 UTC · model grok-4.3

classification ✦ hep-ph

keywords PYTHIA 8 · hadronization tuning · BELLE · BABAR · e+e- collisions · identified hadrons · parameter refinement · B-factory data


The pith

Starting from the five-parameter hadronization subset of a pp PYTHIA 8 tune and extending it in stages yields a bin-weighted score of 73.42 (lower is better) on BELLE and BABAR data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper starts from the five-parameter hadronization subset of a proton-proton PYTHIA 8 tune and performs a staged extension guided by differences in BELLE and BABAR measurements of identified hadrons near the Upsilon(4S). The final comparison evaluates tunes on a fixed set of 20,803 bins using one million events per sample. The selected refined tune achieves an overall bin-weighted score of 73.42, lower and therefore better than the Skands e+e- reference at 76.49 and Monash 2013 at 79.22. It performs best on most BELLE samples but not on every individual dataset. The work shows that modest, data-guided adjustments to hadronization parameters can produce a measurable improvement on these measurements when scored consistently across the two experiments.
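The staged extension can be pictured as a greedy forward selection: at each stage, free one more parameter and keep it only if the score drops enough. The sketch below illustrates that idea under stated assumptions. The `score` callable, the candidate parameter names, and the `min_gain` stopping tolerance are all hypothetical. This is not the authors' procedure, whose stage choices were guided by inspecting residuals rather than by an automated loop.

```python
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical: score(active) returns the best score (lower is better) reachable
# when only the parameters in `active` are free; the inner fit is not modelled here.
ScoreFn = Callable[[FrozenSet[str]], float]


def staged_extension(
    base: FrozenSet[str],
    candidates: List[str],
    score: ScoreFn,
    min_gain: float = 0.1,
) -> Tuple[FrozenSet[str], float, List[Tuple[str, float]]]:
    """Greedily add one parameter per stage while it lowers the score by at least min_gain."""
    active = frozenset(base)
    best = score(active)
    history: List[Tuple[str, float]] = []
    remaining = [c for c in candidates if c not in active]
    while remaining:
        # Try every remaining parameter and keep the one giving the lowest score.
        trial_scores = {c: score(active | {c}) for c in remaining}
        pick = min(trial_scores, key=trial_scores.get)
        if best - trial_scores[pick] < min_gain:
            break  # no candidate helps enough: stop extending
        active = active | {pick}
        best = trial_scores[pick]
        history.append((pick, best))
        remaining.remove(pick)
    return active, best, history
```

A greedy loop is only one way to organise stages. It can miss parameters that help only in combination, and because every stage is accepted on the same bins that enter the final comparison, its end score measures fit quality rather than predictive power.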

Core claim

Starting from the five-parameter hadronization subset of a pp tune, a staged extension guided by remaining differences in the data produces a refined tune that scores 73.42 on the common set of 20,803 bins from BELLE and BABAR, outperforming the Skands e+e- reference (76.49) and Monash 2013 (79.22). The refined tune remains best for the BELLE charged-hadron, baryon, and single- and dihadron measurements, while Skands is still better for the BABAR charged-hadron sample and the BELLE meson sample.

What carries the argument

The five-parameter hadronization subset, extended in stages and scored on a fixed common set of 20,803 bins with one-million-event samples per tune point.
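This summary does not give the exact score definition. The minimal sketch below assumes a common pattern. Each bin contributes a chi-squared-like term whose denominator adds the data uncertainty and the Monte Carlo statistical uncertainty in quadrature. Each analysis gets the mean of its bin terms. The overall score then weights each analysis by its number of bins. The per-bin formula and all names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Bin:
    data: float      # measured value
    data_err: float  # total data uncertainty
    mc: float        # generator prediction
    mc_err: float    # MC statistical uncertainty (shrinks roughly like 1/sqrt(N_events))


def bin_term(b: Bin) -> float:
    """Assumed per-bin term: squared residual over data and MC variances added in quadrature."""
    var = b.data_err ** 2 + b.mc_err ** 2
    if var <= 0:
        raise ValueError("bin has no uncertainty")
    return (b.mc - b.data) ** 2 / var


def analysis_score(bins: List[Bin]) -> float:
    """Mean per-bin term for one analysis."""
    return sum(bin_term(b) for b in bins) / len(bins)


def bin_weighted_score(analyses: Dict[str, List[Bin]]) -> float:
    """Weight each analysis score by its bin count (equivalent to the mean over all bins)."""
    total_bins = sum(len(bins) for bins in analyses.values())
    weighted = sum(len(bins) * analysis_score(bins) for bins in analyses.values())
    return weighted / total_bins
```

With this weighting, the overall score equals the plain mean over all bins. An analysis that supplies most of the bins therefore dominates the ordering, which is one plausible reading of why the result is driven primarily by the BELLE 2020 sample.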

If this is right

The refined tune remains the best-scoring option for BELLE charged-hadron, baryon, single- and dihadron measurements.
Skands continues to perform better on the BABAR charged-hadron sample and the BELLE meson sample.
The bin-weighted ordering across all measurements is driven primarily by the BELLE 2020 sample.
The refined tune produces a small but stable improvement across the selected BELLE and BABAR measurements despite remaining dataset differences.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same staged-extension method could be applied to other e+e- data sets to further constrain hadronization parameters shared with pp modeling.
Persistent tensions on the BABAR charged-hadron sample and the BELLE meson sample, where Skands still scores better, point to possible additional parameters worth isolating in future tuning steps.
The fixed-bin, fixed-event-count scoring protocol offers a reproducible way to rank tunes across multiple experiments without reweighting artifacts.
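One way to make the fixed-bin protocol concrete is to build the common bin set once and score every tune only on that set. Here the set is the intersection of bins for which every tune produced a valid prediction. The bin keys and the validity rule (finite value, positive finite uncertainty) are hypothetical. The summary does not say how the 20,803 bins were chosen, which is what the referee's minor comment asks about.

```python
import math
from typing import Dict, List, Set, Tuple

BinKey = Tuple[str, str, int]  # hypothetical key: (analysis, histogram path, bin index)
Prediction = Dict[BinKey, Tuple[float, float]]  # bin -> (value, MC uncertainty)


def valid_bins(pred: Prediction) -> Set[BinKey]:
    """Bins with a finite value and a positive, finite uncertainty."""
    return {
        k for k, (v, e) in pred.items()
        if math.isfinite(v) and math.isfinite(e) and e > 0
    }


def common_bin_set(preds: Dict[str, Prediction]) -> Set[BinKey]:
    """Intersection of valid bins over all tunes, so every tune is scored on identical bins."""
    sets = [valid_bins(p) for p in preds.values()]
    return set.intersection(*sets) if sets else set()


def rank_tunes(scores: Dict[str, float]) -> List[str]:
    """Order tune names by score, lowest (best) first."""
    return sorted(scores, key=scores.get)
```

Applied to the reported totals, `rank_tunes({"refined": 73.42, "Skands": 76.49, "Monash 2013": 79.22})` returns `["refined", "Skands", "Monash 2013"]`.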

Load-bearing premise

The staged extension of the five-parameter subset produces a stable global improvement when evaluated on the fixed set of bins and event samples.

What would settle it

A new measurement set on the same observables that reverses the score ordering so the refined tune scores worse than both Skands and Monash on the full collection of bins.

Figures

Figures reproduced from arXiv: 2606.31346 by Haifa I. Alrebdi, Muhammad Ajaz.

**Figure 2.** Selected BELLE 2020 single- and dihadron spectra. The differences between the three tune…

**Figure 3.** Selected BELLE 2025 meson spectra. The selected tune is closer to the data than Monash in…

Original abstract

We refine the hadronization sector of a pp PYTHIA~8 tune in $e^+e^-\to q\bar q$ production using selected BELLE and BABAR measurements near the $\Upsilon(4S)$ region. The study is performed with PYTHIA~8.316 and Rivet~4.1.1 and is restricted to parameters used in this $e^+e^-$ setup. Starting from the five-parameter hadronization subset of the pp tune, we carry out a staged extension guided by the remaining differences in the BELLE and BABAR data. The final comparison uses a fixed common set of 20,803 bins and samples of 1,000,000 generated events per analysis and tune point. On this basis, the selected refined tune gives a bin-weighted score of 73.42, compared with 76.49 for the Skands $e^+e^-$ reference and 79.22 for Monash~2013 $e^+e^-$. It remains the best-scoring tune for the BELLE charged-hadron, baryon, and single- and dihadron measurements, while Skands still performs better for the BABAR charged-hadron sample and the BELLE meson sample. The bin-weighted ordering is driven primarily by the BELLE 2020 sample. The refined tune gives a small but stable improvement for the selected BELLE and BABAR measurements, although clear differences between the individual datasets remain.
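To put "small" in numbers, the quoted scores correspond to the following relative reductions. This is simple arithmetic on the abstract's values and assumes that all three scores sit on the common scale implied by the fixed bin set:

$$
\frac{76.49 - 73.42}{76.49} = \frac{3.07}{76.49} \approx 4.0\%,
\qquad
\frac{79.22 - 73.42}{79.22} = \frac{5.80}{79.22} \approx 7.3\%.
$$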

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

A transparent but incremental PYTHIA retune that improves the score on the same Belle/Babar bins used for selection, with no held-out check.


The main thing here is a new set of five-plus hadronization parameters for PYTHIA 8 in e+e- at the Upsilon(4S), starting from the pp tune subset and extending it stage by stage to reduce residuals against Belle and Babar identified-hadron spectra. They run 1M events per tune point through Rivet on a fixed 20,803-bin set and report a bin-weighted score of 73.42 versus 76.49 for the Skands e+e- reference and 79.22 for Monash 2013. The new tune wins on most Belle samples and the baryon measurements but loses to Skands on the Babar charged-hadron sample and Belle mesons.

What works is the disciplined setup: same generator version, same Rivet analyses, same event count, and explicit comparison to two established tunes. The staged approach is described plainly, and they note that the ordering is driven mainly by the Belle 2020 data while acknowledging that individual datasets still disagree.

The soft spot is exactly the one the stress test flags. Parameter choices at each stage were guided by inspecting the same measurements that later enter the final score, with no held-out bins, no cross-validation, and no independent datasets mentioned. That makes the 73.42 number a measure of fit quality rather than a prediction. The fact that Skands remains better on two samples already shows the data are not perfectly consistent, so further tuning on this collection can easily chase dataset-specific features.

This is for people who need a PYTHIA tune fitted specifically to these B-factory spectra and are willing to accept the circularity. It is not a conceptual advance and will not change how most people use the generator. A serious editor should send it to peer review because the comparison is reproducible and the limitations are stated openly; referees can then decide whether the modest gain justifies adopting the new tune or whether the lack of validation is decisive.

Referee Report

2 major / 1 minor

Summary. The manuscript refines the hadronization parameters of PYTHIA 8.316 in the e+e- to q qbar continuum using identified-hadron spectra from BELLE and BABAR near the Υ(4S). Starting from the five-parameter hadronization subset of an existing pp tune, the authors perform a staged extension of the parameter set, with each step chosen by inspecting residuals against the data. The final comparison employs a fixed set of 20,803 bins and 1,000,000-event samples per tune, yielding a bin-weighted score of 73.42 for the refined tune versus 76.49 (Skands e+e- reference) and 79.22 (Monash 2013). The refined tune performs best on most BELLE samples but not on the BABAR charged-hadron or BELLE meson samples; the ordering is driven primarily by the BELLE 2020 data.

Significance. If the reported improvement generalizes, the work supplies a modestly better e+e- starting point for hadronization modeling that could benefit both B-factory and LHC analyses. The use of large fixed event samples and a common bin set enables reproducible comparisons. However, because parameter selection and final scoring occur on identical data without held-out validation or cross-validation, the result primarily quantifies fit quality to these specific measurements rather than independent predictive improvement.

major comments (2)

[Abstract and final comparison setup] The staged extension of the five-parameter subset is guided by residuals in the BELLE and BABAR data, after which the bin-weighted score (73.42) is computed on the identical fixed set of 20,803 bins and 1M-event samples. This procedure makes the reported ordering a measure of fit quality rather than an independent test, directly affecting the claim of a 'small but stable improvement' for the selected measurements.
[Description of staged extension] No held-out subset, cross-validation, or independent dataset is used to validate the staged parameter choices. The assumption that inspection-guided extension produces a stable global improvement therefore rests on the same data used for both selection and scoring, which is load-bearing for the headline result.
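The held-out check the referee asks for could take the form of a leave-one-analysis-out loop. For each analysis, choose the best candidate tune using all other analyses, then report that tune's score on the excluded one. The sketch assumes a precomputed table of per-analysis scores for a finite set of candidate tunes. The table layout and function name are hypothetical. For the check to be meaningful, the candidate tunes would also have to be generated without reference to the held-out analysis.

```python
from typing import Dict

# Hypothetical layout: scores[tune][analysis] = score on that analysis (lower is better)
ScoreTable = Dict[str, Dict[str, float]]


def leave_one_analysis_out(scores: ScoreTable, n_bins: Dict[str, int]) -> Dict[str, Dict[str, object]]:
    """For each analysis, select the tune best on the others and score it on the held-out one."""
    analyses = list(n_bins)
    if len(analyses) < 2:
        raise ValueError("need at least two analyses")
    results: Dict[str, Dict[str, object]] = {}
    for held_out in analyses:
        train = [a for a in analyses if a != held_out]
        train_bins = sum(n_bins[a] for a in train)

        def train_score(tune: str) -> float:
            # Bin-weighted score over the training analyses only.
            return sum(n_bins[a] * scores[tune][a] for a in train) / train_bins

        chosen = min(scores, key=train_score)
        results[held_out] = {
            "chosen": chosen,
            "held_out_score": scores[chosen][held_out],
            "best_possible": min(s[held_out] for s in scores.values()),
        }
    return results
```

The gap between `held_out_score` and `best_possible` shows how much of an in-sample advantage survives when an analysis is not used for selection.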

minor comments (1)

[Comparison setup] Clarify whether the 20,803 bins are exactly the union of all analyses or whether any bin selection/weighting choices were made after initial tuning steps.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful review and for highlighting the distinction between fit quality and independent validation. We agree that our procedure quantifies performance on the data used for both parameter selection and scoring. We will revise the manuscript to make this scope explicit and to moderate the language around 'improvement' accordingly.


Referee: [Abstract and final comparison setup] The staged extension of the five-parameter subset is guided by residuals in the BELLE and BABAR data, after which the bin-weighted score (73.42) is computed on the identical fixed set of 20,803 bins and 1M-event samples. This procedure makes the reported ordering a measure of fit quality rather than an independent test, directly affecting the claim of a 'small but stable improvement' for the selected measurements.

Authors: We agree that the reported score measures fit quality on the same data used for staged selection. The fixed bin set and event samples enable reproducible comparisons among tunes, but do not constitute an independent test. We will revise the abstract to replace 'small but stable improvement for the selected BELLE and BABAR measurements' with language that explicitly states the result is an improved description of these specific datasets, and we will add a clarifying sentence in the methods section. revision: yes
Referee: [Description of staged extension] No held-out subset, cross-validation, or independent dataset is used to validate the staged parameter choices. The assumption that inspection-guided extension produces a stable global improvement therefore rests on the same data used for both selection and scoring, which is load-bearing for the headline result.

Authors: The manuscript presents the staged extension as an exploratory procedure guided by residuals, not as a validated global optimum. We will add an explicit statement in the text acknowledging the absence of held-out validation or cross-validation and noting that the result should be interpreted as the best-performing tune within the explored set on these measurements. No independent datasets beyond the BELLE and BABAR samples employed are available for this study. revision: yes

Circularity Check

1 step flagged

Staged tuning on BELLE/BABAR residuals then scored on identical 20,803-bin set reports fit quality as improvement

specific steps

fitted input called prediction [Abstract]
"Starting from the five-parameter hadronization subset of the pp tune, we carry out a staged extension guided by the remaining differences in the BELLE and BABAR data. The final comparison uses a fixed common set of 20,803 bins and samples of 1,000,000 generated events per analysis and tune point. On this basis, the selected refined tune gives a bin-weighted score of 73.42, compared with 76.49 for the Skands e+e- reference and 79.22 for Monash 2013 e+e-."

The staged extension is chosen by inspecting residuals in the BELLE/BABAR data; the final score is then evaluated on precisely the same bin set and event samples. The reported improvement is therefore the result of the fitting/selection procedure itself rather than a prediction on independent data.

full rationale

The paper selects the refined tune via staged extension explicitly guided by residuals in the same BELLE and BABAR measurements, then computes the bin-weighted score on the exact same fixed set of 20,803 bins and 1M-event samples. This matches the fitted-input-called-prediction pattern: the reported ordering (73.42 vs 76.49/79.22) is the direct output of the selection process rather than an independent test. No held-out data or cross-validation is described. The central claim therefore reduces to a measure of fit quality on the tuning data.

Axiom & Free-Parameter Ledger

1 free-parameter set · 1 axiom · 0 invented entities

The central claim rests on the validity of the PYTHIA 8 hadronization model for e+e- continuum production and the suitability of the selected data bins for tuning. The free parameters are the hadronization ones being adjusted to data.

free parameters (1)

hadronization parameters (5-parameter subset extended in stages)
Starting from the five-parameter hadronization subset of the pp tune, extended guided by differences in BELLE and BABAR data.

axioms (1)

domain assumption The PYTHIA 8.316 hadronization model is appropriate for e+e- to q qbar production near the Upsilon(4S) region.
The study is performed with PYTHIA 8.316 and restricted to parameters used in this e+e- setup.

pith-pipeline@v0.9.1-grok · 5810 in / 1738 out tokens · 65838 ms · 2026-07-01T05:04:36.172680+00:00 · methodology


Reference graph

Works this paper leans on

19 extracted references · 4 canonical work pages

[1]

Sequential retuning of PYTHIA 8.316 to a global soft-QCD basis in pp collisions at √s = 0.9–13 TeV,

H. I. Alrebdi and M. Ajaz, “Sequential retuning of PYTHIA 8.316 to a global soft-QCD basis in pp collisions at √s = 0.9–13 TeV,” arXiv:2603.21364

work page arXiv
[2]

Underlying-event and azimuthal-observable validation of a PYTHIA 8.316 soft-QCD retune in pp collisions at √s = 0.9 and 7 TeV,

M. Ajaz and H. I. Alrebdi, “Underlying-event and azimuthal-observable validation of a PYTHIA 8.316 soft-QCD retune in pp collisions at √s = 0.9 and 7 TeV,” Eur. Phys. J. Plus (2026). doi:10.1140/epjp/s13360-026-07938-5

work page doi:10.1140/epjp/s13360-026-07938-5 2026
[3]

An Introduction to PYTHIA 8.2,

T. Sjöstrand, S. Ask, J. R. Christiansen, R. Corke, N. Desai, P. Ilten, S. Mrenna, S. Prestel, C. O. Rasmussen and P. Z. Skands, “An Introduction to PYTHIA 8.2,” Comput. Phys. Commun. 191 (2015) 159–177. doi:10.1016/j.cpc.2015.01.024

work page doi:10.1016/j.cpc.2015.01.024 2015
[4]

A comprehensive guide to the physics and usage of PYTHIA 8.3,

C. Bierlich et al., “A comprehensive guide to the physics and usage of PYTHIA 8.3,” SciPost Phys. Codeb. 8 (2022)

2022
[5]

Precision Measurement of Charged Pion and Kaon Differential Cross Sections in e+e− Annihilation at √s = 10.52 GeV,

M. Leitgab et al. [Belle Collaboration], “Precision Measurement of Charged Pion and Kaon Differential Cross Sections in e+e− Annihilation at √s = 10.52 GeV,” Phys. Rev. Lett. 111 (2013) 062002

2013
[6]

Production of charged pions, kaons, and protons in e+e− annihilations into hadrons at √s = 10.54 GeV,

J. P. Lees et al. [BaBar Collaboration], “Production of charged pions, kaons, and protons in e+e− annihilations into hadrons at √s = 10.54 GeV,” Phys. Rev. D 88 (2013) 032011

2013
[7]

Production cross sections of hyperons and charmed baryons from e+e− annihilation near √s = 10.52 GeV,

M. Niiyama et al. [Belle Collaboration], “Production cross sections of hyperons and charmed baryons from e+e− annihilation near √s = 10.52 GeV,” Phys. Rev. D 97 (2018) 072005

2018
[8]

Update of inclusive cross sections of single and pairs of identified light charged hadrons,

R. Seidl et al. [Belle Collaboration], “Update of inclusive cross sections of single and pairs of identified light charged hadrons,” Phys. Rev. D 101 (2020) 092004

2020
[9]

Production cross sections of light and charmed mesons in e+e− annihilation near 10.58 GeV,

R. Seidl et al. [Belle Collaboration], “Production cross sections of light and charmed mesons in e+e− annihilation near 10.58 GeV,” Phys. Rev. D 111 (2025) 052003

2025
[10]

Parton Fragmentation and String Dynamics,

B. Andersson, G. Gustafson, G. Ingelman and T. Sjöstrand, “Parton Fragmentation and String Dynamics,” Phys. Rept. 97 (1983) 31–145. doi:10.1016/0370-1573(83)90080-7

work page doi:10.1016/0370-1573(83)90080-7 1983
[11]

Tuning PYTHIA 8.1: the Monash 2013 Tune,

P. Skands, S. Carrazza and J. Rojo, “Tuning PYTHIA 8.1: the Monash 2013 Tune,” Eur. Phys. J. C 74 (2014) 3024

2013
[12]

Systematic event generator tuning for the LHC,

A. Buckley, H. Hoeth, H. Lacker, H. Schulz and J. E. von Seggern, “Systematic event generator tuning for the LHC,” Eur. Phys. J. C 65 (2010) 331–357

2010
[13]

Robust Independent Validation of Experiment and Theory: Rivet version 3,

C. Bierlich et al., “Robust Independent Validation of Experiment and Theory: Rivet version 3,” SciPost Phys. 8 (2020) 026

2020
[14]

Fitting using finite Monte Carlo samples,

R. Barlow and C. Beeston, “Fitting using finite Monte Carlo samples,” Comput. Phys. Commun. 77 (1993) 219–228

1993
[15]

A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code,

M. D. McKay, R. J. Beckman and W. J. Conover, “A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code,” Technometrics 21 (1979) 239–245.

1979
[16]

Baryon Production in the String Fragmentation Picture,

P. Eden and G. Gustafson, “Baryon Production in the String Fragmentation Picture,” Z. Phys. C 75 (1997) 41–49

1997
[17]

Comparative Analysis of Jet and Underlying Event Properties Across Various Models as a Function of Charged Particle Multiplicity at 7 TeV,

M. Waqar, H. I. Alrebdi, M. Waqas, K. S. Al-Mugren and M. Ajaz, “Comparative Analysis of Jet and Underlying Event Properties Across Various Models as a Function of Charged Particle Multiplicity at 7 TeV,” Chin. Phys. C 48 (2024) 093109

2024
[18]

Comparative analysis of charged particle distributions and model predictions for underlying events with track-based selection in 13 TeV pp collisions,

H. I. Alrebdi, M. Ajaz, M. Waqas, M. A. Ahmad, M. Waqar, A. M. Quraishi, J. H. Baker, S. Jagnandan and A. Jagnandan, “Comparative analysis of charged particle distributions and model predictions for underlying events with track-based selection in 13 TeV pp collisions,” Eur. Phys. J. Plus 140 (2025) 371

2025
[19]

Hadron production models’ prediction for pT distribution of charged hadrons in pp interactions at LHC energies,

M. Waqar, H. I. Alrebdi, M. Waqas, M. A. Ahmad and M. Ajaz, “Hadron production models’ prediction for pT distribution of charged hadrons in pp interactions at LHC energies,” Eur. Phys. J. Plus 140 (2025) 523.

2025
