Highly boosted dielectron identification in proton-proton collisions at √s = 13 TeV
Pith reviewed 2026-05-10 13:26 UTC · model grok-4.3
The pith
CMS develops multivariate models to tag highly boosted dielectrons that merge into one electromagnetic calorimeter cluster.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A new technique is developed to identify dielectrons with Lorentz boost γ_L > 20 that produce a single merged cluster in the electromagnetic calorimeter. The identification uses two multivariate models, one for the case where both electron tracks are reconstructed and one for the case where only a single track is reconstructed. Efficiency is measured in proton-proton data at √s = 13 TeV: boosted J/ψ → e⁺e⁻ decays give an overall efficiency of 80% for the two-track model, while Z → μ⁺μ⁻γ events with converted photons give about 60% for the single-track model. A dedicated energy correction for dielectron candidates is also derived from B± → J/ψ K± data.
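To give a feel for what a boost above 20 means, the sketch below applies standard two-body kinematics for a parent decaying to two effectively massless electrons, where the minimum opening angle satisfies sin(θ_min/2) = 1/γ. The function names and the J/ψ mass constant (the PDG value, about 3.0969 GeV) are assumptions introduced for illustration and are not taken from the paper.

```python
import math

# Assumed constant: J/psi mass in GeV (PDG value), not quoted on this page.
J_PSI_MASS_GEV = 3.0969


def lorentz_boost(energy_gev: float, mass_gev: float) -> float:
    """Return gamma = E / m for a parent of the given energy and mass."""
    if mass_gev <= 0:
        raise ValueError("mass must be positive")
    return energy_gev / mass_gev


def min_opening_angle(gamma: float) -> float:
    """Minimum opening angle (radians) between two massless decay products.

    Follows from m^2 = 4 E1 E2 sin^2(theta / 2), which is minimised in theta
    when the two daughters share the parent energy equally.
    """
    if gamma < 1:
        raise ValueError("gamma must be at least 1")
    return 2.0 * math.asin(1.0 / gamma)


def energy_threshold(mass_gev: float, gamma_min: float = 20.0) -> float:
    """Parent energy (GeV) at which the boost reaches gamma_min."""
    return gamma_min * mass_gev
```

Under these assumptions a J/ψ reaches γ_L = 20 at an energy of roughly 62 GeV, and the electron pair is then separated by about 0.1 rad at minimum. Asymmetric decays open wider, so whether a given pair actually merges depends on detector granularity, which this sketch does not model.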
What carries the argument
Two multivariate classifiers that combine track and calorimeter information to distinguish merged dielectron clusters from background, with separate training for the two-track and single-track reconstruction cases.
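The split into two models can be pictured as a simple routing step on the number of reconstructed tracks. The following is a minimal sketch only: `MergedCandidate`, `select_model`, `passes_id`, the feature dictionary, and the thresholds are all hypothetical names and values, and the scorers stand in for whatever trained classifiers the analysis actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Features = Dict[str, float]
Scorer = Callable[[Features], float]  # hypothetical: maps inputs to a score in [0, 1]


@dataclass
class MergedCandidate:
    """Hypothetical container for one merged-cluster candidate."""
    n_tracks: int       # number of reconstructed electron tracks matched to the cluster
    features: Features  # track and calorimeter inputs for the classifier


def select_model(candidate: MergedCandidate,
                 two_track_model: Scorer,
                 one_track_model: Scorer) -> Optional[Scorer]:
    """Pick the classifier that matches the reconstruction case."""
    if candidate.n_tracks >= 2:
        return two_track_model
    if candidate.n_tracks == 1:
        return one_track_model
    return None  # no track: outside the scope of either model


def passes_id(candidate: MergedCandidate,
              two_track_model: Scorer,
              one_track_model: Scorer,
              threshold_two: float,
              threshold_one: float) -> bool:
    """Apply the matching model and its own working-point threshold."""
    model = select_model(candidate, two_track_model, one_track_model)
    if model is None:
        return False
    threshold = threshold_two if candidate.n_tracks >= 2 else threshold_one
    return model(candidate.features) >= threshold
```

Keeping a separate threshold per model reflects the fact that the two cases are reported with different efficiencies (80% and about 60%), so a single shared working point would not be expected to suit both.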
If this is right
- The method recovers events in which high-pT resonances decay to electron pairs that would otherwise be lost to merged clusters.
- The energy correction improves the mass and transverse-momentum resolution for such merged candidates.
- Separate models for one-track and two-track cases allow the analysis to retain signal in different detector-response regimes.
- Efficiencies are measured in data, reducing reliance on simulation for this topology.
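A data-driven efficiency of this kind is, at its simplest, a ratio of passing to total control-sample candidates with a binomial uncertainty. The sketch below uses the Wilson score interval as one common choice. It is a counting simplification with hypothetical function names, and it ignores background subtraction and the fits a real analysis would use to extract signal yields.

```python
import math
from typing import Tuple


def efficiency(n_pass: int, n_total: int) -> float:
    """Fraction of control-sample candidates that pass the identification."""
    if n_total <= 0 or not 0 <= n_pass <= n_total:
        raise ValueError("need n_total > 0 and 0 <= n_pass <= n_total")
    return n_pass / n_total


def wilson_interval(n_pass: int, n_total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    p = efficiency(n_pass, n_total)
    z2 = z * z
    denom = 1.0 + z2 / n_total
    centre = (p + z2 / (2.0 * n_total)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n_total
                               + z2 / (4.0 * n_total ** 2)) / denom
    return centre - half_width, centre + half_width
```

For example, `wilson_interval(80, 100)` describes a purely illustrative sample of 100 candidates with 80 passing; those counts are made up and are not the paper's yields.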
Where Pith is reading between the lines
- The same merged-cluster logic could be adapted to other lepton-pair or photon-pair signatures that become collinear at high boost.
- Future runs with higher instantaneous luminosity will increase the number of events requiring this identification, making the data-driven efficiency measurement more valuable.
- Cross-checks with additional control samples, such as other resonances decaying to electrons, would further test the transferability of the efficiency.
Load-bearing premise
The identification efficiency measured in boosted J/psi and converted-photon control samples transfers accurately to the signal processes of interest.
What would settle it
Applying the models to an independent control sample or to simulated signal events with known generator-level truth and finding efficiencies that differ by more than the quoted uncertainties from the 80 percent and 60 percent values.
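One way to make "differ by more than the quoted uncertainties" concrete is a pull between two efficiency measurements. This is a minimal sketch assuming uncorrelated Gaussian uncertainties. The function names and the two-sigma default are illustrative choices, and the uncertainty values a reader plugs in must come from the paper, since this page does not quote them.

```python
import math


def pull(eff_a: float, err_a: float, eff_b: float, err_b: float) -> float:
    """Difference between two efficiencies in units of their combined uncertainty.

    Assumes the two uncertainties are uncorrelated and approximately Gaussian.
    """
    combined = math.hypot(err_a, err_b)
    if combined == 0.0:
        raise ValueError("at least one uncertainty must be non-zero")
    return (eff_a - eff_b) / combined


def is_compatible(eff_a: float, err_a: float,
                  eff_b: float, err_b: float,
                  n_sigma: float = 2.0) -> bool:
    """True if the two efficiencies agree within n_sigma combined uncertainties."""
    return abs(pull(eff_a, err_a, eff_b, err_b)) <= n_sigma
```

A large pull in an independent control sample or in truth-matched simulation would point to a transfer problem; a small one is consistent with the premise but does not prove it across all signal kinematics.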
Original abstract
A new technique is developed to identify dielectrons (e$^+$e$^-$) with Lorentz boost $\gamma_\mathrm{L} > 20$ that produce one single merged cluster in the electromagnetic calorimeter of the CMS detector. The identification uses two multivariate models: one for the case where both electron tracks are reconstructed, and another where only one of the tracks is reconstructed. The efficiency is determined using proton-proton collision data collected at a center-of-mass energy of 13 TeV. Boosted J/$\psi$ mesons decaying into e$^+$e$^-$ pairs are used to estimate the efficiency of the model with two tracks, yielding an overall efficiency of 80%. The Z $\to \mu^+\mu^-\gamma$ events, where the photon converts into a collimated dielectron, are used for the model with a single track, yielding an efficiency of about 60%. A dedicated energy correction for dielectron candidates is also developed using B$^\pm \to$ J/$\psi$K$^\pm \to$ e$^+$e$^-$K$^\pm$ data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a new technique for identifying highly boosted dielectrons (γ_L > 20) that merge into a single cluster in the CMS electromagnetic calorimeter. Two multivariate models are used: one for cases with both electron tracks reconstructed and one for single-track cases. Efficiencies are extracted from proton-proton collision data at √s = 13 TeV using boosted J/ψ → e⁺e⁻ decays (overall 80%) for the two-track model and Z → μ⁺μ⁻γ events with photon conversions (~60%) for the single-track model. A dedicated energy correction for dielectron candidates is developed using B± → J/ψ K± data.
Significance. If the efficiencies transfer reliably, the method could improve reconstruction of highly boosted dielectron pairs in high-p_T searches and measurements at the LHC. The data-driven extraction of efficiencies from real collision control samples is a clear strength, as it minimizes dependence on simulation for the quoted performance figures.
major comments (2)
- [Efficiency determination using control samples] The central efficiencies (80% from boosted J/ψ and ~60% from Z→μμγ conversions) are measured in control samples whose kinematics (fixed masses, specific production mechanisms) differ from typical signal dielectrons at high p_T with arbitrary opening angles inside the merged cluster. Track-finding efficiency, shower-shape variables, and single-cluster merging probability are sensitive to these differences, yet no direct comparison of multivariate input distributions or efficiency in signal-like Monte Carlo after identical selections is provided. This assumption is load-bearing for the quoted performance.
- [Multivariate identification models] The two multivariate models are described as trained or tuned on the control samples; without explicit validation that their response remains stable under the broader kinematic range and mass hypotheses of the intended signal processes, the overall identification efficiency claim rests on untested extrapolation.
minor comments (2)
- [Abstract] The abstract omits any mention of systematic uncertainties on the efficiencies, background modeling details, or cross-checks against simulation, which would help readers assess the robustness of the 80% and 60% figures.
- [Energy correction development] Clarify the impact of the dedicated energy correction on the final dielectron candidate selection and whether it introduces additional uncertainties that propagate into the quoted efficiencies.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review of our manuscript. The comments raise important points about the applicability of the efficiencies measured in control samples to the broader range of signal processes. We address each major comment below and have revised the manuscript accordingly to provide additional validation and clarification.
- Referee: [Efficiency determination using control samples] The central efficiencies (80% from boosted J/ψ and ~60% from Z→μμγ conversions) are measured in control samples whose kinematics (fixed masses, specific production mechanisms) differ from typical signal dielectrons at high p_T with arbitrary opening angles inside the merged cluster. Track-finding efficiency, shower-shape variables, and single-cluster merging probability are sensitive to these differences, yet no direct comparison of multivariate input distributions or efficiency in signal-like Monte Carlo after identical selections is provided. This assumption is load-bearing for the quoted performance.
  Authors: We agree that the control samples have specific kinematic features, but they were selected because they provide high-purity, data-driven samples of highly boosted dielectrons that merge into single clusters, which is the exact topology targeted by the method. The multivariate inputs are dominated by local cluster shower shapes and track properties within the merged object, which depend primarily on the small opening angle and boost rather than the parent particle mass or production mechanism. Nevertheless, to strengthen the manuscript, we have added comparisons of the key multivariate input variable distributions between the J/ψ and Z-conversion control samples and a Monte Carlo sample of high-p_T dielectrons from a generic heavy-resonance decay process, after identical selections. We also include the measured efficiency versus p_T in simulation for both control-like and signal-like kinematics. These studies show consistency within uncertainties and support the quoted performance figures for the intended use cases.
  Revision: yes
- Referee: [Multivariate identification models] The two multivariate models are described as trained or tuned on the control samples; without explicit validation that their response remains stable under the broader kinematic range and mass hypotheses of the intended signal processes, the overall identification efficiency claim rests on untested extrapolation.
  Authors: The models are trained exclusively on the data control samples to incorporate real detector response. To address the concern regarding stability across kinematics and mass hypotheses, the revised manuscript now includes an explicit study of the multivariate discriminator output and the resulting identification efficiency as functions of dielectron p_T, opening angle, and invariant mass. This validation is performed both in the control data and in simulated events covering a wider range of boosts and parent masses relevant to typical high-p_T searches. The response is found to be stable, with efficiency variations well within the systematic uncertainties already assigned.
  Revision: yes
Circularity Check
No significant circularity: efficiencies measured directly from independent control data
The paper's central results are empirical efficiencies extracted from separate control samples in collision data (boosted J/ψ decays for the two-track multivariate model and Z→μμγ conversions for the single-track model), plus an energy correction derived from B±→J/ψK± data. These are direct measurements on distinct processes rather than quantities fitted or derived from the signal sample itself. No equations, self-citations, or ansatze are presented that reduce the quoted 80% and ~60% efficiencies to the inputs by construction; the derivation chain consists of standard data-driven calibration steps that remain externally falsifiable.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: standard assumptions about electromagnetic calorimeter response and track reconstruction in CMS for high-boost electrons.