arxiv: 2305.10915 · v3 · pith:SVXP2BZWnew · submitted 2023-05-18 · ✦ hep-ph

Optimizing The Cut And Count Method In Phenomenological Studies

Pith reviewed 2026-05-24 08:26 UTC · model grok-4.3

classification ✦ hep-ph
keywords cut and count · optimization · Two Higgs Doublet Model · charged Higgs · phenomenology · signal significance · MadAnalysis5

The pith

An automated ranking scheme for observables optimizes cuts in the cut-and-count method, leading to enhanced discovery potential for new physics signals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a technique to optimize the discrimination between signal and background events in phenomenological studies that rely on the cut-and-count approach. Central to this is a ranking scheme that evaluates the relative importance of different observables, combined with an iterative, methodical selection of cuts implemented via the MadAnalysis5 interface. Applied to the search for a singly charged Higgs in the Two Higgs Doublet Model, the method demonstrates higher discovery potential than the conventional practice of manually imposing cuts. A reader would care if this automation reliably extracts more information from collider data without additional computational cost.

Core claim

Automating the cut and count process using a ranking scheme to assess observable importance and a systematic way of choosing cuts results in an enhanced discovery potential compared with the more traditional way of imposing cuts, as shown in the context of a singly charged Higgs search in the Two Higgs Doublet Model.

What carries the argument

The ranking scheme that quantitatively assesses the relative importance of various observables involved in a new physics process.
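
As a rough illustration of what such a ranking could look like, the sketch below scores each observable by the overlap area of its unit-normalized signal and background histograms and sorts the observables from least to most overlap. The overlap score is an assumption standing in for the paper's Area Parameter, whose exact definition is not given on this page; the function names and toy bin counts are hypothetical.

```python
def normalize(hist):
    """Scale bin counts so they sum to 1."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must have positive total content")
    return [count / total for count in hist]


def overlap_area(signal_hist, background_hist):
    """Shared area of two unit-normalized histograms with identical binning.

    0.0 means fully separated shapes, 1.0 means identical shapes.
    This is an assumed stand-in for the paper's Area Parameter.
    """
    if len(signal_hist) != len(background_hist):
        raise ValueError("histograms must share the same binning")
    s = normalize(signal_hist)
    b = normalize(background_hist)
    return sum(min(si, bi) for si, bi in zip(s, b))


def rank_observables(histograms):
    """Return observable names sorted from smallest to largest overlap.

    `histograms` maps a name to a (signal_hist, background_hist) pair.
    """
    return sorted(histograms, key=lambda name: overlap_area(*histograms[name]))


# Toy bin counts, invented purely for illustration.
toy = {
    "eta(l1)": ([5, 5, 5, 5], [4, 6, 6, 4]),      # nearly identical shapes
    "M(b1,b2)": ([0, 1, 9, 10], [10, 9, 1, 0]),   # well separated shapes
}
ranking = rank_observables(toy)  # least-overlapping observable comes first
```

Ranking on shape overlap ignores the absolute yields, so a well-separated observable with very little surviving signal could still rank first; any real use would need to check yields separately.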

If this is right

  • The optimized cuts provide better separation of signal from background in BSM searches.
  • This approach can be applied to any phenomenological study using cut-and-count methods.
  • It works iteratively with MadAnalysis5 to refine the analysis.
  • Enhanced discovery potential means higher chances of detecting new particles like the charged Higgs.
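
The iterative step can be sketched in a few lines. The passages extracted into the reference graph on this page state that a cut is accepted only if more than 10 signal events remain and the significance improves, that a gain smaller than 0.10 triggers a separate branch, and that the loop ends at 5σ or when the observables run out. The sketch below encodes those three numbers; everything else is an assumption: the one-sided threshold scan standing in for the vertical line test, the fixed ranking order, unweighted event counts, and the choice to simply skip rejected observables.

```python
import math


def significance(s, b):
    """S / sqrt(B); returns 0.0 when no background survives (a simplification)."""
    return s / math.sqrt(b) if b > 0 else 0.0


def best_threshold(signal, background, name, thresholds):
    """Scan one-sided cuts on one observable; return (significance, threshold, side)."""
    best = (0.0, None, None)
    for t in thresholds:
        for side in ("above", "below"):
            keep = (lambda v: v >= t) if side == "above" else (lambda v: v < t)
            s = sum(1 for e in signal if keep(e[name]))
            b = sum(1 for e in background if keep(e[name]))
            z = significance(s, b)
            if z > best[0]:
                best = (z, t, side)
    return best


def iterate_cuts(signal, background, ranked, thresholds,
                 min_signal=10, min_gain=0.10, target=5.0):
    """Apply cuts observable by observable, in the given ranked order.

    A cut is accepted only if more than `min_signal` signal events survive and
    the significance rises by at least `min_gain`; the loop stops at `target`.
    """
    current = significance(len(signal), len(background))
    cutflow = []
    for name in ranked:
        z, t, side = best_threshold(signal, background, name, thresholds[name])
        if t is None:
            continue  # no candidate cut gave a usable significance
        keep = (lambda v: v >= t) if side == "above" else (lambda v: v < t)
        new_signal = [e for e in signal if keep(e[name])]
        if len(new_signal) <= min_signal or z < current + min_gain:
            continue  # rejected here; the paper's handling is not shown on this page
        signal = new_signal
        background = [e for e in background if keep(e[name])]
        current = z
        cutflow.append((name, side, t, z))
        if current >= target:
            break
    return cutflow, current


# Toy events, invented purely for illustration.
toy_signal = [{"x": 8}] * 20
toy_background = [{"x": 2}] * 96 + [{"x": 8}] * 4
cutflow, final_z = iterate_cuts(toy_signal, toy_background, ["x"], {"x": [5]})
```

A fuller version would re-rank the remaining observables after every accepted cut, since each cut reshapes the other distributions.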

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This could minimize subjective choices in cut selection across different analyses.
  • Testing the method on processes with known signals would verify if the improvements are robust.
  • The technique might complement machine learning approaches in future collider studies.

Load-bearing premise

The ranking scheme that quantitatively assesses the relative importance of observables produces cuts that genuinely improve signal significance rather than merely fitting statistical fluctuations in the simulated samples.

What would settle it

A direct comparison of the signal significance achieved with the optimized cuts versus traditional cuts on the same Monte Carlo samples for the 2HDM charged Higgs search, or validation on a well-understood Standard Model process.
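
A comparison of this kind is simple to set up. The sketch below evaluates two cut sets on the same signal and background samples and returns S/√B for each, the figure of merit named in the referee report further down this page. The event representation (one dictionary per event), the window-style cuts, the per-event weights, and all function names are assumptions made for illustration.

```python
import math


def significance(n_signal, n_background):
    """Simple S / sqrt(B) estimate; undefined when B is not positive."""
    if n_background <= 0:
        raise ValueError("background yield must be positive")
    return n_signal / math.sqrt(n_background)


def passes(event, cuts):
    """True if the event satisfies every (observable, low, high) window."""
    return all(low <= event[name] <= high for name, low, high in cuts)


def yield_after_cuts(events, cuts, weight=1.0):
    """Weighted number of events surviving all cuts."""
    return weight * sum(1 for event in events if passes(event, cuts))


def compare_cut_sets(signal, background, cuts_a, cuts_b, w_sig=1.0, w_bkg=1.0):
    """Evaluate two cut sets on the same samples; return both significances."""
    results = []
    for cuts in (cuts_a, cuts_b):
        s = yield_after_cuts(signal, cuts, w_sig)
        b = yield_after_cuts(background, cuts, w_bkg)
        results.append(significance(s, b))
    return tuple(results)


# Toy events with a single invented observable "m".
toy_signal = [{"m": 300}, {"m": 310}, {"m": 150}, {"m": 305}]
toy_background = [{"m": 100}, {"m": 120}, {"m": 290}, {"m": 200}]
loose = [("m", 0, 1000)]     # keeps everything
window = [("m", 280, 320)]   # mass window around the toy peak
z_loose, z_window = compare_cut_sets(toy_signal, toy_background, loose, window)
```

Because both numbers come from the same finite samples, a difference between them is only meaningful if it exceeds the statistical uncertainty of the Monte Carlo yields.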

Figures

Figures reproduced from arXiv: 2305.10915 by Agnivo Sarkar, Baradhwaj Coleppa, Gokul B. Krishna, Sujay Shil.

Figure 1: The normalized distributions of the 29 observables mentioned in the text after the pre-selection cuts.
Figure 2: (Left): The …
Figure 3: The representation of the area parameter for the invariant mass …
Figure 4: Vertical line test carried over the rank 1 …
Figure 5: A flowchart representation of the proposed algorithm.
Figure 6: The three important components in the flowchart.
Figure 7: The cutflow suggestions from each iteration as dictated by the vertical line test.
Figure 8: Comparison of the improvement in significance between the conventional cut-and-count method applied …
Figure 9: (caption not recovered; the extracted text is from the paper body) …, highlighting its constraint by the vertical line test in the sixth iteration, an achievement not attainable in a typical conventional cut-based study. IV. DISCUSSION AND CONCLUSION. Particle physics is in the midst of an exciting time: while the searches at the LHC for hints of BSM physics have not borne fruit thus far, it is nevertheless a very interesting question to ask how much and how far one could …
Original abstract

We introduce an optimization technique to discriminate signal and background in any phenomenological study based on the cut and count-based method. The core ideas behind this technique are the introduction of a ranking scheme that can quantitatively assess the relative importance of various observables involved in a new physics process, and a more methodical way of choosing what cuts to impose. The technique is an iterative process that works with the help of the MadAnalysis5 interface. Working in the context of a BSM (Beyond Standard Model) scenario where we carry out a signal search of singly charged Higgs in the context of the Two Higgs Doublet Model (2HDM), we demonstrate how automating the cut and count process in this specific way results in an enhanced discovery potential compared with the more traditional way of imposing cuts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces an iterative optimization technique for cut-and-count analyses in BSM phenomenology. It defines a ranking scheme to quantify the relative importance of observables and uses MadAnalysis5 to select cuts in a methodical, automated manner. The approach is demonstrated in a search for singly charged Higgs bosons within the Two Higgs Doublet Model, where the authors claim it yields an enhanced discovery potential relative to traditional manual cut imposition.

Significance. If the reported improvements prove robust against statistical fluctuations in the Monte Carlo samples, the technique could provide a useful systematization of cut selection for phenomenological studies, reducing reliance on ad-hoc choices while interfacing with existing public tools. The integration with MadAnalysis5 supports reproducibility, though the absence of explicit validation metrics limits assessment of its broader impact.

major comments (2)
  1. [Abstract] Abstract and method description: the central claim of 'enhanced discovery potential' is not supported by any quantitative details on the ranking metric, number of iterations, or the magnitude of improvement in S/√B. Without these, the result cannot be evaluated for load-bearing significance.
  2. [Method] Method and results sections: the iterative ranking and cut selection, as well as the final significance evaluation, are performed on the identical finite Monte Carlo samples with no mention of hold-out sets, k-fold cross-validation, or independent test samples. This directly undermines the claim that the procedure produces genuinely superior cuts rather than fits to sample-specific noise, as the stopping criterion itself is significance-based.
minor comments (1)
  1. [Abstract] The abstract would benefit from a brief statement of the specific 2HDM parameter point or benchmark used for the demonstration.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for the constructive comments. We address each major point below and indicate the changes planned for the revised version.

  1. Referee: [Abstract] Abstract and method description: the central claim of 'enhanced discovery potential' is not supported by any quantitative details on the ranking metric, number of iterations, or the magnitude of improvement in S/√B. Without these, the result cannot be evaluated for load-bearing significance.

    Authors: We agree that the abstract would benefit from explicit quantitative support for the claimed improvement. In the revised manuscript we will insert the specific values of the ranking metric, the number of iterations required for convergence, and the achieved gain in S/√B relative to the manual-cut baseline. These numbers are already present in the results section and can be moved to the abstract without altering the underlying analysis. revision: yes

  2. Referee: [Method] Method and results sections: the iterative ranking and cut selection, as well as the final significance evaluation, are performed on the identical finite Monte Carlo samples with no mention of hold-out sets, k-fold cross-validation, or independent test samples. This directly undermines the claim that the procedure produces genuinely superior cuts rather than fits to sample-specific noise, as the stopping criterion itself is significance-based.

    Authors: The referee correctly notes that the optimization and final significance evaluation use the same Monte Carlo samples and that no cross-validation or hold-out procedure is described. This is a genuine methodological limitation that could allow the procedure to fit statistical fluctuations. In the revised manuscript we will add an explicit discussion of this issue, report results obtained on an independent test sample generated with different random seeds, and include a brief stability check by repeating the ranking on subsamples. These additions will be presented as a new subsection in the results. revision: yes
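
The hold-out check discussed in this exchange can be sketched as follows: split each sample in two with a fixed seed, choose the cut on one half, and quote the significance on the other half. This is a minimal sketch under stated assumptions, not the authors' procedure: events are reduced to a single observable value, the cut is a lower bound only, and the 50/50 split, seeds, and function names are hypothetical.

```python
import math
import random


def significance(s, b):
    """S / sqrt(B); returns 0.0 when no background survives (a simplification)."""
    return s / math.sqrt(b) if b > 0 else 0.0


def split(events, fraction=0.5, seed=0):
    """Shuffle a copy with a fixed seed and split it into (train, test)."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


def count_above(values, threshold):
    """Number of values at or above the threshold."""
    return sum(1 for v in values if v >= threshold)


def optimize_threshold(signal, background, thresholds):
    """Pick the lower-bound cut with the highest S/sqrt(B) on the given sample."""
    return max(
        thresholds,
        key=lambda t: significance(count_above(signal, t),
                                   count_above(background, t)),
    )


def holdout_check(signal, background, thresholds, seed=0):
    """Optimize on one half, then report the significance on both halves."""
    s_train, s_test = split(signal, seed=seed)
    b_train, b_test = split(background, seed=seed + 1)
    t = optimize_threshold(s_train, b_train, thresholds)
    z_train = significance(count_above(s_train, t), count_above(b_train, t))
    z_test = significance(count_above(s_test, t), count_above(b_test, t))
    return t, z_train, z_test
```

A test-half significance well below the training-half value would suggest that the chosen cut follows fluctuations in the training sample. Both numbers refer to half-size samples, so yields would need rescaling before comparison with a full-sample result.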

Circularity Check

0 steps flagged

No significant circularity; the method is a self-contained procedure that uses an external tool

The paper introduces a ranking scheme and iterative cut-selection procedure that interfaces with the public MadAnalysis5 tool. No equations, parameters, or results are defined in terms of each other or reduced by construction to fitted inputs from the same samples. The demonstration in the 2HDM example compares the automated cuts against traditional ones on the same Monte Carlo samples, but this is an empirical comparison rather than a definitional loop. No self-citation load-bearing steps, uniqueness theorems, or ansatzes smuggled via prior work appear in the provided text. The derivation chain is therefore independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are stated. The ranking scheme itself may implicitly contain choices (e.g., figure of merit for ranking), but these are not enumerated.

pith-pipeline@v0.9.0 · 5668 in / 1172 out tokens · 22885 ms · 2026-05-24T08:26:24.109117+00:00 · methodology

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 16 internal anchors

  1. [1]

    1. PT(ℓ1), 2. PT(ℓ2), 3. PT(b1), 4. PT(b2), 5. PT(b3), 6. PT(b4), 7. η(ℓ1), 8. η(ℓ2), 9. η(b1), 10. η(b2), 11. η(b3),

  2. [2]

    12. η(b4), 13. ∆R(ℓ1, ℓ2), 14. ∆R(b1, b2), 15. ∆R(b1, b3), 16. ∆R(b1, b4), 17. ∆R(b2, b3), 18. ∆R(b2, b4), 19. ∆R(b3, b4),

  3. [3]

    20. THT, 21. ET, 22. M(ℓ1, ℓ2), 23. M(b1, b2), 24. M(b1, b3), 25. M(b1, b4), 26. M(b2, b3), 27. M(b2, b4), 28. M(b3, b4),

  4. [4]

    29. M(ℓ1, ℓ2, b1, b2, b3, b4). The initial distributions acquired following the application of preselection cuts are depicted in Figure 1. By examining these distributions, one can straightforwardly determine the selection cuts that maximize the signal-to-background ratio. We have established a set of selection criteria from intuition gained from the initia...

  5. [5]

    How does one choose the kinematic variable that will maximally aid S vs B?

  6. [6]

    Having identified the variable, how does one choose the exact cut that will maximally isolate the signal?

  7. [7]

    How does one continue the process taking care to ensure the significance increases with every step? We will begin by introducing relevant quantities of interest that will simultaneously answer these questions. [Figure residue: normalized signal and background distributions of pT[l1] and pT[l2], in GeV/c] ...

  8. [8]

    The generated signal and background events (after the imposition of the preselection cuts) are fed into MadAnalysis 5 and the distributions of the various kinematic observables are obtained

  9. [9]

    The observable with the highest rank is passed on to the stage of vertical line test that enables us to come up with the optimal cuts

    The Area Parameter is calculated for all observables and this is then used to sort them. The observable with the highest rank is passed on to the stage of vertical line test that enables us to come up with the optimal cuts

  10. [10]

    Observable hold: This representation showcases where we hold the remaining observable distributions for future iterations

  11. [11]

    In addition, if after this cut, we satisfy the LL condition (i.e., significance becomes 5σ or higher), the process terminates

    If, after the imposition of the cut, enough signal events remain (Ns > 10) and if an improvement in significance is obtained, the cut is accepted and the observable is dropped from further consideration. In addition, if after this cut, we satisfy the LL condition (i.e., significance becomes 5σ or higher), the process terminates

  12. [12]

    If, after the imposition of the cut the LL condition is not satisfied, we pass on to the Collector Connector (CC): It takes the observable distributions from the hold and pushes it to the next step. Once a proper instruction (indicated by an arrow) hits the CC, it will collect all the observable distribution sets from the hold connected to it and then rec...

  13. [13]

    Otherwise, it remains inactive

    Pulse Switch (PS): This is an instantaneous switch that triggers the execution of an instruction when a specific condition is met (typically an ‘if’ condition in the program to check the minimum significance criterion). Otherwise, it remains inactive. Specifically, if it turns out that σ(k) < σ(k − 1) + 0.10, then that particular observable will be sent ...

  14. [14]

    The suggested cuts on following the algorithm at each iteration are shown in Figure 7 and the final cut flow chart is shown in Table IV

    The same steps continue until we either run out of observables for ranking or when the LL conditions are satisfied, i.e., the significance improved beyond 5σ. The suggested cuts on following the algorithm at each iteration are shown in Figure 7 and the final cut flow chart is shown in Table IV. At this point, it would be reasonable to ask if an iterative...

  15. [15]

    Partial Symmetries of Weak Interactions,

    S. L. Glashow, “Partial Symmetries of Weak Interactions,” Nucl. Phys. 22, 579–588 (1961)

  16. [16]

    A Model of Leptons,

    Steven Weinberg, “A Model of Leptons,” Phys. Rev. Lett. 19, 1264–1266 (1967)

  17. [17]

    Weak and Electromagnetic Interactions,

    Abdus Salam, “Weak and Electromagnetic Interactions,” Conf. Proc. C 680519, 367–377 (1968)

  18. [18]

    Deep Learning and its Application to LHC Physics

    Dan Guest, Kyle Cranmer, and Daniel Whiteson, “Deep Learning and its Application to LHC Physics,” Ann. Rev. Nucl. Part. Sci. 68, 161–181 (2018), arXiv:1806.11484 [hep-ex]

  19. [19]

    Jet-Images -- Deep Learning Edition

    Luke de Oliveira, Michael Kagan, Lester Mackey, Benjamin Nachman, and Ariel Schwartzman, “Jet-images — deep learning edition,” JHEP 07, 069 (2016), arXiv:1511.05190 [hep-ph]

  20. [20]

    The automated computation of tree-level and next-to-leading order differential cross sections, and their matching to parton shower simulations

    J. Alwall, R. Frederix, S. Frixione, V. Hirschi, F. Maltoni, O. Mattelaer, H. S. Shao, T. Stelzer, P. Torrielli, and M. Zaro, “The automated computation of tree-level and next-to-leading order differential cross sections, and their matching to parton shower simulations,” JHEP 07, 079 (2014), arXiv:1405.0301 [hep-ph]

  21. [21]

    FeynRules 2.0 - A complete toolbox for tree-level phenomenology

    Adam Alloul, Neil D. Christensen, Céline Degrande, Claude Duhr, and Benjamin Fuks, “FeynRules 2.0 - A complete toolbox for tree-level phenomenology,” Comput. Phys. Commun. 185, 2250–2300 (2014), arXiv:1310.1921 [hep-ph]

  22. [22]

    An Introduction to PYTHIA 8.2

    Torbjörn Sjöstrand, Stefan Ask, Jesper R. Christiansen, Richard Corke, Nishita Desai, Philip Ilten, Stephen Mrenna, Stefan Prestel, Christine O. Rasmussen, and Peter Z. Skands, “An introduction to PYTHIA 8.2,” Comput. Phys. Commun. 191, 159–177 (2015), arXiv:1410.3012 [hep-ph]

  23. [23]

    DELPHES 3, A modular framework for fast simulation of a generic collider experiment

    J. de Favereau, C. Delaere, P. Demin, A. Giammanco, V. Lemaître, A. Mertens, and M. Selvaggi (DELPHES 3), “DELPHES 3, A modular framework for fast simulation of a generic collider experiment,” JHEP 02, 057 (2014), arXiv:1307.6346 [hep-ex]

  24. [24]

    MadAnalysis 5, a user-friendly framework for collider phenomenology

    Eric Conte, Benjamin Fuks, and Guillaume Serret, “MadAnalysis 5, A User-Friendly Framework for Collider Phenomenology,” Comput. Phys. Commun. 184, 222–256 (2013), arXiv:1206.1599 [hep-ph]

  25. [25]

    Approximating Likelihood Ratios with Calibrated Discriminative Classifiers

    Kyle Cranmer, Juan Pavez, and Gilles Louppe, “Approximating Likelihood Ratios with Calibrated Discriminative Classifiers,” (2015), arXiv:1506.02169 [stat.AP]

  26. [26]

    Parameterized Machine Learning for High-Energy Physics

    Pierre Baldi, Kyle Cranmer, Taylor Faucett, Peter Sadowski, and Daniel Whiteson, “Parameterized neural networks for high-energy physics,” Eur. Phys. J. C 76, 235 (2016), arXiv:1601.07913 [hep-ex]

  27. [27]

    Interpretable deep learning for two-prong jet classification with jet spectra,

    Amit Chakraborty, Sung Hak Lim, and Mihoko M. Nojiri, “Interpretable deep learning for two-prong jet classification with jet spectra,” JHEP 07, 135 (2019), arXiv:1904.02092 [hep-ph]

  28. [28]

    Mapping Machine-Learned Physics into a Human-Readable Space,

    Taylor Faucett, Jesse Thaler, and Daniel Whiteson, “Mapping Machine-Learned Physics into a Human-Readable Space,” Phys. Rev. D 103, 036020 (2021), arXiv:2010.11998 [hep-ph]

  29. [29]

    Uncertainty-aware machine learning for high energy physics,

    Aishik Ghosh, Benjamin Nachman, and Daniel Whiteson, “Uncertainty-aware machine learning for high energy physics,” Phys. Rev. D 104, 056026 (2021), arXiv:2105.08742 [physics.data-an]

  30. [30]

    Deep-Learning Jets with Uncertainties and More,

    Sven Bollweg, Manuel Haußmann, Gregor Kasieczka, Michel Luchmann, Tilman Plehn, and Jennifer Thompson, “Deep-Learning Jets with Uncertainties and More,” SciPost Phys. 8, 006 (2020), arXiv:1904.10004 [hep-ph]

  31. [31]

    Theory and phenomenology of two-Higgs-doublet models

    G. C. Branco, P. M. Ferreira, L. Lavoura, M. N. Rebelo, Marc Sher, and Joao P. Silva, “Theory and phenomenology of two-Higgs-doublet models,” Phys. Rept. 516, 1–102 (2012), arXiv:1106.0034 [hep-ph]

  32. [32]

    Global fit of the Aligned Two-Higgs-Doublet Model,

    Anirban Karan, Víctor Miralles, and Antonio Pich, “Global fit of the Aligned Two-Higgs-Doublet Model,” in 2023 European Physical Society Conference on High Energy Physics (2023) arXiv:2312.00514 [hep-ph]

  33. [33]

    TASI 2013 lectures on Higgs physics within and beyond the Standard Model,

    Heather E. Logan, “TASI 2013 lectures on Higgs physics within and beyond the Standard Model,” (2014), arXiv:1406.1786 [hep-ph]

  34. [34]

    Charged Higgs decay to W±H at a high energy lepton collider,

    Majid Hashemi and Laleh Roushandel, “Charged Higgs decay to W±H at a high energy lepton collider,” (2023), arXiv:2310.06519 [hep-ph]

  35. [35]

    Search for charged Higgs bosons decaying via $H^{\pm} \rightarrow \tau^{\pm}\nu$ in fully hadronic final states using $pp$ collision data at $\sqrt{s} = 8$ TeV with the ATLAS detector

    Georges Aad et al. (ATLAS), “Search for charged Higgs bosons decaying via H± → τ±ν in fully hadronic final states using pp collision data at √s = 8 TeV with the ATLAS detector,” JHEP 03, 088 (2015), arXiv:1412.6663 [hep-ex]

  36. [36]

    Search for a charged Higgs boson in pp collisions at sqrt(s) = 8 TeV

    Vardan Khachatryan et al. (CMS), “Search for a charged Higgs boson in pp collisions at √s = 8 TeV,” JHEP 11, 018 (2015), arXiv:1508.07774 [hep-ex]

  37. [37]

    Search for charged Higgs bosons in the $H^{\pm} \rightarrow tb$ decay channel in $pp$ collisions at $\sqrt{s} = 8$ TeV using the ATLAS detector

    Georges Aad et al. (ATLAS), “Search for charged Higgs bosons in the H± → tb decay channel in pp collisions at √s = 8 TeV using the ATLAS detector,” JHEP 03, 127 (2016), arXiv:1512.03704 [hep-ex]

  38. [38]

    Update of Global Two-Higgs-Doublet Model Fits

    Debtosh Chowdhury and Otto Eberhardt, “Update of Global Two-Higgs-Doublet Model Fits,” JHEP 05, 161 (2018), arXiv:1711.02095 [hep-ph]

  39. [39]

    Asymptotic formulae for likelihood-based tests of new physics

    Glen Cowan, Kyle Cranmer, Eilam Gross, and Ofer Vitells, “Asymptotic formulae for likelihood-based tests of new physics,” Eur. Phys. J. C 71, 1554 (2011), [Erratum: Eur.Phys.J.C 73, 2501 (2013)], arXiv:1007.1727 [physics.data-an]

  40. [40]

    QBDT, a new boosting decision tree method with systematic uncertainties into training for High Energy Physics

    Li-Gang Xia, “QBDT, a new boosting decision tree method with systematical uncertainties into training for High Energy Physics,” Nucl. Instrum. Meth. A 930, 15–26 (2019), arXiv:1810.08387 [physics.data-an]

  41. [41]

    On a measure of divergence between two multinomial populations,

    A. Bhattacharyya, “On a measure of divergence between two multinomial populations,” Sankhyā: The Indian Journal of Statistics (1933-1960) 7, 401–406 (1946)

  42. [42]

    On information and sufficiency,

    S. Kullback and R. A. Leibler, “On information and sufficiency,” The Annals of Mathematical Statistics 22, 79–86 (1951)