Hits to Higgs: Hit-Level Higgs Classification from Raw LHC Detector Data Using Higgsformer

arxiv: 2508.19190 · v4 · submitted 2025-08-26 · ✦ hep-ph

Sascha Caron, Polina Moskvitina, Roberto Ruiz de Austri, Eugene Shalugin

Pith reviewed 2026-05-18 21:01 UTC · model grok-4.3

classification ✦ hep-ph

keywords: Higgs boson, machine learning, transformer, LHC, tracker hits, event classification, ttH, simulation

The pith

A transformer model distinguishes Higgs events directly from raw inner tracker hits, matching traditional reconstruction performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows that a transformer architecture originally designed for assigning hits to tracks can be retrained to classify Higgs signal events straight from raw inner tracker hits, skipping the full reconstruction of tracks, jets, and b-tags. The benchmark task is separating ttH from ttbar events where the Higgs decays to bottom quarks, a topology that is hard to distinguish even after standard processing. The large Higgsformer reaches an AUC of 0.855 on simulated data, matching what an object-based Particle Transformer achieves when the latter is run with a b-tagging efficiency of about 40 percent under the same detector conditions. The result holds across different training sizes and pileup levels, indicating that the hit patterns alone carry the necessary discrimination power.

Core claim

A transformer built for hit-to-track assignment can be retrained to classify events, distinguishing ttH from ttbar directly from raw hits. It reaches an AUC of 0.855, which matches the performance of a Particle Transformer on reconstructed objects at about 40 percent b-tagging efficiency.
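Since the claim rests on a single AUC value, it may help to recall what that number measures: the probability that a randomly chosen signal event receives a higher classifier score than a randomly chosen background event. The sketch below computes it from ranks (the Mann-Whitney statistic). The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not the paper's evaluation code or data.

```python
def roc_auc(labels, scores):
    """AUC as P(score_signal > score_background), ties counted as 1/2.

    labels: 1 for signal (e.g. ttH), 0 for background (e.g. ttbar).
    scores: classifier outputs, higher meaning more signal-like.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one signal and one background event")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the block of tied scores starting at position i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        # Tied events share the average of the 1-based ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j) if pairs[k][1] == 1)
        i = j

    # Mann-Whitney U for the signal class, normalised to [0, 1].
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


# Toy example (made-up scores): one of the four signal/background pairs
# is ordered the wrong way round.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, so 0.855 means that about 85.5 percent of signal/background pairs are ordered correctly.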

What carries the argument

Higgsformer, a transformer model adapted from inner-tracker hit assignment to direct classification on raw detector hits.
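To make the idea of classifying an event from an unordered set of hits concrete, here is a deliberately tiny sketch: each hit is embedded, one self-attention layer lets hits exchange information, the result is mean-pooled over hits, and a linear head produces a signal probability. Everything here is an assumption for illustration (a single head, no positional encoding, untrained random weights, three coordinates per hit, and names such as `classify_event`). It is not the Higgsformer architecture, whose layer counts and input features are not given in this summary.

```python
import math
import random


def project(rows, weight):
    """Multiply each row vector (length d_in) by a d_in x d_out matrix."""
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*weight)]
            for row in rows]


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention over all hits in one event."""
    q, k, v = project(x, wq), project(x, wk), project(x, wv)
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        weights = softmax([scale * sum(a * b for a, b in zip(qi, kj)) for kj in k])
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


def classify_event(hits, params):
    """Map a list of hits (hypothetical features: x, y, z) to P(signal)."""
    h = project(hits, params["embed"])                      # per-hit embedding
    h = self_attention(h, params["wq"], params["wk"], params["wv"])
    pooled = [sum(col) / len(h) for col in zip(*h)]         # mean over hits
    logit = sum(p * w for p, w in zip(pooled, params["head"]))
    return 1.0 / (1.0 + math.exp(-logit))


def random_matrix(rng, n_in, n_out):
    return [[rng.gauss(0.0, 0.5) for _ in range(n_out)] for _ in range(n_in)]


rng = random.Random(0)
dim = 4
params = {
    "embed": random_matrix(rng, 3, dim),
    "wq": random_matrix(rng, dim, dim),
    "wk": random_matrix(rng, dim, dim),
    "wv": random_matrix(rng, dim, dim),
    "head": [rng.gauss(0.0, 0.5) for _ in range(dim)],
}

# Toy event: 20 hits with normalised coordinates (not simulated detector data).
hits = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(20)]
score = classify_event(hits, params)

shuffled = hits[:]
rng.shuffle(shuffled)
score_shuffled = classify_event(shuffled, params)
```

In this sketch the score does not depend on the order of the hits, up to floating-point rounding, because attention without positional encoding followed by mean pooling treats the event as a set. No track, vertex, or jet is ever constructed, which is the sense in which such a model skips the reconstruction chain.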

If this is right

Event classification becomes possible without the intermediate steps of track finding, vertexing, or jet clustering.
The hit-level approach maintains performance across varying dataset sizes and pileup conditions.
Discrimination power at the raw-hit stage can equal that of object-based methods run at typical b-tagging efficiencies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the approach transfers to real data it could reduce the computing resources needed for high-level physics analyses.
The same hit-level strategy might be tested on other rare processes or detector subsystems beyond the inner tracker.
Combining raw hits with limited additional information could be explored as a way to improve discrimination without full reconstruction.

Load-bearing premise

The fast simulation of inner-tracker hits produces patterns representative enough of real data for the trained model to perform similarly on actual collisions.

What would settle it

Measuring the AUC on real LHC collision data and finding it significantly below 0.855 would show that the simulation does not capture the necessary features.

Original abstract

We present Higgsformer, a transformer-based architecture that classifies Higgs events at the Large Hadron Collider directly from raw inner tracker hits, bypassing the traditional reconstruction chain of intermediate physics objects. As a benchmark, we focus on distinguishing $t\bar{t}H$ from $t\bar{t}$ events with $H \to b\bar{b}$, a particularly challenging task due to their similar final state topologies. Our pipeline begins with event generation in Pythia8, fast simulation with ACTS/Fatras, and classification directly from raw detector hits. We show for the first time that a transformer model originally developed for inner tracker hit-to-track assignment can be retrained to classify Higgs signal events directly from raw hits. For comparison, we reconstruct the same events with Delphes and train a Particle Transformer as an object-based classifier. We evaluate both approaches under varying dataset sizes and pileup levels. Despite relying exclusively on inner tracker hits, our large Higgsformer achieves an AUC of $0.855$, matching the performance of the traditional reconstruction pipeline at a $b$-tagging efficiency of $\approx 40\%$ under the same detector constraints.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Higgsformer shows a hit-to-track transformer can be retrained for ttH vs tt classification straight from raw inner-tracker hits at AUC 0.855, matching a Delphes baseline at 40% b-tagging, but the result sits entirely on fast ACTS/Fatras simulation.

The main thing to know is that this group took an existing hit-to-track transformer and retrained it to separate ttH from tt events using only raw inner-tracker hits, reporting an AUC of 0.855 that reaches parity with a Particle Transformer run on Delphes-reconstructed objects at roughly 40% b-tagging efficiency under the same constraints. They also check performance across dataset sizes and pileup levels, which gives a practical sense of scaling behavior. That direct from-hits comparison is the clearest new element here relative to prior transformer work in HEP. The setup is straightforward and the benchmark is concrete, so the paper does a decent job documenting that the architecture can be repurposed without obvious collapse on this topology.

The central claim holds up inside the simulation they used. The soft spot is the exclusive use of Pythia8 plus ACTS/Fatras for hit generation. Fatras is a parametric fast simulator, and any mismatch in multiple scattering, energy loss, or hit resolution compared with full Geant4 could change the patterns the model attends to, particularly for b-jet discrimination. The abstract gives no cross-validation against higher-fidelity simulation or real data, and the single AUC number lacks error bars or detailed validation splits. Those gaps make the transfer to actual LHC collisions an open question rather than a settled result.

This paper is for people working on end-to-end ML models for LHC reconstruction who want to see whether skipping intermediate objects is feasible on a hard final state. A reader already following transformer applications in tracking or jet tagging would pick up usable architecture and training details. The idea is novel enough and the comparison direct enough to deserve a serious referee, even though the simulation fidelity section will need strengthening before publication.

Referee Report

2 major / 1 minor

Summary. The paper introduces Higgsformer, a transformer architecture adapted from hit-to-track assignment tasks, to classify ttH (with H→bb) versus tt events directly from raw inner-tracker hits. Events are generated with Pythia8 and simulated with ACTS/Fatras; the model is compared to a Particle Transformer trained on Delphes-reconstructed objects. The central quantitative result is an AUC of 0.855 for the large Higgsformer, stated to match the object-based pipeline at ~40% b-tagging efficiency under identical detector constraints, with evaluations performed at varying dataset sizes and pileup levels.

Significance. If the performance generalizes beyond the fast-simulation setup, the result would demonstrate that end-to-end hit-level classification can reach parity with traditional reconstruction pipelines for a challenging final state. This would be a notable technical contribution in the direction of bypassing intermediate physics objects. The adaptation of an existing hit-to-track transformer and the direct comparison to a Delphes-based baseline are strengths; however, the absence of statistical uncertainties and simulation-fidelity checks limits the immediate impact.

major comments (2)

[Results] Results section: the reported AUC of 0.855 is given as a single point estimate with no error bars, no dataset cardinality, and no explicit statement of the pileup multiplicity or b-tagging working point used for the final comparison. These omissions make it impossible to judge whether the quoted parity with the Delphes Particle Transformer is statistically meaningful or sensitive to the precise evaluation conditions.
[Simulation and Methods] Simulation and detector modeling: the entire training and evaluation chain uses only ACTS/Fatras fast parametric simulation of inner-tracker hits. No cross-check against full Geant4 digitization, no quantification of differences in hit multiplicity, resolution, or multiple-scattering tails, and no discussion of how such discrepancies would propagate into the transformer’s attention patterns are provided. This is load-bearing for the claim that the model operates “directly from raw inner tracker hits” in a manner relevant to real LHC data.

minor comments (1)

[Abstract] Abstract: the statement that performance was evaluated “under varying dataset sizes and pileup levels” is not accompanied by the actual ranges or the specific configuration that yields the quoted AUC of 0.855.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading of our manuscript and the constructive comments, which help clarify the presentation of our results. We address each major point below and outline the corresponding revisions.

Referee: [Results] Results section: the reported AUC of 0.855 is given as a single point estimate with no error bars, no dataset cardinality, and no explicit statement of the pileup multiplicity or b-tagging working point used for the final comparison. These omissions make it impossible to judge whether the quoted parity with the Delphes Particle Transformer is statistically meaningful or sensitive to the precise evaluation conditions.

Authors: We agree that the presentation of the central AUC result would benefit from additional statistical detail and explicit evaluation parameters. In the revised manuscript we will report bootstrapped uncertainties on all AUC values, state the exact training and test dataset cardinalities used for the quoted 0.855 result, and specify the pileup multiplicity together with the precise b-tagging efficiency working point at which the Delphes Particle Transformer comparison is performed. These additions will make the statistical significance of the observed parity transparent. revision: yes
Referee: [Simulation and Methods] Simulation and detector modeling: the entire training and evaluation chain uses only ACTS/Fatras fast parametric simulation of inner-tracker hits. No cross-check against full Geant4 digitization, no quantification of differences in hit multiplicity, resolution, or multiple-scattering tails, and no discussion of how such discrepancies would propagate into the transformer’s attention patterns are provided. This is load-bearing for the claim that the model operates “directly from raw inner tracker hits” in a manner relevant to real LHC data.

Authors: We acknowledge that reliance on fast parametric simulation constitutes a genuine limitation for immediate claims about real LHC data. The manuscript is framed as a proof-of-concept study within a controlled simulation environment, which is standard for algorithmic development. In the revision we will add a dedicated limitations subsection that (i) quantifies typical differences in hit multiplicity and resolution between Fatras and full Geant4 for the inner tracker, (ii) discusses how these differences could affect attention patterns, and (iii) clarifies that the reported performance is specific to the fast-simulation setup. We will also outline the steps required for a future full-simulation validation. This keeps the scope of the current work accurate while addressing the referee’s concern. revision: partial
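The bootstrapped AUC uncertainties promised in the first response could be obtained along the following lines. This is a minimal sketch under stated assumptions: scores are resampled with replacement separately within the signal and background classes (a stratified bootstrap, one of several reasonable choices), the interval is a simple percentile interval, and the toy Gaussian scores stand in for real classifier outputs. The names `pairwise_auc` and `bootstrap_auc` are hypothetical and nothing here is taken from the authors' code.

```python
import random


def pairwise_auc(sig, bkg):
    """Fraction of signal/background pairs ordered correctly (ties count 1/2)."""
    wins = 0.0
    for s in sig:
        for b in bkg:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(sig) * len(bkg))


def bootstrap_auc(sig, bkg, n_resamples=200, alpha=0.32, seed=0):
    """Return (point estimate, lower, upper) for a percentile bootstrap interval.

    alpha=0.32 gives roughly a 68% interval, comparable to a one-sigma band.
    """
    rng = random.Random(seed)
    point = pairwise_auc(sig, bkg)
    replicas = []
    for _ in range(n_resamples):
        # Resample each class with replacement, keeping class sizes fixed.
        sig_star = rng.choices(sig, k=len(sig))
        bkg_star = rng.choices(bkg, k=len(bkg))
        replicas.append(pairwise_auc(sig_star, bkg_star))
    replicas.sort()
    lower = replicas[int((alpha / 2.0) * (n_resamples - 1))]
    upper = replicas[int((1.0 - alpha / 2.0) * (n_resamples - 1))]
    return point, lower, upper


# Toy scores only: signal shifted by one unit relative to background.
toy_rng = random.Random(1)
toy_sig = [toy_rng.gauss(1.0, 1.0) for _ in range(100)]
toy_bkg = [toy_rng.gauss(0.0, 1.0) for _ in range(100)]
point, lower, upper = bootstrap_auc(toy_sig, toy_bkg)
```

Whether two classifiers are at parity can then be judged by comparing their intervals rather than their point estimates. A bootstrap of this kind captures only the statistical spread of the test sample, not the simulation-fidelity concern raised in the second comment.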

Circularity Check

0 steps flagged

No circularity: empirical AUC from independent training/evaluation on simulated hits

The paper reports an empirical performance metric (AUC 0.855) obtained by training and testing Higgsformer on Pythia8+ACTS/Fatras events and comparing it to a separately reconstructed Delphes+Particle Transformer baseline. No equations, fitted parameters, or self-citations reduce this result to an input by construction. The central claim is a direct measurement under stated simulation assumptions rather than a derivation that collapses to its own premises. Minor hyperparameter choices on the same samples do not meet the load-bearing self-definition or fitted-prediction criteria.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on the realism of fast simulation and on the assumption that raw-hit patterns contain sufficient information for the classification task. No new physical entities are introduced.

free parameters (1)

Transformer hyperparameters and training schedule
Standard neural-network hyperparameters that are tuned to achieve the reported AUC.

axioms (1)

Domain assumption: ACTS/Fatras fast simulation produces hit-level data whose statistical properties are close enough to real detector response for the classification task.
The entire pipeline is built on this simulation; any mismatch would invalidate the performance claim.

pith-pipeline@v0.9.0 · 5742 in / 1329 out tokens · 40549 ms · 2026-05-18T21:01:55.987606+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

We show for the first time that a transformer model originally developed for inner tracker hit-to-track assignment can be retrained to classify Higgs events directly from raw hits... Higgsformer achieves an AUC of 0.792.
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Higgsformer-small (only inner tracker) Raw detector hits

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages · 4 internal anchors

[1] ATLAS collaboration, Measurement of the associated production of a top-antitop-quark pair and a Higgs boson decaying into a $b\bar{b}$ pair in pp collisions at √s = 13 TeV using the ATLAS detector at the LHC, Eur. Phys. J. C 85 (2025) 210 [2407.10904]

[2] CMS collaboration, Measurement of the ttH and tH production rates in the H → bb decay channel using proton-proton collision data at √s = 13 TeV, JHEP 02 (2025) 097 [2407.10896]

[3] T. Sjöstrand, S. Ask, J.R. Christiansen, R. Corke, N. Desai, P. Ilten et al., An introduction to PYTHIA 8.2, Comput. Phys. Commun. 191 (2015) 159 [1410.3012]

[4] S. Spannagel et al., The A Common Tracking Software project, SoftwareX 11 (2020) 100464 [1909.06194]

[5] T. Dao, FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning, arXiv preprint arXiv:2307.08691 (2023)

[6] S. Caron, N. Dobreva, A. Ferrer Sánchez, J.D. Martín-Guerrero, U. Odyurt, R. Ruiz de Austri Bazan et al., Trackformers: in search of transformer-based particle tracking for the high-luminosity LHC era, The European Physical Journal C 85 (2025) 460

[7] J. de Favereau, C. Delaere, P. Demin, A. Giammanco, V. Lemaître, A. Mertens et al., DELPHES 3, A modular framework for fast simulation of a generic collider experiment, JHEP 02 (2014) 057 [1307.6346]

[8] H. Qu, C. Li and S. Qian, Particle Transformer for Jet Tagging, 2202.03772

[9] M. Aaboud et al., Observation of Higgs boson production in association with a top quark pair at the LHC with the ATLAS detector, Physics Letters B 784 (2018) 173–191

[10] A. Sirunyan et al., Observation of $t\bar{t}H$ production, Physical Review Letters 120 (2018)

[11] L.H. Våge, Performance and training bias of BDT vs NN in ttH(bb) search at ATLAS, Sept., 2018

[12] R. Santos, M. Nguyen, J. Webster, S. Ryu, J. Adelman, S. Chekanov et al., Machine learning techniques in searches for tth in the h → bb decay channel, Journal of Instrumentation 12 (2017) P04014

[13] M. Kiehn, S. Amrouche, P. Calafiura, V. Estrade, S. Farrell, C. Germain et al., The TrackML high-energy physics tracking challenge on Kaggle, EPJ Web Conf. 214 (2019) 06037

[14] T. Dao, D.Y. Fu, S. Ermon, A. Rudra and C. Ré, FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness, 2205.14135

[15] D.E. Rumelhart, G.E. Hinton and R.J. Williams, Learning representations by back-propagating errors, Nature 323 (1986) 533

[16] L. Builtjes, S. Caron, P. Moskvitina, C. Nellist, R.R. de Austri, R. Verheyen et al., Attention to the strengths of physical interactions: Transformer and graph-based event classification for particle physics experiments, SciPost Phys. 19 (2025) 028 [2211.05143]
