What exactly did the Transformer learn from our physics data?
1 Pith paper cites this work. Polarity classification is still indexing.
abstract
Transformer networks excel in scientific applications. We explore two scenarios in ultra-high-energy cosmic ray simulations to examine what these network architectures learn. First, we investigate the trained positional encodings in air showers which are azimuthally symmetric. Second, we visualize the attention values assigned to cosmic particles originating from a galaxy catalog. In both cases, the Transformers learn plausible, physically meaningful features.
fields: hep-ph (1)
years: 2026 (1)
verdicts: ACCEPT (1)
roles: background (1)
polarities: background (1)
representative citing papers
Dissecting Jet-Tagger Through Mechanistic Interpretability
A Particle Transformer jet tagger contains a sparse six-head circuit whose source-relay-readout structure recovers most performance and whose residual stream preferentially encodes 2-prong energy correlators.