Recognition: unknown
What exactly did the Transformer learn from our physics data?
read the original abstract
Transformer networks excel in scientific applications. We explore two scenarios in ultra-high-energy cosmic ray simulations to examine what these network architectures learn. First, we investigate the trained positional encodings in air showers which are azimuthally symmetric. Second, we visualize the attention values assigned to cosmic particles originating from a galaxy catalog. In both cases, the Transformers learn plausible, physically meaningful features.
This paper has not been read by Pith yet.
Forward citations
Cited by 1 Pith paper
-
Dissecting Jet-Tagger Through Mechanistic Interpretability
A Particle Transformer jet tagger contains a sparse six-head circuit whose source-relay-readout structure recovers most performance and whose residual stream preferentially encodes 2-prong energy correlators.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.