pith. sign in

hub Canonical reference

Analyzing the structure of attention in a transformer language model

Canonical reference. 80% of citing Pith papers cite this work as background.

13 Pith papers citing it
Background 80% of classified citations
abstract

The Transformer is a fully attention-based alternative to recurrent networks that has achieved state-of-the-art results across a range of NLP tasks. In this paper, we analyze the structure of attention in a Transformer language model, the GPT-2 small pretrained model. We visualize attention for individual instances and analyze the interaction between attention and syntax over a large corpus. We find that attention targets different parts of speech at different layer depths within the model, and that attention aligns with dependency relations most strongly in the middle layers. We also find that the deepest layers of the model capture the most distant relationships. Finally, we extract exemplar sentences that reveal highly specific patterns targeted by particular attention heads.

hub tools

citation-role summary

background 5

citation-polarity summary

verdicts

UNVERDICTED 13

roles

background 5

polarities

background 4 support 1

representative citing papers

Rethinking Attention with Performers

cs.LG · 2020-09-30 · unverdicted · novelty 7.0

Performers approximate full-rank softmax attention in Transformers via FAVOR+ random features for linear complexity, with theoretical guarantees of unbiased estimation and competitive results on pixel, text, and protein tasks.

Why Retrieval-Augmented Generation Fails: A Graph Perspective

cs.CL · 2026-05-13 · unverdicted · novelty 6.0

Attribution graphs reveal that RAG failures arise from shallow fragmented evidence flow in LLMs, enabling topology-based detection and targeted interventions that reinforce question-guided routing.

citing papers explorer

Showing 13 of 13 citing papers.