Is Attention Interpretable?

Noah A. Smith; Sofia Serrano

arxiv: 1906.03731 · v1 · pith:OPGYAPR5new · submitted 2019-06-09 · 💻 cs.CL

Is Attention Interpretable?

Sofia Serrano , Noah A. Smith This is my paper

classification 💻 cs.CL

keywords attentionweightscomponentsinputmodelmodelspredictionsways

0 comments

read the original abstract

Attention mechanisms have recently boosted performance on a range of NLP tasks. Because attention layers explicitly weight input components' representations, it is also often assumed that attention can be used to identify information that models found important (e.g., specific contextualized word tokens). We test whether that assumption holds by manipulating attention weights in already-trained text classification models and analyzing the resulting differences in their predictions. While we observe some ways in which higher attention weights correlate with greater impact on model predictions, we also find many ways in which this does not hold, i.e., where gradient-based rankings of attention weights better predict their effects than their magnitudes. We conclude that while attention noisily predicts input components' overall importance to a model, it is by no means a fail-safe indicator.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Does Synthetic Data Help? Empirical Evidence from Deep Learning Time Series Forecasters
cs.LG 2026-05 accept novelty 7.0

Synthetic data augmentation helps channel-mixing time series models but degrades channel-independent ones, with reliable gains only from seasonal-trend generators and gradual schedules in low-resource settings.
Large Vision-Language Models Get Lost in Attention
cs.AI 2026-05 unverdicted novelty 6.0

In LVLMs, attention can be replaced by random Gaussian weights with little or no performance loss, indicating that current models get lost in attention rather than efficiently using visual context.
Explainable AI: Context-Aware Layer-Wise Integrated Gradients for Explaining Transformer Models
cs.CL 2026-02 unverdicted novelty 6.0

CA-LIG is a unified hierarchical attribution method that computes layer-wise Integrated Gradients fused with class-specific attention gradients to generate signed, context-sensitive explanations for transformer models.
AttnTrace: Contextual Attribution of Prompt Injection and Knowledge Corruption
cs.CL 2025-08 unverdicted novelty 6.0

AttnTrace is an attention-weight-based context traceback method for LLMs that claims higher accuracy and efficiency than prior art like TracLLM while aiding prompt injection detection.
Enabling Global, Human-Centered Explanations for LLMs:From Tokens to Interpretable Code and Test Generation
cs.SE 2025-03 unverdicted novelty 6.0

CodeQ aggregates token rationales into code categories to enable global interpretability of LLMs, claiming over 50% entropy reduction and revealing model preference for syntactic cues plus human misalignment in a 37-p...
Decoding the Multimodal Maze: A Systematic Review on the Adoption of Explainability in Multimodal Attention-based Models
cs.LG 2025-08 unverdicted novelty 3.0

A systematic literature review of explainability in multimodal attention models finds most studies focus on vision-language tasks with attention-based explanations, but evaluation methods lack consistency and modality...