Attention is not Explanation

2019 · cs.CL · arXiv 1902.10186

Mixed citation behavior. The most common role is background (62% of classified citations).

19 Pith papers citing it

abstract

Attention mechanisms have seen wide adoption in neural NLP models. In addition to improving predictive performance, these are often touted as affording transparency: models equipped with attention provide a distribution over attended-to input units, and this is often presented (at least implicitly) as communicating the relative importance of inputs. However, it is unclear what relationship exists between attention weights and model outputs. In this work, we perform extensive experiments across a variety of NLP tasks that aim to assess the degree to which attention weights provide meaningful 'explanations' for predictions. We find that they largely do not. For example, learned attention weights are frequently uncorrelated with gradient-based measures of feature importance, and one can identify very different attention distributions that nonetheless yield equivalent predictions. Our findings show that standard attention modules do not provide meaningful explanations and should not be treated as though they do. Code for all experiments is available at https://github.com/successar/AttentionExplanation.
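The comparison the abstract describes, attention weights versus a gradient-based importance measure, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper's code: the one-parameter attention model, the parameter values `w` and `v`, the finite-difference gradient, and the choice of Kendall's tau as the rank correlation.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(xs, w=1.5):
    # Hypothetical scoring function: each input gets score w * x.
    return softmax([w * x for x in xs])

def predict(xs, w=1.5, v=2.0):
    # Toy model: attention-weighted sum of inputs, then a sigmoid output.
    a = attention_weights(xs, w)
    context = sum(ai * xi for ai, xi in zip(a, xs))
    return 1.0 / (1.0 + math.exp(-v * context))

def gradient_importance(xs, eps=1e-5):
    # |d output / d x_i| estimated by central finite differences.
    grads = []
    for i in range(len(xs)):
        up = list(xs)
        up[i] += eps
        down = list(xs)
        down[i] -= eps
        grads.append(abs(predict(up) - predict(down)) / (2 * eps))
    return grads

def kendall_tau(a, b):
    # Kendall tau-a: (concordant - discordant) / number of pairs.
    n = len(a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (a[i] - a[j]) * (b[i] - b[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

xs = [0.2, -1.0, 0.7, 1.3, -0.4]  # made-up input features
attn = attention_weights(xs)
grads = gradient_importance(xs)
tau = kendall_tau(attn, grads)
print("attention:", [round(a, 3) for a in attn])
print("gradient importance:", [round(g, 3) for g in grads])
print("Kendall tau:", round(tau, 3))
```

A tau near 1 would mean the two rankings of inputs agree; values near 0 or below would mean the attention ranking says little about which inputs the output is most sensitive to. A single toy input proves nothing about real models, so this only shows the shape of the measurement.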

citation-role summary

background 8

citation-polarity summary

background 5 · support 3

representative citing papers

Does Synthetic Data Help? Empirical Evidence from Deep Learning Time Series Forecasters

cs.LG · 2026-05-07 · accept · novelty 7.0

Synthetic data augmentation helps channel-mixing time series models but degrades channel-independent ones, with reliable gains only from seasonal-trend generators and gradual schedules in low-resource settings.

Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

Rescaled ASGD recovers convergence to the true global objective by rescaling worker stepsizes proportional to computation times, matching the known time lower bound in the leading term under non-convex smoothness and bounded heterogeneity.

Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.

Explainable AI: Context-Aware Layer-Wise Integrated Gradients for Explaining Transformer Models

cs.CL · 2026-02-18 · unverdicted · novelty 6.0

CA-LIG is a unified hierarchical attribution method that computes layer-wise Integrated Gradients fused with class-specific attention gradients to generate signed, context-sensitive explanations for transformer models.

EviSnap: Faithful Evidence-Cited Explanations for Cold-Start Cross-Domain Recommendation

cs.IR · 2026-01-09 · unverdicted · novelty 6.0

EviSnap creates cross-domain recommendations whose scores decompose exactly into evidence-cited concept contributions via offline LLM facet extraction, clustering, and linear transfer.

Enabling Global, Human-Centered Explanations for LLMs: From Tokens to Interpretable Code and Test Generation

cs.SE · 2025-03-21 · unverdicted · novelty 6.0

CodeQ aggregates token rationales into code categories to enable global interpretability of LLMs, claiming over 50% entropy reduction and revealing model preference for syntactic cues plus human misalignment in a 37-person study.

Large Vision-Language Models Get Lost in Attention

cs.AI · 2026-05-07 · unverdicted · novelty 6.0

In LVLMs, attention can be replaced by random Gaussian weights with little or no performance loss, indicating that current models get lost in attention rather than efficiently using visual context.

Adversarial Humanities Benchmark: Results on Stylistic Robustness in Frontier Model Safety

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

Stylistic rewrites of harmful prompts raise attack success rates from 3.84% to 36.8-65% across 31 frontier models, indicating weak generalization in safety refusals.

Super Agents and Confounders: Influence of surrounding agents on vehicle trajectory prediction

cs.LG · 2026-04-03 · unverdicted · novelty 6.0

Surrounding agents frequently degrade trajectory prediction accuracy in interactive driving scenes, and integrating a Conditional Information Bottleneck improves results by ignoring non-beneficial contextual signals.

INSIGHTS: Demonstration-Based Summaries of Time Series Predictors

cs.LG · 2026-05-13 · unverdicted · novelty 5.0

INSIGHTS creates manageable global summaries of time series model behavior by balancing sample importance and diversity with domain-specific utility functions, validated via experiments and user studies.

Do Transformer Attention Heads Provide Transparency in Abstractive Summarization?

cs.CL · 2019-07-01 · unverdicted · novelty 5.0

Analysis of transformer attention heads in abstractive summarization shows specialization in some heads and proposes a method to measure model reliance on learned attention distributions.

Interpretable Question Answering on Knowledge Bases and Text

cs.CL · 2019-06-26 · unverdicted · novelty 5.0

Compares LIME, input perturbation and attention for explaining QA on KB+text; proposes automatic evaluation paradigm and finds input perturbation superior in both automatic and human studies.

SAIL: Structure-Aware Interpretable Learning for Anatomy-Aligned Post-hoc Explanations in OCT

cs.CV · 2026-05-04 · unverdicted · novelty 5.0

SAIL integrates anatomical priors at the representation level with semantic features via fusion to produce more anatomically aligned attribution maps in OCT without altering existing explainability techniques.

Hessian-Enhanced Token Attribution (HETA): Interpreting Autoregressive LLMs

cs.CL · 2026-04-14 · unverdicted · novelty 5.0

HETA is a new attribution framework for decoder-only LLMs that combines semantic transition vectors, Hessian-based sensitivity scores, and KL divergence to produce more faithful and human-aligned token attributions than prior methods.

Uncertainty-Aware Transformers: Conformal Prediction for Language Models

cs.LG · 2026-04-10 · unverdicted · novelty 5.0

CONFIDE applies conformal prediction to transformer embeddings for valid prediction sets, improving accuracy up to 4.09% and efficiency over baselines on models like BERT-tiny.

Decoding the Multimodal Maze: A Systematic Review on the Adoption of Explainability in Multimodal Attention-based Models

cs.LG · 2025-08-06 · unverdicted · novelty 3.0

A systematic literature review of explainability in multimodal attention models finds most studies focus on vision-language tasks with attention-based explanations, but evaluation methods lack consistency and modality-specific considerations.

CLIF: Concept-Level Influence Functions for Transparent Bottleneck Models

cs.CL · 2026-05-19

Architecture-Aware Explanation Auditing for Industrial Visual Inspection

cs.LG · 2026-05-14 · 2 refs

Beyond Explainable AI (XAI): An Overdue Paradigm Shift and Post-XAI Research Directions

cs.CY · 2026-02-27 · 2 refs

citing papers explorer

Showing 19 of 19 citing papers.

Does Synthetic Data Help? Empirical Evidence from Deep Learning Time Series Forecasters · cs.LG · 2026-05-07 · accept · none · ref 297
Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity · cs.LG · 2026-05-13 · unverdicted · none · ref 21 · internal anchor
Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces · cs.LG · 2026-05-12 · unverdicted · none · ref 39 · internal anchor
Explainable AI: Context-Aware Layer-Wise Integrated Gradients for Explaining Transformer Models · cs.CL · 2026-02-18 · unverdicted · none · ref 12 · internal anchor
EviSnap: Faithful Evidence-Cited Explanations for Cold-Start Cross-Domain Recommendation · cs.IR · 2026-01-09 · unverdicted · none · ref 1 · internal anchor
Enabling Global, Human-Centered Explanations for LLMs: From Tokens to Interpretable Code and Test Generation · cs.SE · 2025-03-21 · unverdicted · none · ref 26 · internal anchor
Large Vision-Language Models Get Lost in Attention · cs.AI · 2026-05-07 · unverdicted · none · ref 12
Adversarial Humanities Benchmark: Results on Stylistic Robustness in Frontier Model Safety · cs.CL · 2026-04-20 · unverdicted · none · ref 17
Super Agents and Confounders: Influence of surrounding agents on vehicle trajectory prediction · cs.LG · 2026-04-03 · unverdicted · none · ref 20
INSIGHTS: Demonstration-Based Summaries of Time Series Predictors · cs.LG · 2026-05-13 · unverdicted · none · ref 15 · internal anchor
Do Transformer Attention Heads Provide Transparency in Abstractive Summarization? · cs.CL · 2019-07-01 · unverdicted · none · ref 9 · internal anchor
Interpretable Question Answering on Knowledge Bases and Text · cs.CL · 2019-06-26 · unverdicted · none · ref 9 · internal anchor
SAIL: Structure-Aware Interpretable Learning for Anatomy-Aligned Post-hoc Explanations in OCT · cs.CV · 2026-05-04 · unverdicted · none · ref 39
Hessian-Enhanced Token Attribution (HETA): Interpreting Autoregressive LLMs · cs.CL · 2026-04-14 · unverdicted · none · ref 17
Uncertainty-Aware Transformers: Conformal Prediction for Language Models · cs.LG · 2026-04-10 · unverdicted · none · ref 7
Decoding the Multimodal Maze: A Systematic Review on the Adoption of Explainability in Multimodal Attention-based Models · cs.LG · 2025-08-06 · unverdicted · none · ref 100 · internal anchor
CLIF: Concept-Level Influence Functions for Transparent Bottleneck Models · cs.CL · 2026-05-19 · unreviewed · ref 11 · internal anchor
Architecture-Aware Explanation Auditing for Industrial Visual Inspection · cs.LG · 2026-05-14 · unreviewed · ref 9 · 2 links · internal anchor
Beyond Explainable AI (XAI): An Overdue Paradigm Shift and Post-XAI Research Directions · cs.CY · 2026-02-27 · unreviewed · ref 182 · 2 links · internal anchor
