Saliency Maps Generation for Automatic Text Summarization
Pith reviewed 2026-05-24 22:31 UTC · model grok-4.3
The pith
Saliency maps from LRP on text summarization models sometimes fail to match the model's actual use of input features.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors apply LRP to a seq2seq attention model on text summarization and find that the resulting saliency maps sometimes reflect the network's actual computation on input features and sometimes do not. They propose a protocol to check validity by testing the effect of altering inputs attributed high importance and conclude that care must be taken when treating the maps as explanations, since a quantitative way of testing the counterfactual case is required to judge their truthfulness.
What carries the argument
Layer-Wise Relevance Propagation (LRP) applied to a sequence-to-sequence attention model, which back-propagates relevance scores to assign importance values to input tokens for explaining the generated summary.
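As a rough illustration of the relevance back-propagation described above, the sketch below applies one common LRP rule (the epsilon rule) to a single dense layer. It is not taken from the paper: the choice of rule, the function name `lrp_epsilon_dense`, and the toy weights are all assumptions made for illustration. A full seq2seq model would also need propagation rules for the LSTM gates and the attention mechanism.

```python
import numpy as np

def lrp_epsilon_dense(a, W, b, R_out, eps=1e-6):
    """Redistribute output relevance R_out (shape [out]) onto the inputs a
    (shape [in]) of a dense layer z = a @ W + b, using the LRP-epsilon rule."""
    z = a @ W + b                                    # pre-activations, shape [out]
    z_stab = z + eps * np.where(z >= 0, 1.0, -1.0)   # stabiliser keeps the division well-defined
    s = R_out / z_stab                               # relevance per unit of pre-activation
    return a * (W @ s)                               # R_in[i] = a[i] * sum_j W[i, j] * s[j]

# Hypothetical toy layer: 3 inputs, 2 outputs, zero bias.
a = np.array([1.0, 2.0, 3.0])
W = np.array([[1.0, -1.0], [0.5, 2.0], [-1.0, 1.0]])
b = np.zeros(2)
R_out = a @ W + b                                    # start from the layer output itself
R_in = lrp_epsilon_dense(a, W, b, R_out)
# With zero bias and small eps, R_in.sum() stays close to R_out.sum().
```

Applying such a rule layer by layer, from the output back to the embeddings, is what produces a per-token importance score.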
If this is right
- Saliency maps for text summarization require explicit counterfactual testing to establish whether they are truthful.
- The proposed protocol distinguishes cases where the maps match the model's feature use from cases where they do not.
- Explanations based on saliency maps should be accepted only with caution for automatic text summarization.
- A quantitative method is necessary to judge the validity of importance attributions produced by LRP in such models.
Where Pith is reading between the lines
- The same validation step may be needed for other explainability methods applied to attention-based NLP models.
- The protocol could be tested on different summarization datasets to check how often the mismatch occurs.
- This example suggests that task complexity increases the chance that saliency maps will diverge from model behavior.
Load-bearing premise
That the protocol of checking the effect of counterfactual changes to inputs can reliably determine whether a saliency map reflects the model's actual computation.
What would settle it
Finding a case in which an input token ranked as highly important by the saliency map is altered yet the model's generated summary remains unchanged would show that the map does not reflect the model's computation for that example.
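A check of this kind can be sketched as a deletion test: remove the tokens the map ranks highest, measure how much the output changes, and compare against removing the same number of random tokens. The code below is a minimal sketch under stated assumptions, not the authors' protocol. The names `deletion_check`, `output_change`, and `toy_summarize` are hypothetical, the "model" is a stand-in that copies the first three tokens, and the change measure (one minus Jaccard overlap of output tokens) is chosen only for simplicity.

```python
import random

def output_change(a, b):
    """1 - Jaccard overlap between two token lists (0.0 means identical token sets)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)

def deletion_check(tokens, saliency, summarize, k=2, trials=50, seed=0):
    """Return (change after deleting the top-k salient tokens,
    mean change after deleting k random tokens)."""
    base = summarize(tokens)
    order = sorted(range(len(tokens)), key=lambda i: saliency[i], reverse=True)
    top = set(order[:k])
    kept = [t for i, t in enumerate(tokens) if i not in top]
    salient_delta = output_change(base, summarize(kept))

    rng = random.Random(seed)                      # fixed seed for a repeatable baseline
    random_deltas = []
    for _ in range(trials):
        drop = set(rng.sample(range(len(tokens)), k))
        kept = [t for i, t in enumerate(tokens) if i not in drop]
        random_deltas.append(output_change(base, summarize(kept)))
    return salient_delta, sum(random_deltas) / trials

def toy_summarize(tokens):
    """Stand-in model (assumption): the 'summary' is just the first three tokens."""
    return list(tokens[:3])

tokens = ["storm", "hits", "coast", "as", "thousands", "flee", "inland"]
faithful = [0.9, 0.8, 0.7, 0.1, 0.1, 0.1, 0.1]    # highlights tokens the toy model uses
unfaithful = [0.1, 0.1, 0.1, 0.1, 0.1, 0.8, 0.9]  # highlights tokens it ignores
```

In this toy setting, deleting the top tokens of the `faithful` map changes the output while deleting the top tokens of the `unfaithful` map leaves it untouched, which is the mismatch the settling case describes.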
Original abstract
Saliency map generation techniques are at the forefront of explainable AI literature for a broad range of machine learning applications. Our goal is to question the limits of these approaches on more complex tasks. In this paper we apply Layer-Wise Relevance Propagation (LRP) to a sequence-to-sequence attention model trained on a text summarization dataset. We obtain unexpected saliency maps and discuss the rightfulness of these "explanations". We argue that we need a quantitative way of testing the counterfactual case to judge the truthfulness of the saliency maps. We suggest a protocol to check the validity of the importance attributed to the input and show that the saliency maps obtained sometimes capture the real use of the input features by the network, and sometimes do not. We use this example to discuss how careful we need to be when accepting them as explanation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript applies Layer-Wise Relevance Propagation (LRP) to a sequence-to-sequence attention model trained for text summarization. It reports obtaining unexpected saliency maps, argues that a quantitative counterfactual protocol is needed to assess their truthfulness, proposes such a protocol based on input perturbations, and concludes that the resulting maps sometimes capture the model's actual use of input features and sometimes do not.
Significance. If the suggested protocol were shown to reliably isolate feature contributions, the work would usefully caution against over-interpreting saliency maps on attention-based seq2seq models. As presented, however, the absence of model architecture details, dataset description, quantitative metrics, or explicit implementation of the protocol limits the contribution to an observational case study rather than a validated methodological finding.
major comments (2)
- [Abstract / protocol description] Abstract and discussion of the protocol: the proposed counterfactual test perturbs input tokens and observes output changes, but does not address the non-local propagation through the encoder hidden states and cross-attention weights. Changing one token alters the entire attention distribution, so any output delta conflates direct feature contribution with indirect effects; without an isolation mechanism (e.g., frozen attention or linear probe), the protocol cannot distinguish faithful saliency from perturbation artifacts.
- [Abstract] Abstract: the central claim that the maps 'sometimes capture the real use ... and sometimes do not' is stated without any reported model architecture, training dataset, quantitative evaluation of the protocol, or even the number of examples examined. This renders the mixed-outcome observation impossible to assess or reproduce.
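The conflation raised in the first major comment can be made concrete with a toy dot-product attention layer. The sketch below is an illustration under loud assumptions, not the paper's model: keys and values are simply the token embeddings, there is no recurrent encoder, and the names `context` and `perturbation_effects` are hypothetical. It perturbs one token and reports the output change twice, once with attention recomputed and once with the original attention weights held fixed.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def context(query, keys, values, weights=None):
    """Attention context vector; recomputes the weights unless frozen ones are passed in."""
    if weights is None:
        weights = softmax(keys @ query)
    return weights @ values, weights

def perturbation_effects(query, emb, i, replacement):
    """Change in the context vector when token i is replaced.
    Returns (total change with recomputed attention,
             change with the original attention weights frozen)."""
    base_ctx, base_w = context(query, emb, emb)        # toy assumption: keys = values = emb
    pert = emb.copy()
    pert[i] = replacement
    total_ctx, _ = context(query, pert, pert)
    frozen_ctx, _ = context(query, pert, pert, weights=base_w)
    return total_ctx - base_ctx, frozen_ctx - base_ctx

emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # three hypothetical token embeddings
query = np.array([1.0, 0.0])
total, frozen = perturbation_effects(query, emb, i=0, replacement=np.zeros(2))
```

With frozen weights the change reduces to the perturbed token's own weighted contribution, whereas the recomputed case also includes the shift in attention over the other tokens. In a model with a recurrent encoder, freezing attention alone would still not isolate the token, because the perturbation also alters the other hidden states.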
minor comments (1)
- The manuscript would benefit from explicit section headings and numbered equations or algorithms when describing the LRP application and the suggested protocol.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. Our work is an observational case study applying LRP to a seq2seq attention model for summarization, showing unexpected saliency maps and proposing a basic counterfactual protocol to check their validity. We address each major comment below.
Point-by-point responses
- Referee: [Abstract / protocol description] Abstract and discussion of the protocol: the proposed counterfactual test perturbs input tokens and observes output changes, but does not address the non-local propagation through the encoder hidden states and cross-attention weights. Changing one token alters the entire attention distribution, so any output delta conflates direct feature contribution with indirect effects; without an isolation mechanism (e.g., frozen attention or linear probe), the protocol cannot distinguish faithful saliency from perturbation artifacts.
  Authors: We agree that single-token perturbations propagate non-locally through encoder states and cross-attention, so output changes conflate direct and indirect effects. The protocol is presented as a simple, practical counterfactual check to reveal cases where LRP maps diverge from observed model behavior, not as an isolated attribution method. We will revise the discussion section to explicitly note this limitation and mention that extensions could incorporate mechanisms such as frozen attention weights for stronger isolation.
  Revision: partial
- Referee: [Abstract] Abstract: the central claim that the maps 'sometimes capture the real use ... and sometimes do not' is stated without any reported model architecture, training dataset, quantitative evaluation of the protocol, or even the number of examples examined. This renders the mixed-outcome observation impossible to assess or reproduce.
  Authors: The manuscript body describes the seq2seq attention architecture, the CNN/DailyMail dataset, and the examination of multiple examples that produced the mixed outcomes. The abstract is kept concise per journal norms. We will revise the abstract to include a brief reference to the setup and the number of examples examined, improving reproducibility while preserving the observational nature of the study.
  Revision: yes
Circularity Check
No circularity; observational application of existing LRP method
The paper applies Layer-Wise Relevance Propagation (an existing technique) to a trained seq2seq attention model on a summarization task and proposes an empirical protocol for checking saliency map validity via input perturbations. No derivations, equations, fitted parameters presented as predictions, self-definitional constructs, or load-bearing self-citations appear in the provided text. The central claim is an observational finding that saliency maps sometimes align and sometimes do not with feature usage, supported by the suggested protocol rather than any tautological reduction to inputs. The work is self-contained as an empirical investigation without circular steps.