Saliency Maps Generation for Automatic Text Summarization

Alessandra Russo; David Tuckey; Krysia Broda

arxiv: 1907.05664 · v1 · pith:ORUMHA3Pnew · submitted 2019-07-12 · 💻 cs.LG · cs.CL

Pith reviewed 2026-05-24 22:31 UTC · model grok-4.3

keywords: saliency maps · layer-wise relevance propagation · text summarization · explainable AI · sequence-to-sequence models · counterfactual testing · attention models

The pith

Saliency maps from LRP on text summarization models sometimes fail to match the model's actual use of input features.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper applies Layer-Wise Relevance Propagation to a sequence-to-sequence attention model trained on a text summarization dataset and obtains saliency maps that are sometimes unexpected. It demonstrates through a suggested protocol of checking counterfactual input changes that these maps sometimes capture the real use of the input features by the network and sometimes do not. The authors argue that a quantitative method is needed to test the counterfactual case in order to judge the truthfulness of the saliency maps as explanations. A sympathetic reader would care because the work shows the limits of accepting such maps at face value for complex sequence-to-sequence tasks.

Core claim

The authors apply LRP to a seq2seq attention model on text summarization and find that the resulting saliency maps sometimes reflect the network's actual computation on input features and sometimes do not. They propose a protocol to check validity by testing the effect of altering inputs attributed high importance and conclude that care must be taken when treating the maps as explanations, since a quantitative way of testing the counterfactual case is required to judge their truthfulness.

What carries the argument

Layer-Wise Relevance Propagation (LRP) applied to a sequence-to-sequence attention model, which back-propagates relevance scores to assign importance values to input tokens for explaining the generated summary.
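To make the back-propagation of relevance concrete, here is a minimal sketch of one common LRP rule (the epsilon rule) applied to a single dense layer. It is an illustration only: the page does not say which LRP variant the authors used, and the function name `lrp_epsilon_dense`, the toy weights, and the `eps` value are all assumptions introduced here. A full seq2seq model would additionally need rules for the LSTM gates and the attention mechanism.

```python
from typing import List


def lrp_epsilon_dense(
    x: List[float],
    w: List[List[float]],
    b: List[float],
    relevance_out: List[float],
    eps: float = 1e-6,
) -> List[float]:
    """Redistribute the relevance of outputs z_j = sum_i x_i * w[i][j] + b[j] onto the inputs x_i."""
    n, m = len(x), len(b)
    # Forward pass: pre-activations of the layer.
    z = [sum(x[i] * w[i][j] for i in range(n)) + b[j] for j in range(m)]
    # Epsilon stabiliser: keeps each denominator away from zero while preserving its sign.
    denom = [zj + (eps if zj >= 0 else -eps) for zj in z]
    # Each input receives relevance in proportion to its contribution x_i * w_ij to z_j.
    return [
        sum(x[i] * w[i][j] / denom[j] * relevance_out[j] for j in range(m))
        for i in range(n)
    ]


# Toy layer with two inputs and two outputs (all numbers are illustrative).
x = [1.0, 2.0]
w = [[1.0, 0.0], [0.5, -1.0]]
b = [0.0, 0.0]
relevance_in = lrp_epsilon_dense(x, w, b, relevance_out=[1.0, 0.5])
```

With zero biases the rule approximately conserves relevance: the 1.5 units assigned to the two outputs are split into roughly 0.5 and 1.0 on the two inputs. Applying such a rule layer by layer, from the output token back to the input embeddings, is what yields a per-token saliency map.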

If this is right

Saliency maps for text summarization require explicit counterfactual testing to establish whether they are truthful.
The proposed protocol distinguishes cases where the maps match the model's feature use from cases where they do not.
Explanations based on saliency maps should be accepted only with caution for automatic text summarization.
A quantitative method is necessary to judge the validity of importance attributions produced by LRP in such models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same validation step may be needed for other explainability methods applied to attention-based NLP models.
The protocol could be tested on different summarization datasets to check how often the mismatch occurs.
This example suggests that task complexity increases the chance that saliency maps will diverge from model behavior.

Load-bearing premise

That the protocol of checking the effect of counterfactual changes to inputs can reliably determine whether a saliency map reflects the model's actual computation.

What would settle it

Finding a case in which an input token ranked as highly important by the saliency map is altered yet the model's generated summary remains unchanged would show that the map does not reflect the model's computation.
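The check described above can be sketched as a small deletion test: remove the k tokens the saliency map ranks highest, regenerate the summary, and compare the change against what deleting k random tokens does. This is a hedged illustration rather than the authors' protocol. The helper names, the Jaccard-based change score, and the `lead3` stand-in summarizer are all hypothetical, chosen so the example runs without a trained model.

```python
import random
from typing import Callable, List, Sequence, Tuple


def delete_tokens(tokens: Sequence[str], indices: Sequence[int]) -> List[str]:
    """Return the token list with the given positions removed."""
    drop = set(indices)
    return [t for i, t in enumerate(tokens) if i not in drop]


def summary_change(a: Sequence[str], b: Sequence[str]) -> float:
    """1 - Jaccard overlap of the token sets: 0.0 for identical vocabulary, 1.0 for disjoint."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return 0.0 if not union else 1.0 - len(sa & sb) / len(union)


def counterfactual_check(
    tokens: Sequence[str],
    saliency: Sequence[float],
    summarize: Callable[[Sequence[str]], List[str]],
    k: int,
    n_random: int = 20,
    seed: int = 0,
) -> Tuple[float, float]:
    """Compare deleting the top-k salient tokens with deleting k random tokens."""
    base = summarize(tokens)
    top_k = sorted(range(len(tokens)), key=lambda i: saliency[i], reverse=True)[:k]
    change_top = summary_change(base, summarize(delete_tokens(tokens, top_k)))
    rng = random.Random(seed)
    random_changes = [
        summary_change(
            base, summarize(delete_tokens(tokens, rng.sample(range(len(tokens)), k)))
        )
        for _ in range(n_random)
    ]
    return change_top, sum(random_changes) / n_random


def lead3(tokens: Sequence[str]) -> List[str]:
    """Hypothetical stand-in for a trained summarizer: copies the first three tokens."""
    return list(tokens[:3])


tokens = "storm hits coast overnight residents evacuated safely officials said".split()
faithful_map = [0.9, 0.8, 0.7, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]    # highlights what lead3 uses
unfaithful_map = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.7, 0.8, 0.9]  # highlights unused tokens

faithful_top, faithful_rand = counterfactual_check(tokens, faithful_map, lead3, k=3)
unfaithful_top, unfaithful_rand = counterfactual_check(tokens, unfaithful_map, lead3, k=3)
```

In this toy setup the faithful map produces a completely different summary when its top tokens are deleted, while the unfaithful map leaves the summary untouched, which is the situation described in the paragraph above. The random-deletion average gives a baseline for deciding whether an observed change is larger than chance.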

Figures

Figures reproduced from arXiv: 1907.05664 by Alessandra Russo, David Tuckey, Krysia Broda.

**Figure 2.** Representation of the propagation of the relevance from the output to the input. It passes through the decoder and attention …

**Figure 3.** Left: Saliency map over the truncated input text for the …

**Figure 4.** Summary from Figure 1 generated after deleting important …

**Figure 5.** Summary from another test text generated after deleting …

Original abstract

Saliency map generation techniques are at the forefront of explainable AI literature for a broad range of machine learning applications. Our goal is to question the limits of these approaches on more complex tasks. In this paper we apply Layer-Wise Relevance Propagation (LRP) to a sequence-to-sequence attention model trained on a text summarization dataset. We obtain unexpected saliency maps and discuss the rightfulness of these "explanations". We argue that we need a quantitative way of testing the counterfactual case to judge the truthfulness of the saliency maps. We suggest a protocol to check the validity of the importance attributed to the input and show that the saliency maps obtained sometimes capture the real use of the input features by the network, and sometimes do not. We use this example to discuss how careful we need to be when accepting them as explanation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper applies LRP to a seq2seq summarizer and reports mixed map faithfulness via a perturbation test, but the test itself is unlikely to isolate direct feature use cleanly.

This paper takes LRP, applies it to a sequence-to-sequence attention model on text summarization, and reports that the resulting saliency maps only sometimes reflect the features the model actually relies on. The authors use this finding to push for a quantitative protocol that tests the maps against counterfactual input changes. What stands out is the concrete demonstration on a real seq2seq setup rather than toy examples. The paper does a solid job of documenting the inconsistency and making the case that blind trust in saliency is risky for complex tasks like summarization.

The main limitation is that the suggested protocol may not cleanly separate true feature importance from side effects. In attention-based models, altering one input token shifts encoder states and attention weights across the board, so the change in output could come from indirect paths. The paper would be stronger if it included steps to control for that, such as targeted probes or frozen components, or at least quantified how often the maps failed and by how much. The abstract leaves out architecture details, dataset size, and exact results, which makes it hard to judge the strength of the mixed outcome. Assuming the full version fills those in, the work still serves as a useful warning.

This is worth bringing to a reading group focused on XAI methods for NLP. Readers working on saliency or explanations for sequence models would get value from the example and the call for better validation. It deserves peer review because the underlying concern about explanation faithfulness is legitimate and the paper engages it directly, even if the protocol needs refinement.

Referee Report

2 major / 1 minor

Summary. The manuscript applies Layer-Wise Relevance Propagation (LRP) to a sequence-to-sequence attention model trained for text summarization. It reports obtaining unexpected saliency maps, argues that a quantitative counterfactual protocol is needed to assess their truthfulness, proposes such a protocol based on input perturbations, and concludes that the resulting maps sometimes capture the model's actual use of input features and sometimes do not.

Significance. If the suggested protocol were shown to reliably isolate feature contributions, the work would usefully caution against over-interpreting saliency maps on attention-based seq2seq models. As presented, however, the absence of model architecture details, dataset description, quantitative metrics, or explicit implementation of the protocol limits the contribution to an observational case study rather than a validated methodological finding.

major comments (2)

[Abstract / protocol description] Abstract and discussion of the protocol: the proposed counterfactual test perturbs input tokens and observes output changes, but does not address the non-local propagation through the encoder hidden states and cross-attention weights. Changing one token alters the entire attention distribution, so any output delta conflates direct feature contribution with indirect effects; without an isolation mechanism (e.g., frozen attention or linear probe), the protocol cannot distinguish faithful saliency from perturbation artifacts.
[Abstract] Abstract: the central claim that the maps 'sometimes capture the real use ... and sometimes do not' is stated without any reported model architecture, training dataset, quantitative evaluation of the protocol, or even the number of examples examined. This renders the mixed-outcome observation impossible to assess or reproduce.
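The first major comment, on direct versus indirect effects, can be illustrated with a deliberately tiny attention readout. The sketch below is an assumption-laden toy, not the paper's model: values are scalars, there is a single attention step, and the name `deletion_effects` is hypothetical. It contrasts deleting a token while keeping the other attention weights frozen (direct effect only) with deleting it and letting the softmax renormalise (direct plus indirect effect).

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def deletion_effects(values: List[float], scores: List[float], i: int) -> Tuple[float, float]:
    """Return (direct, total) change in the attention readout when token i is deleted."""
    weights = softmax(scores)
    base = sum(wt * v for wt, v in zip(weights, values))
    kept = [j for j in range(len(values)) if j != i]
    # Frozen attention: remaining tokens keep their original weights,
    # so only token i's own contribution disappears.
    frozen = sum(weights[j] * values[j] for j in kept)
    # Recomputed attention: the softmax renormalises over the remaining tokens,
    # so their weights shift as a side effect of the deletion.
    new_weights = softmax([scores[j] for j in kept])
    recomputed = sum(wt * values[j] for wt, j in zip(new_weights, kept))
    return base - frozen, base - recomputed


# Three tokens with equal attention scores (illustrative numbers).
direct, total = deletion_effects(values=[1.0, 2.0, 3.0], scores=[0.0, 0.0, 0.0], i=2)
```

Here the direct effect of removing the third token is about 1.0, while the effect observed after renormalisation is only about 0.5, because the remaining weights grow to compensate. A plain perturbation test observes only the second number. In a real encoder the hidden states of neighbouring tokens would change as well, which this toy does not model.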

minor comments (1)

The manuscript would benefit from explicit section headings and numbered equations or algorithms when describing the LRP application and the suggested protocol.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. Our work is an observational case study applying LRP to a seq2seq attention model for summarization, showing unexpected saliency maps and proposing a basic counterfactual protocol to check their validity. We address each major comment below.

Referee: [Abstract / protocol description] Abstract and discussion of the protocol: the proposed counterfactual test perturbs input tokens and observes output changes, but does not address the non-local propagation through the encoder hidden states and cross-attention weights. Changing one token alters the entire attention distribution, so any output delta conflates direct feature contribution with indirect effects; without an isolation mechanism (e.g., frozen attention or linear probe), the protocol cannot distinguish faithful saliency from perturbation artifacts.

Authors: We agree that single-token perturbations propagate non-locally through encoder states and cross-attention, so output changes conflate direct and indirect effects. The protocol is presented as a simple, practical counterfactual check to reveal cases where LRP maps diverge from observed model behavior, not as an isolated attribution method. We will revise the discussion section to explicitly note this limitation and mention that extensions could incorporate mechanisms such as frozen attention weights for stronger isolation.

Revision: partial

Referee: [Abstract] Abstract: the central claim that the maps 'sometimes capture the real use ... and sometimes do not' is stated without any reported model architecture, training dataset, quantitative evaluation of the protocol, or even the number of examples examined. This renders the mixed-outcome observation impossible to assess or reproduce.

Authors: The manuscript body describes the seq2seq attention architecture, the CNN/DailyMail dataset, and the examination of multiple examples that produced the mixed outcomes. The abstract is kept concise per journal norms. We will revise the abstract to include a brief reference to the setup and the number of examples examined, improving reproducibility while preserving the observational nature of the study.

Revision: yes

Circularity Check

0 steps flagged

No circularity; observational application of existing LRP method

The paper applies Layer-Wise Relevance Propagation (an existing technique) to a trained seq2seq attention model on a summarization task and proposes an empirical protocol for checking saliency map validity via input perturbations. No derivations, equations, fitted parameters presented as predictions, self-definitional constructs, or load-bearing self-citations appear in the provided text. The central claim is an observational finding that saliency maps sometimes align and sometimes do not with feature usage, supported by the suggested protocol rather than any tautological reduction to inputs. The work is self-contained as an empirical investigation without circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the unstated premise that the counterfactual protocol can serve as ground truth for saliency-map validity; no free parameters, axioms, or invented entities are introduced in the abstract.

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages · 4 internal anchors

[1] Amina Adadi and Mohammed Berrada. Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI). IEEE Access, 6:52138–52160, 2018.

[2] Sebastian Bach, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, and Wojciech Samek. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE, 10(7):1–46, 2015.

[3] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv preprint arXiv:1409.0473, 2014.

[4] Karl Moritz Hermann, Tomáš Kočiský, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. Teaching Machines to Read and Comprehend. pages 1–14, 2015.

[5] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.

[6] Tim Miller. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267:1–38, 2019.

[7] Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Methods for interpreting and understanding deep neural networks. Digital Signal Processing, 73:1–15, Feb. 2017.

[8] Ramesh Nallapati, Bowen Zhou, Cicero Nogueira dos Santos, Caglar Gulcehre, and Bing Xiang. Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond. arXiv preprint arXiv:1602.06023, 2016.

[9] Dragomir R. Radev, Eduard Hovy, and Kathleen McKeown. Introduction to the Special Issue on Summarization. Computational Linguistics, 28(4):399–408, 2002.

[10] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, 2016.

[11] Alexander M Rush, Sumit Chopra, and Jason Weston. A Neural Attention Model for Abstractive Sentence Summarization. arXiv preprint arXiv:1509.00685, Sep. 2015.

[12] Wojciech Samek, Alexander Binder, Grégoire Montavon, Sebastian Lapuschkin, and Klaus-Robert Müller. Evaluating the visualization of what a deep neural network has learned. IEEE Transactions on Neural Networks and Learning Systems, 28(11):2660–2673, 2017.

[13] Abigail See, Peter J Liu, and Christopher D Manning. Get To The Point: Summarization with Pointer-Generator Networks. arXiv preprint arXiv:1704.04368, 2017.
