A study on the Interpretability of Neural Retrieval Models using DeepSHAP

Avishek Anand; Jaspreet Singh; Zeon Trevor Fernando

arxiv: 1907.06484 · v1 · pith:KKIHZSCCnew · submitted 2019-07-15 · 💻 cs.IR · cs.LG


Pith reviewed 2026-05-24 21:25 UTC · model grok-4.3

classification 💻 cs.IR cs.LG

keywords interpretability · neural retrieval models · DeepSHAP · LIME · explanations · information retrieval · Shapley values


The pith

Explanations from DeepSHAP for neural retrieval models differ considerably from LIME outputs, raising concerns about their robustness and accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how to adapt DeepSHAP, an explanation technique that estimates feature importance by comparing network activations to a reference input, for use with neural retrieval models. It tests different ways to build reference documents as the baseline input and then compares the resulting explanations against those produced by LIME on the same models. The comparisons show large differences between the two methods. A sympathetic reader would care because reliable explanations are needed to understand why a neural model judges a document relevant to a query, yet these differences suggest current techniques may not deliver consistent or trustworthy insights.

Core claim

The paper explores various reference input document construction techniques for DeepSHAP on neural retrieval models (NRMs), compares the generated explanations to those of LIME, and finds that the explanations differ considerably. This raises concerns regarding the robustness and accuracy of explanations produced for NRMs.

What carries the argument

DeepSHAP adapted via reference input document construction techniques to estimate relative importance of input features for neural retrieval model decisions.
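
To make the role of the reference document concrete, here is a minimal sketch that computes exact Shapley values for a toy scorer, treating a token as "absent" by swapping in the token at the same position of a reference document. This is not DeepSHAP itself (which approximates such values by propagating activation differences through a network); the scorer `toy_score`, the query, and both reference documents are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(score, document, reference):
    """Exact Shapley value of each token position in `document`.

    A position that is 'absent' from a coalition is filled with the token
    at the same position in `reference` (the baseline document).
    """
    n = len(document)

    def value(present):
        mixed = [document[i] if i in present else reference[i] for i in range(n)]
        return score(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


# Hypothetical toy scorer: one point per matched query term,
# plus a bonus point when both query terms are present.
QUERY = {"neural", "retrieval"}


def toy_score(tokens):
    matched = QUERY & set(tokens)
    return len(matched) + (1.0 if len(matched) == 2 else 0.0)


doc = ["neural", "retrieval", "models"]
pad_reference = ["<pad>", "<pad>", "<pad>"]   # assumed "blank" baseline
topical_reference = ["neural", "the", "the"]  # assumed baseline sharing a query term

phi_pad = shapley_values(toy_score, doc, pad_reference)
phi_topical = shapley_values(toy_score, doc, topical_reference)
print([round(v, 3) for v in phi_pad])
print([round(v, 3) for v in phi_topical])
```

With the padding baseline the two query terms share the credit equally (1.5 each); with the baseline that already contains "neural", all of the credit moves to "retrieval". The same document and the same scorer yield different explanations purely because the reference changed.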

If this is right

Direct application of image-classification explanation methods to text retrieval requires careful choice of reference inputs.
Interpretability techniques for neural retrieval models may need domain-specific adaptations instead of off-the-shelf transfer.
Current explanation methods leave open questions about which features truly drive relevance judgments in neural models.
Future work should target more reliable ways to explain document relevance decisions in neural retrieval.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Practitioners using these explanations to debug or audit search systems should treat the outputs as provisional until consistency improves.
Similar discrepancies could appear when applying explanation methods to other ranking or recommendation models trained on text.
One way to test the concern would be to measure how often each method's highlighted terms align with explicit human relevance judgments on the same queries.

Load-bearing premise

Substantial differences between DeepSHAP and LIME outputs on the same neural retrieval model indicate a lack of robustness or accuracy rather than simply reflecting the distinct assumptions each method makes about feature importance.

What would settle it

A controlled test on synthetic retrieval data where ground-truth important terms are known in advance, showing whether DeepSHAP and LIME both recover those terms accurately despite producing different explanations on real data.
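
A controlled test of this kind could score each method by how many of the planted terms appear among its top-ranked terms. The sketch below assumes attributions arrive as term-to-score dictionaries; the planted terms and both score dictionaries are made up for illustration and are not results from the paper.

```python
def precision_recall_at_k(attributions, planted_terms, k):
    """Precision and recall of the k highest-scored terms against the
    set of terms known (by construction) to drive relevance."""
    top_k = sorted(attributions, key=attributions.get, reverse=True)[:k]
    hits = sum(1 for term in top_k if term in planted_terms)
    return hits / k, hits / len(planted_terms)


# Hypothetical ground truth and attribution scores from two explainers.
planted = {"solar", "panel"}
method_a = {"solar": 0.8, "panel": 0.6, "cost": 0.1, "the": 0.0}
method_b = {"cost": 0.7, "solar": 0.5, "the": 0.2, "panel": 0.1}

print(precision_recall_at_k(method_a, planted, k=2))
print(precision_recall_at_k(method_b, planted, k=2))
```

In this toy setup the first method recovers both planted terms in its top two, while the second recovers only one, which gives an adjudication criterion that mutual disagreement alone does not.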

Figures

Figures reproduced from arXiv: 1907.06484 by Avishek Anand, Jaspreet Singh, Zeon Trevor Fernando.

**Figure 1.** Confusion matrices of various DeepSHAP background document methods comparing Jaccard similarities.
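
For readers unfamiliar with the measure in Figure 1, the following sketch shows one plausible way to build such a pairwise matrix: take the top-k terms from each explanation and compute the Jaccard similarity between every pair of term sets. Whether the paper uses a top-k cutoff is an assumption here, and the method names and scores are hypothetical.

```python
def top_k_terms(attributions, k):
    """Set of the k terms with the highest attribution scores."""
    ranked = sorted(attributions, key=attributions.get, reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical attributions for one query-document pair.
explanations = {
    "reference_A": {"neural": 0.9, "retrieval": 0.7, "model": 0.2, "the": 0.0},
    "reference_B": {"neural": 0.8, "model": 0.6, "retrieval": 0.1, "the": 0.0},
    "lime": {"the": 0.5, "model": 0.4, "neural": 0.1, "retrieval": 0.0},
}

names = sorted(explanations)
matrix = {
    (p, q): jaccard(top_k_terms(explanations[p], 2), top_k_terms(explanations[q], 2))
    for p in names
    for q in names
}

for p in names:
    print(p, [round(matrix[(p, q)], 2) for q in names])
```

A value of 1.0 means two methods highlight exactly the same terms, and 0.0 means they share none.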

Original abstract

A recent trend in IR has been the usage of neural networks to learn retrieval models for text based adhoc search. While various approaches and architectures have yielded significantly better performance than traditional retrieval models such as BM25, it is still difficult to understand exactly why a document is relevant to a query. In the ML community several approaches for explaining decisions made by deep neural networks have been proposed -- including DeepSHAP which modifies the DeepLift algorithm to estimate the relative importance (shapley values) of input features for a given decision by comparing the activations in the network for a given image against the activations caused by a reference input. In image classification, the reference input tends to be a plain black image. While DeepSHAP has been well studied for image classification tasks, it remains to be seen how we can adapt it to explain the output of Neural Retrieval Models (NRMs). In particular, what is a good "black" image in the context of IR? In this paper we explored various reference input document construction techniques. Additionally, we compared the explanations generated by DeepSHAP to LIME (a model agnostic approach) and found that the explanations differ considerably. Our study raises concerns regarding the robustness and accuracy of explanations produced for NRMs. With this paper we aim to shed light on interesting problems surrounding interpretability in NRMs and highlight areas of future work.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper applies DeepSHAP to neural rankers by testing reference documents and reports divergence from LIME, but treats that divergence as evidence of poor robustness without an independent check.


The main thing to know is that this work takes DeepSHAP, which relies on a reference baseline, and tries to make it work for neural retrieval models by building different reference documents, then compares the resulting feature attributions to those from LIME and finds they do not line up. That observation is the core new piece.

What the paper does reasonably is spell out why the usual image trick of a blank reference does not carry over to text and then actually tests a few construction methods for those references. The comparison to LIME is a straightforward way to probe whether the attributions hold up across explanation techniques. That part is useful for anyone who has to pick an off-the-shelf explainer for a ranking model.

The soft spot sits in the interpretation. DeepSHAP and LIME rest on different assumptions about how to measure feature importance, so systematic differences are what you would expect rather than automatic proof that the explanations are fragile or inaccurate. The abstract gives no numbers on the size of the mismatch, no dataset details, and no external test such as known term importance or downstream prediction accuracy to decide which method is closer to the model's actual behavior. Without that, the claim that the differences raise concerns about robustness stays more suggestive than demonstrated.

This is the sort of paper that would interest people working on explainable neural IR who need to know the practical limits of current post-hoc methods. A reader who wants to see the reference-input problem laid out concretely will get value from it. It is exploratory and the central inference needs more support, but the question it surfaces is real enough that a serious referee should look at it.

Referee Report

1 major / 2 minor

Summary. The manuscript explores adapting DeepSHAP to explain Neural Retrieval Models (NRMs) by testing multiple reference-document constructions as baselines (contrasted with the black-image baseline used in vision). It compares the resulting feature attributions against those produced by LIME on the same NRMs and reports that the two sets of explanations differ considerably, from which it concludes that explanations for NRMs raise concerns about robustness and accuracy.

Significance. If the observed differences could be shown to correspond to actual inaccuracies (via ground-truth term importance or downstream task performance), the work would usefully flag a methodological gap in applying off-the-shelf explanation techniques to neural IR. The paper correctly notes that the choice of reference input is non-obvious in the text domain. As presented, however, the study remains purely exploratory, supplies no quantitative metrics, error analysis, or dataset details, and therefore offers limited immediate guidance to the field.

major comments (1)

[Abstract] The inference from 'the explanations differ considerably' to 'concerns regarding the robustness and accuracy of explanations produced for NRMs' is not supported. DeepSHAP approximates Shapley values relative to a chosen baseline, while LIME fits local linear models; systematic divergence is the expected outcome under their distinct assumptions. No independent adjudication criterion (known relevant terms, human judgments, or retrieval-performance correlation) is supplied to decide which attributions are correct.
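
The point about distinct assumptions can be illustrated on a toy model. The sketch below compares exact Shapley values (all-zeros baseline) with a LIME-style surrogate, namely a weighted least-squares fit over all binary masks using an exponential locality kernel. It is a simplification made for illustration: real LIME samples perturbations and uses its own kernel and feature selection, and both model functions here are hypothetical.

```python
from itertools import combinations, product
from math import exp, factorial


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def local_linear_weights(f, n, sigma=1.0):
    """LIME-style surrogate: weighted least squares over all 2^n masks,
    each weighted by exp(-hamming_distance_to_full_input / sigma)."""
    rows, targets, weights = [], [], []
    for mask in product([0, 1], repeat=n):
        rows.append([1.0] + [float(z) for z in mask])  # intercept + features
        targets.append(f(mask))
        weights.append(exp(-(n - sum(mask)) / sigma))
    dim = n + 1
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             for j in range(dim)] for i in range(dim)]
    xtwy = [sum(w * r[i] * y for r, y, w in zip(rows, targets, weights))
            for i in range(dim)]
    return solve(xtwx, xtwy)[1:]  # drop the intercept


def shapley(f, n):
    """Exact Shapley values of the all-ones input against an all-zeros baseline."""
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without = tuple(1 if j in subset else 0 for j in range(n))
                with_i = tuple(1 if (j in subset or j == i) else 0 for j in range(n))
                total += weight * (f(with_i) - f(without))
        phi.append(total)
    return phi


def additive(z):      # hypothetical model with no feature interactions
    return 2.0 * z[0] + 1.0 * z[1]


def interaction(z):   # hypothetical model: fires only if both features are on
    return float(z[0] * z[1])


lime_add, shap_add = local_linear_weights(additive, 3), shapley(additive, 3)
lime_int, shap_int = local_linear_weights(interaction, 3), shapley(interaction, 3)
print([round(v, 3) for v in lime_add], [round(v, 3) for v in shap_add])
print([round(v, 3) for v in lime_int], [round(v, 3) for v in shap_int])
```

On the additive model both procedures return the same weights. On the interaction model Shapley assigns 0.5 to each interacting feature while the locally weighted fit assigns roughly 0.73, even though both describe the same model faithfully under their own definitions. Disagreement of this kind is therefore not, on its own, evidence that either explanation is wrong.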

minor comments (2)

[Abstract] The abstract states that 'various reference input document construction techniques' were explored but gives no enumeration of the techniques, no quantitative comparison among them, and no description of the underlying NRM, collection, or evaluation protocol.
No tables, figures, or numerical results (e.g., attribution overlap scores, rank correlations between DeepSHAP and LIME) are referenced, making it impossible to assess the magnitude or consistency of the reported differences.
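
As an example of the kind of number the second minor comment asks for, here is a small sketch of a rank correlation (Kendall's tau-a) between two attribution score dictionaries. The term scores are hypothetical and are not results from the paper; a library routine such as `scipy.stats.kendalltau` would normally be preferred to this hand-rolled version.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall tau-a over the terms scored by both methods.

    Pairs tied in either ranking count as neither concordant nor discordant.
    """
    terms = sorted(set(scores_a) & set(scores_b))
    concordant = discordant = 0
    for s, t in combinations(terms, 2):
        direction = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    pairs = len(terms) * (len(terms) - 1) // 2
    return (concordant - discordant) / pairs if pairs else 0.0


# Hypothetical attribution scores for the same query-document pair.
deepshap_scores = {"neural": 0.9, "retrieval": 0.7, "model": 0.2, "the": 0.05}
lime_scores = {"neural": 0.1, "retrieval": 0.3, "model": 0.4, "the": 0.5}
similar_scores = {"neural": 0.8, "retrieval": 0.9, "model": 0.1, "the": 0.0}

print(kendall_tau(deepshap_scores, lime_scores))
print(round(kendall_tau(deepshap_scores, similar_scores), 3))
```

A value near 1 means the two methods rank terms in nearly the same order, and a value near -1 means the rankings are reversed. Reporting such a statistic across many query-document pairs would let a reader judge how large "differ considerably" actually is.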

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We agree that the manuscript is exploratory and that stronger claims require additional evidence. We respond to the major comment on the abstract below and will revise accordingly.


Referee: [Abstract] The inference from 'the explanations differ considerably' to 'concerns regarding the robustness and accuracy of explanations produced for NRMs' is not supported. DeepSHAP approximates Shapley values relative to a chosen baseline, while LIME fits local linear models; systematic divergence is the expected outcome under their distinct assumptions. No independent adjudication criterion (known relevant terms, human judgments, or retrieval-performance correlation) is supplied to decide which attributions are correct.

Authors: We thank the referee for this observation. We agree that, absent an independent adjudication criterion, it is not possible to determine which method's attributions are more accurate. We will therefore revise the abstract to remove references to concerns about accuracy and instead emphasize that the substantial observed differences between DeepSHAP and LIME raise questions about the robustness of explanations for NRMs. The study is intended to be exploratory and to highlight methodological challenges when adapting these techniques to text retrieval; we will make this framing clearer in the revision.
Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical comparison without derivations or self-referential claims


The paper conducts an empirical study comparing DeepSHAP (with varied reference documents) and LIME explanations on neural retrieval models. It reports observed differences and raises concerns about robustness, but contains no mathematical derivations, fitted parameters renamed as predictions, uniqueness theorems, or self-citations that bear the central claim. All methods (DeepSHAP, LIME) are external; reference construction is exploratory rather than a closed loop. The inference from disagreement to accuracy concerns is interpretive and open to the skeptic's critique, but does not constitute circularity by construction. This is a standard non-circular empirical analysis.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivations or new theoretical constructs; the paper is an empirical exploration of existing explanation tools.

pith-pipeline@v0.9.0 · 5778 in / 977 out tokens · 15757 ms · 2026-05-24T21:25:16.428523+00:00 · methodology


Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages · 3 internal anchors

[1] Leila Arras, Franziska Horn, Grégoire Montavon, Klaus-Robert Müller, and Wojciech Samek. 2017. "What is relevant in a text document?": An interpretable machine learning approach. PLOS ONE 12 (2017), 1–23.

[2] Sebastian Bach, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, and Wojciech Samek. 2015. On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation. PLOS ONE 10 (2015), 1–46.

[3–4] Yixing Fan, Liang Pang, Jianpeng Hou, Jiafeng Guo, Yanyan Lan, and Xueqi Cheng. 2017. MatchZoo: A Toolkit for Deep Text Matching. arXiv:1707.07270.

[5] Amirata Ghorbani, Abubakar Abid, and James Y. Zou. 2019. Interpretation of Neural Networks is Fragile. In AAAI ’19.

[6] Jiafeng Guo, Yixing Fan, Qingyao Ai, and W. Bruce Croft. 2016. A Deep Relevance Matching Model for Ad-hoc Retrieval. In CIKM ’16. ACM, 55–64.

[7–8] Benjamin Letham, Cynthia Rudin, Tyler H. McCormick, and David Madigan. 2015. Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model. The Annals of Applied Statistics 9, 3 (2015), 1350–1371.

[9] Scott M Lundberg and Su-In Lee. 2017. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems 30. 4765–4774.

[10] Ryan McDonald, George Brokos, and Ion Androutsopoulos. 2018. Deep Relevance Ranking Using Enhanced Document-Query Interactions. In EMNLP ’18.

[11] Liang Pang, Yanyan Lan, Jiafeng Guo, Jun Xu, and Xueqi Cheng. 2017. A Deep Investigation of Deep IR Models. arXiv preprint (2017). arXiv:1707.07700.

[12–13] Liang Pang, Yanyan Lan, Jiafeng Guo, Jun Xu, Shengxian Wan, and Xueqi Cheng. Text Matching As Image Recognition. In AAAI ’16. 2793–2799.

[14] Daan Rennings, Felipe Moraes, and Claudia Hauff. 2019. An Axiomatic Approach to Diagnosing Neural IR Models. In ECIR ’19. 489–503.

[15] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In KDD ’16. ACM, 1135–1144.

[16] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2018. Anchors: High-Precision Model-Agnostic Explanations. In AAAI ’18.

[17] Lloyd S Shapley. 1953. A value for n-person games. Contributions to the Theory of Games 2, 28 (1953), 307–317.

[18] Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. 2017. Learning Important Features Through Propagating Activation Differences. arXiv preprint (2017). arXiv:1704.02685.

[19] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2014. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. ICLR Workshop (2014).

[20] Jaspreet Singh and Avishek Anand. 2018. Interpreting search result rankings through intent modeling. arXiv preprint (2018). arXiv:1809.05190.

[21] Jaspreet Singh and Avishek Anand. 2019. EXS: Explainable Search Using Local Model Agnostic Interpretability. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (WSDM ’19). ACM, 770–773.

[22] Berk Ustun and Cynthia Rudin. 2016. Supersparse Linear Integer Models for Optimized Medical Scoring Systems. Machine Learning 102, 3 (2016), 349–391.

[23] Kelvin Xu, Jimmy Lei Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio. 2015. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. In International Conference on Machine Learning - Volume 37 (ICML ’15). 2048–2057.
