arxiv: 1907.06484 · v1 · pith:KKIHZSCCnew · submitted 2019-07-15 · 💻 cs.IR · cs.LG

A study on the Interpretability of Neural Retrieval Models using DeepSHAP

Pith reviewed 2026-05-24 21:25 UTC · model grok-4.3

classification 💻 cs.IR cs.LG
keywords interpretability · neural retrieval models · DeepSHAP · LIME · explanations · information retrieval · Shapley values

The pith

Explanations from DeepSHAP for neural retrieval models differ considerably from LIME outputs, raising concerns about their robustness and accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how to adapt DeepSHAP, an explanation technique that estimates feature importance by comparing network activations to a reference input, for use with neural retrieval models. It tests different ways to build reference documents as the baseline input and then compares the resulting explanations against those produced by LIME on the same models. The comparisons show large differences between the two methods. A sympathetic reader would care because reliable explanations are needed to understand why a neural model judges a document relevant to a query, yet these differences suggest current techniques may not deliver consistent or trustworthy insights.

Core claim

The paper explores various reference input document construction techniques for DeepSHAP on neural retrieval models (NRMs) and compares the resulting explanations to those produced by LIME, finding that the two differ considerably. This raises concerns regarding the robustness and accuracy of explanations produced for NRMs.

What carries the argument

DeepSHAP adapted via reference input document construction techniques to estimate relative importance of input features for neural retrieval model decisions.
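To make the role of the reference input concrete, here is a minimal sketch that computes exact Shapley values for a toy scoring function relative to a reference document. It is not DeepSHAP itself (which approximates these values by propagating activation differences through a network) and not the paper's setup: the scoring function `toy_score`, the query terms, and the `<pad>` token are all assumptions made up for illustration.

```python
from itertools import combinations
from math import factorial

QUERY_TERMS = {"neural", "retrieval"}  # hypothetical query, for illustration only
PAD = "<pad>"                          # hypothetical "empty" reference token


def toy_score(doc):
    """Toy relevance score: one point per matched query term, +0.5 if all match."""
    matched = QUERY_TERMS & set(doc)
    bonus = 0.5 if matched == QUERY_TERMS else 0.0
    return len(matched) + bonus


def shapley_values(score, doc, reference):
    """Exact Shapley value of each position in `doc` relative to `reference`.

    A position that is "absent" from a coalition takes its value from the
    reference document, so the reference defines what "no information" means.
    """
    n = len(doc)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [doc[j] if j in present else reference[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = doc[i]
                values[i] += weight * (score(with_i) - score(without_i))
    return values


doc = ["neural", "retrieval", "the"]
empty_reference = [PAD, PAD, PAD]
partial_reference = ["neural", PAD, PAD]  # a reference that already contains a query term

phi_empty = shapley_values(toy_score, doc, empty_reference)
phi_partial = shapley_values(toy_score, doc, partial_reference)
```

In this toy case the attributions always sum to the score difference between the document and the reference, yet the term "neural" receives credit under the all-padding reference and none under the reference that already contains it. That is the sense in which the choice of reference document can change an explanation.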

If this is right

  • Direct application of image-classification explanation methods to text retrieval requires careful choice of reference inputs.
  • Interpretability techniques for neural retrieval models may need domain-specific adaptations instead of off-the-shelf transfer.
  • Current explanation methods leave open questions about which features truly drive relevance judgments in neural models.
  • Future work should target more reliable ways to explain document relevance decisions in neural retrieval.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Practitioners using these explanations to debug or audit search systems should treat the outputs as provisional until consistency improves.
  • Similar discrepancies could appear when applying explanation methods to other ranking or recommendation models trained on text.
  • One way to test the concern would be to measure how often each method's highlighted terms align with explicit human relevance judgments on the same queries.

Load-bearing premise

Substantial differences between DeepSHAP and LIME outputs on the same neural retrieval model indicate a lack of robustness or accuracy rather than simply reflecting the distinct assumptions each method makes about feature importance.

What would settle it

A controlled test on synthetic retrieval data where ground-truth important terms are known in advance, showing whether DeepSHAP and LIME both recover those terms accurately despite producing different explanations on real data.
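One way such a controlled test could be scored is sketched below. Everything here is hypothetical: the planted terms, the synthetic scorer, and in particular the leave-one-out occlusion attribution, which is only a simple stand-in for DeepSHAP or LIME so that the example stays self-contained.

```python
PLANTED = {"quantum", "entanglement"}  # hypothetical ground-truth important terms


def synthetic_score(doc):
    """Synthetic scorer that, by construction, depends only on the planted terms."""
    return float(sum(term in PLANTED for term in doc))


def occlusion_attributions(score, doc):
    """Stand-in attribution: score drop when a single position is removed."""
    full = score(doc)
    return {i: full - score(doc[:i] + doc[i + 1:]) for i in range(len(doc))}


def precision_at_k(attributions, doc, ground_truth, k):
    """Fraction of the k highest-magnitude positions that hold a ground-truth term."""
    ranked = sorted(attributions, key=lambda i: abs(attributions[i]), reverse=True)[:k]
    return sum(doc[i] in ground_truth for i in ranked) / k


doc = ["the", "quantum", "state", "shows", "entanglement"]
attr = occlusion_attributions(synthetic_score, doc)
p_at_2 = precision_at_k(attr, doc, PLANTED, k=2)
```

Running the same `precision_at_k` over attributions from each explanation method would give a per-method accuracy figure that does not depend on the methods agreeing with each other.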

Figures

Figures reproduced from arXiv: 1907.06484 by Avishek Anand, Jaspreet Singh, Zeon Trevor Fernando.

Figure 1: Confusion matrices of various DeepSHAP background document methods comparing Jaccard similarities.
Original abstract

A recent trend in IR has been the usage of neural networks to learn retrieval models for text based adhoc search. While various approaches and architectures have yielded significantly better performance than traditional retrieval models such as BM25, it is still difficult to understand exactly why a document is relevant to a query. In the ML community several approaches for explaining decisions made by deep neural networks have been proposed -- including DeepSHAP which modifies the DeepLift algorithm to estimate the relative importance (shapley values) of input features for a given decision by comparing the activations in the network for a given image against the activations caused by a reference input. In image classification, the reference input tends to be a plain black image. While DeepSHAP has been well studied for image classification tasks, it remains to be seen how we can adapt it to explain the output of Neural Retrieval Models (NRMs). In particular, what is a good "black" image in the context of IR? In this paper we explored various reference input document construction techniques. Additionally, we compared the explanations generated by DeepSHAP to LIME (a model agnostic approach) and found that the explanations differ considerably. Our study raises concerns regarding the robustness and accuracy of explanations produced for NRMs. With this paper we aim to shed light on interesting problems surrounding interpretability in NRMs and highlight areas of future work.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript explores adapting DeepSHAP to explain Neural Retrieval Models (NRMs) by testing multiple reference-document constructions as baselines (contrasted with the black-image baseline used in vision). It compares the resulting feature attributions against those produced by LIME on the same NRMs and reports that the two sets of explanations differ considerably, from which it concludes that explanations for NRMs raise concerns about robustness and accuracy.

Significance. If the observed differences could be shown to correspond to actual inaccuracies (via ground-truth term importance or downstream task performance), the work would usefully flag a methodological gap in applying off-the-shelf explanation techniques to neural IR. The paper correctly notes that the choice of reference input is non-obvious in the text domain. As presented, however, the study remains purely exploratory, supplies no quantitative metrics, error analysis, or dataset details, and therefore offers limited immediate guidance to the field.

major comments (1)
  1. [Abstract] Abstract: the inference that 'the explanations differ considerably' and therefore 'raise concerns regarding the robustness and accuracy of explanations produced for NRMs' is not supported. DeepSHAP approximates Shapley values relative to a chosen baseline while LIME fits local linear models; systematic divergence is the expected outcome under their distinct assumptions. No independent adjudication criterion (known relevant terms, human judgments, or retrieval-performance correlation) is supplied to decide which attributions are correct.
minor comments (2)
  1. [Abstract] The abstract states that 'various reference input document construction techniques' were explored but gives no enumeration of the techniques, no quantitative comparison among them, and no description of the underlying NRM, collection, or evaluation protocol.
  2. No tables, figures, or numerical results (e.g., attribution overlap scores, rank correlations between DeepSHAP and LIME) are referenced, making it impossible to assess the magnitude or consistency of the reported differences.
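The kind of overlap and rank-correlation figures the second minor comment asks for could be computed along these lines. This is a sketch under stated assumptions: the attribution dictionaries are invented numbers, not results from the paper, and the Spearman formula used here assumes no tied magnitudes and at least two shared terms.

```python
def top_k_terms(attributions, k):
    """Set of the k terms with the largest absolute attribution."""
    ranked = sorted(attributions, key=lambda t: abs(attributions[t]), reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def spearman_rho(attr_a, attr_b):
    """Spearman rank correlation over shared terms (no ties, n >= 2 assumed)."""
    terms = sorted(set(attr_a) & set(attr_b))

    def ranks(attr):
        order = sorted(terms, key=lambda t: abs(attr[t]), reverse=True)
        return {t: r for r, t in enumerate(order, start=1)}

    ra, rb = ranks(attr_a), ranks(attr_b)
    n = len(terms)
    d2 = sum((ra[t] - rb[t]) ** 2 for t in terms)
    return 1 - 6 * d2 / (n * (n * n - 1))


# Invented attributions for one query-document pair (illustration only).
deepshap_attr = {"neural": 0.9, "retrieval": 0.7, "model": 0.2, "the": 0.05}
lime_attr = {"neural": 0.8, "model": 0.6, "retrieval": 0.1, "the": 0.3}

overlap = jaccard(top_k_terms(deepshap_attr, 2), top_k_terms(lime_attr, 2))
rho = spearman_rho(deepshap_attr, lime_attr)
```

Reporting both numbers is a design choice: top-k Jaccard captures whether the methods highlight the same few terms, while the rank correlation reflects agreement across the whole document.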

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We agree that the manuscript is exploratory and that stronger claims require additional evidence. We respond to the major comment on the abstract below and will revise accordingly.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the inference that 'the explanations differ considerably' and therefore 'raise concerns regarding the robustness and accuracy of explanations produced for NRMs' is not supported. DeepSHAP approximates Shapley values relative to a chosen baseline while LIME fits local linear models; systematic divergence is the expected outcome under their distinct assumptions. No independent adjudication criterion (known relevant terms, human judgments, or retrieval-performance correlation) is supplied to decide which attributions are correct.

    Authors: We thank the referee for this observation. We agree that, absent an independent adjudication criterion, it is not possible to determine which method's attributions are more accurate. We will therefore revise the abstract to remove references to concerns about accuracy and instead emphasize that the substantial observed differences between DeepSHAP and LIME raise questions about the robustness of explanations for NRMs. The study is intended to be exploratory and to highlight methodological challenges when adapting these techniques to text retrieval; we will make this framing clearer in the revision. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical comparison without derivations or self-referential claims

Full rationale

The paper conducts an empirical study comparing DeepSHAP (with varied reference documents) and LIME explanations on neural retrieval models. It reports observed differences and raises concerns about robustness, but contains no mathematical derivations, fitted parameters renamed as predictions, uniqueness theorems, or self-citations that bear the central claim. All methods (DeepSHAP, LIME) are external; reference construction is exploratory rather than a closed loop. The inference from disagreement to accuracy concerns is interpretive and open to the skeptic's critique, but does not constitute circularity by construction. This is a standard non-circular empirical analysis.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivations or new theoretical constructs; the paper is an empirical exploration of existing explanation tools.

pith-pipeline@v0.9.0 · 5778 in / 977 out tokens · 15757 ms · 2026-05-24T21:25:16.428523+00:00 · methodology

