A study on the Interpretability of Neural Retrieval Models using DeepSHAP
Pith reviewed 2026-05-24 21:25 UTC · model grok-4.3
The pith
Explanations from DeepSHAP for neural retrieval models differ considerably from LIME outputs, raising concerns about their robustness and accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper explores several techniques for constructing the reference input document that DeepSHAP requires when applied to neural retrieval models, and compares the resulting explanations with those produced by LIME. The two sets of explanations differ considerably, which raises concerns regarding the robustness and accuracy of explanations produced for NRMs.
What carries the argument
DeepSHAP, adapted to text by varying how the reference input document is constructed, used to estimate the relative importance of input features in neural retrieval model decisions.
If this is right
- Direct application of image-classification explanation methods to text retrieval requires careful choice of reference inputs.
- Interpretability techniques for neural retrieval models may need domain-specific adaptations instead of off-the-shelf transfer.
- Current explanation methods leave open questions about which features truly drive relevance judgments in neural models.
- Future work should target more reliable ways to explain document relevance decisions in neural retrieval.
Where Pith is reading between the lines
- Practitioners using these explanations to debug or audit search systems should treat the outputs as provisional until consistency improves.
- Similar discrepancies could appear when applying explanation methods to other ranking or recommendation models trained on text.
- One way to test the concern would be to measure how often each method's highlighted terms align with explicit human relevance judgments on the same queries.
Load-bearing premise
Substantial differences between DeepSHAP and LIME outputs on the same neural retrieval model indicate a lack of robustness or accuracy rather than simply reflecting the distinct assumptions each method makes about feature importance.
What would settle it
A controlled test on synthetic retrieval data where ground-truth important terms are known in advance, showing whether DeepSHAP and LIME both recover those terms accurately despite producing different explanations on real data.
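A minimal sketch of such a controlled test, in Python. Everything here is hypothetical: the filler vocabulary, the planted terms `alpha` and `beta`, and the ground-truth scorer stand in for a real synthetic retrieval collection and a trained NRM. Because no NRM is trained here, a simple leave-one-out occlusion explainer is swapped in where DeepSHAP or LIME attributions would go; the evaluation step (precision@k against the planted terms) is the part meant to carry over.

```python
import random

VOCAB = [f"w{i}" for i in range(50)]  # hypothetical filler vocabulary
PLANTED = {"alpha", "beta"}           # hypothetical ground-truth important terms


def make_synthetic_doc(length, rng):
    """Random filler document with each planted term inserted once at a random position."""
    doc = [rng.choice(VOCAB) for _ in range(length)]
    for term in sorted(PLANTED):  # sorted for reproducible insertion order
        doc.insert(rng.randrange(len(doc) + 1), term)
    return doc


def synthetic_relevance(tokens):
    """Ground-truth scorer: relevance depends only on the planted terms."""
    return float(sum(tok in PLANTED for tok in tokens))


def occlusion_attributions(doc, score_fn, mask_token="[PAD]"):
    """Stand-in explainer: drop in score when each token is masked on its own."""
    base = score_fn(doc)
    return [base - score_fn(doc[:i] + [mask_token] + doc[i + 1:]) for i in range(len(doc))]


def precision_at_k(doc, attributions, truth, k):
    """Fraction of the k highest-attributed tokens that are ground-truth terms."""
    ranked = sorted(range(len(doc)), key=lambda i: attributions[i], reverse=True)
    return sum(doc[i] in truth for i in ranked[:k]) / k


rng = random.Random(0)
docs = [make_synthetic_doc(20, rng) for _ in range(100)]
scores = [
    precision_at_k(d, occlusion_attributions(d, synthetic_relevance), PLANTED, k=2)
    for d in docs
]
mean_precision = sum(scores) / len(scores)
```

On this toy scorer the occlusion explainer ranks both planted terms first in every document, so the mean precision@2 is 1.0. In a real study the interesting question is whether DeepSHAP and LIME each reach that ceiling on planted terms while still disagreeing on the remaining tokens.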
Original abstract
A recent trend in IR has been the use of neural networks to learn retrieval models for text-based ad hoc search. While various approaches and architectures have yielded significantly better performance than traditional retrieval models such as BM25, it is still difficult to understand exactly why a document is relevant to a query. In the ML community, several approaches for explaining decisions made by deep neural networks have been proposed, including DeepSHAP, which modifies the DeepLIFT algorithm to estimate the relative importance (Shapley values) of input features for a given decision by comparing the activations in the network for a given image against the activations caused by a reference input. In image classification, the reference input tends to be a plain black image. While DeepSHAP has been well studied for image classification tasks, it remains to be seen how it can be adapted to explain the output of Neural Retrieval Models (NRMs). In particular, what is a good "black" image in the context of IR? In this paper we explored various reference input document construction techniques. Additionally, we compared the explanations generated by DeepSHAP to LIME (a model-agnostic approach) and found that the explanations differ considerably. Our study raises concerns regarding the robustness and accuracy of explanations produced for NRMs. With this paper we aim to shed light on interesting problems surrounding interpretability in NRMs and highlight areas of future work.
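To make the role of the reference input concrete, the sketch below computes exact Shapley values for a short token sequence, where a token absent from a coalition is replaced by the token at the same position in a reference document. This is an illustration under stated assumptions, not DeepSHAP itself: DeepSHAP approximates these values by propagating DeepLIFT multipliers through the network, whereas this code enumerates every coalition (exponential in document length) against a hypothetical scorer, `toy_score`, that counts distinct query terms. The two reference documents are hypothetical choices, not the constructions studied in the paper.

```python
from itertools import combinations
from math import factorial


def shapley_attributions(doc, reference, score_fn):
    """Exact Shapley value of each position in `doc`.

    A position outside the coalition takes the token at the same position
    in `reference`, so every attribution is relative to that reference.
    """
    if len(doc) != len(reference):
        raise ValueError("doc and reference must have the same length")
    n = len(doc)

    def value(coalition):
        mixed = [doc[i] if i in coalition else reference[i] for i in range(n)]
        return score_fn(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical stand-in for an NRM: number of distinct query terms present.
QUERY = {"neural", "ranking"}


def toy_score(tokens):
    return float(len(QUERY & set(tokens)))


doc = ["neural", "ranking", "model"]
pad_reference = ["[PAD]"] * 3     # analogue of the plain black image
query_reference = ["neural"] * 3  # hypothetical reference repeating a query term

phi_pad = shapley_attributions(doc, pad_reference, toy_score)
phi_query = shapley_attributions(doc, query_reference, toy_score)
print(phi_pad)
print(phi_query)
```

Up to floating-point rounding, the padding reference attributes about 1 to each of `neural` and `ranking` and 0 to `model`, while the query-term reference attributes 0 to `neural`, 1 to `ranking`, and 0 to `model`. In both cases the attributions sum to `toy_score(doc) - toy_score(reference)` (2 and 1 respectively), so the same model and document receive different explanations purely because the reference changed.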
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript explores adapting DeepSHAP to explain Neural Retrieval Models (NRMs) by testing multiple reference-document constructions as baselines (contrasted with the black-image baseline used in vision). It compares the resulting feature attributions against those produced by LIME on the same NRMs and reports that the two sets of explanations differ considerably, from which it concludes that explanations for NRMs raise concerns about robustness and accuracy.
Significance. If the observed differences could be shown to correspond to actual inaccuracies (via ground-truth term importance or downstream task performance), the work would usefully flag a methodological gap in applying off-the-shelf explanation techniques to neural IR. The paper correctly notes that the choice of reference input is non-obvious in the text domain. As presented, however, the study remains purely exploratory, supplies no quantitative metrics, error analysis, or dataset details, and therefore offers limited immediate guidance to the field.
major comments (1)
- [Abstract] The inference that 'the explanations differ considerably' and therefore 'raise concerns regarding the robustness and accuracy of explanations produced for NRMs' is not supported. DeepSHAP approximates Shapley values relative to a chosen baseline, while LIME fits local linear models; systematic divergence is the expected outcome under their distinct assumptions. No independent adjudication criterion (known relevant terms, human judgments, or retrieval-performance correlation) is supplied to decide which attributions are correct.
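The point about differing assumptions can be illustrated with a LIME-style local surrogate. The sketch below is a simplified stand-in, not the `lime` package: it enumerates every keep/remove mask of a short document, replaces removed tokens with a hypothetical `[PAD]` token, weights each perturbation with an exponential kernel on the fraction of tokens removed (the `kernel_width` of 0.75 is an arbitrary assumption), and fits a weighted linear model. For an additive scorer the surrogate coefficients recover each token's contribution exactly, as Shapley values do; for a scorer with interactions the two methods answer different questions.

```python
from itertools import product
from math import exp


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lime_style_weights(doc, score_fn, mask_token="[PAD]", kernel_width=0.75):
    """Fit a locally weighted linear surrogate over token-removal perturbations.

    Returns (intercept, per-token coefficients). All 2^n masks are
    enumerated, so this only suits short toy documents.
    """
    n = len(doc)
    rows, targets, weights = [], [], []
    for mask in product([0, 1], repeat=n):
        perturbed = [tok if keep else mask_token for tok, keep in zip(doc, mask)]
        distance = (n - sum(mask)) / n                   # fraction of tokens removed
        rows.append([1.0] + [float(k) for k in mask])    # leading 1.0 is the intercept
        targets.append(score_fn(perturbed))
        weights.append(exp(-distance ** 2 / kernel_width ** 2))
    p = n + 1
    # Weighted normal equations: (X^T W X) beta = X^T W y
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(p)]
            for i in range(p)]
    xtwy = [sum(w * r[i] * y for r, w, y in zip(rows, weights, targets)) for i in range(p)]
    beta = solve_linear(xtwx, xtwy)
    return beta[0], beta[1:]


def additive_score(tokens):
    """Hypothetical additive scorer: 'neural' is worth 2, 'ranking' is worth 1."""
    return 2.0 * ("neural" in tokens) + 1.0 * ("ranking" in tokens)


intercept, coefs = lime_style_weights(["neural", "ranking", "model"], additive_score)
```

With a non-additive scorer such as `lambda toks: float("neural" in toks and "ranking" in toks)`, exact Shapley values against an all-`[PAD]` reference are 0.5, 0.5 and 0 for `["neural", "ranking", "model"]`, whereas the surrogate's coefficients also depend on the kernel. Disagreement between the two is therefore expected and does not by itself show that either explanation is wrong.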
minor comments (2)
- [Abstract] The abstract states that 'various reference input document construction techniques' were explored but gives no enumeration of the techniques, no quantitative comparison among them, and no description of the underlying NRM, collection, or evaluation protocol.
- No tables, figures, or numerical results (e.g., attribution overlap scores, rank correlations between DeepSHAP and LIME) are referenced, making it impossible to assess the magnitude or consistency of the reported differences.
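As an illustration of the agreement measures mentioned above, here are two common choices: Jaccard overlap of the top-k attributed positions and Kendall's tau-a over the full attribution ranking. The function names are illustrative, and the attribution vectors in the example are made-up numbers, not results from the paper.

```python
from itertools import combinations


def top_k_overlap(attr_a, attr_b, k):
    """Jaccard overlap between the k highest-attributed positions of two explanations."""
    top_a = set(sorted(range(len(attr_a)), key=lambda i: attr_a[i], reverse=True)[:k])
    top_b = set(sorted(range(len(attr_b)), key=lambda i: attr_b[i], reverse=True)[:k])
    return len(top_a & top_b) / len(top_a | top_b)


def kendall_tau_a(attr_a, attr_b):
    """Kendall's tau-a; tied pairs count as neither concordant nor discordant."""
    n = len(attr_a)
    if n < 2:
        raise ValueError("need at least two features")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (attr_a[i] - attr_a[j]) * (attr_b[i] - attr_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


deepshap_attr = [0.9, 0.1, 0.4, 0.0]  # hypothetical attributions, not paper results
lime_attr = [0.2, 0.6, 0.7, 0.05]     # hypothetical attributions, not paper results
overlap = top_k_overlap(deepshap_attr, lime_attr, k=2)
tau = kendall_tau_a(deepshap_attr, lime_attr)
```

Here both explanations place position 2 in their top two, but the top-2 sets share only one of three distinct positions (overlap 1/3), and tau-a is 1/3 because two of the six position pairs are ordered differently. Tau-b or Spearman's rho are reasonable alternatives when ties are common.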
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We agree that the manuscript is exploratory and that stronger claims require additional evidence. We respond to the major comment on the abstract below and will revise accordingly.
- Referee: [Abstract] The inference that 'the explanations differ considerably' and therefore 'raise concerns regarding the robustness and accuracy of explanations produced for NRMs' is not supported. DeepSHAP approximates Shapley values relative to a chosen baseline, while LIME fits local linear models; systematic divergence is the expected outcome under their distinct assumptions. No independent adjudication criterion (known relevant terms, human judgments, or retrieval-performance correlation) is supplied to decide which attributions are correct.
Authors: We thank the referee for this observation. We agree that, absent an independent adjudication criterion, it is not possible to determine which method's attributions are more accurate. We will therefore revise the abstract to remove references to concerns about accuracy and instead emphasize that the substantial observed differences between DeepSHAP and LIME raise questions about the robustness of explanations for NRMs. The study is intended to be exploratory and to highlight methodological challenges when adapting these techniques to text retrieval; we will make this framing clearer in the revision.
Revision: yes
Circularity Check
No circularity: empirical comparison without derivations or self-referential claims
The paper conducts an empirical study comparing DeepSHAP (with varied reference documents) and LIME explanations on neural retrieval models. It reports observed differences and raises concerns about robustness, but contains no mathematical derivations, fitted parameters renamed as predictions, uniqueness theorems, or self-citations that bear the central claim. All methods (DeepSHAP, LIME) are external; reference construction is exploratory rather than a closed loop. The inference from disagreement to accuracy concerns is interpretive and open to the skeptic's critique, but does not constitute circularity by construction. This is a standard non-circular empirical analysis.