arxiv: 2409.00860 · v4 · submitted 2024-09-01 · 💻 cs.IR

A Counterfactual Explanation Framework for Retrieval Models

Pith reviewed 2026-05-23 20:46 UTC · model grok-4.3

classification 💻 cs.IR
keywords counterfactual explanations · information retrieval · ranking models · explainability · retrieval models · neural ranking · BM25

The pith

A counterfactual method identifies terms to add to a document that would improve its rank for a query.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework that uses counterfactual reasoning to explain why a retrieval model ranks a given document lower than others for a specific query. Rather than focusing on why documents are relevant, the approach determines which absent terms, if inserted, would raise the document's position in the results. This directly points to the words already present that the model treated as unfavorable. The method applies to both classical models such as BM25 and neural models including DRMM, DSSM, ColBERT, and MonoT5, and the authors present it as the first attempt to solve this exact form of counterfactual question in retrieval.

Core claim

We introduce a counterfactual explanation framework for retrieval models that determines the terms that need to be added to a document to improve its ranking with respect to a given query. This identifies which absent words affect the ranking, explaining why the model did not favor the document.
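
The abstract does not describe the search procedure, so the following is only a minimal sketch of what "finding terms to add" could look like against a black-box scorer. The greedy strategy, the toy `overlap_score` ranker, and every name here (`greedy_counterfactual`, `competitors`, `candidates`, `top_k`, `max_terms`) are assumptions made for illustration, not the authors' method.

```python
def overlap_score(query, doc):
    """Toy stand-in ranker (assumption): fraction of query terms present in doc."""
    doc_terms = set(doc)
    return sum(t in doc_terms for t in query) / len(query)


def greedy_counterfactual(query, doc, competitors, candidates, score,
                          top_k=1, max_terms=5):
    """Greedily add candidate terms until doc reaches the top_k or no term helps.

    Returns the list of added terms and the rank reached.
    Ties are resolved in favor of the edited document.
    """
    doc = list(doc)
    added = []
    remaining = list(candidates)

    def rank(d):
        s = score(query, d)
        return 1 + sum(score(query, c) > s for c in competitors)

    while rank(doc) > top_k and remaining and len(added) < max_terms:
        # Pick the single term whose addition raises the score the most.
        best = max(remaining, key=lambda t: score(query, doc + [t]))
        if score(query, doc + [best]) <= score(query, doc):
            break  # no remaining candidate improves the score
        doc.append(best)
        added.append(best)
        remaining.remove(best)
    return added, rank(doc)


query = ["counterfactual", "explanation", "retrieval"]
doc = ["ranking", "models", "for", "retrieval"]
competitors = [
    ["counterfactual", "explanation", "for", "retrieval", "models"],
    ["explanation", "of", "retrieval"],
]
candidates = ["counterfactual", "explanation", "neural", "search"]

added, final_rank = greedy_counterfactual(
    query, doc, competitors, candidates, overlap_score
)
print(added, final_rank)
```

Because the scorer is passed in as a plain callable, the same loop could in principle wrap any ranker that exposes a query-document score, which matches the black-box setting the paper targets.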

What carries the argument

The counterfactual framework that generates hypothetical term additions to improve ranking scores.

Load-bearing premise

Identifying terms whose addition would improve ranking gives a valid explanation of why the original document was disfavored, without direct model access or external checks on the counterfactuals.

What would settle it

Apply the generated term additions to the original document and measure whether the retrieval model actually produces the predicted higher rank.
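
As a rough illustration of that check, the sketch below scores a three-document toy corpus with a small hand-written Okapi BM25, appends a set of terms to one document, and compares its rank before and after. The corpus, the `added_terms` list, the parameter values `k1=1.2` and `b=0.75`, and the helper names are all hypothetical; the paper's own experiments use real collections and toolkits.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))  # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / (tf[t] + norm)
        scores.append(s)
    return scores


def rank_of(target, query, docs):
    """1-based rank of docs[target] by descending BM25 score."""
    scores = bm25_scores(query, docs)
    order = sorted(range(len(docs)), key=lambda i: -scores[i])
    return order.index(target) + 1


query = "neural ranking model".split()
docs = [
    "neural ranking model for passage retrieval".split(),
    "a neural model for web search".split(),
    "classical term weighting for retrieval".split(),
]
target = 2
added_terms = ["neural", "ranking", "model"]  # hypothetical counterfactual terms

before = rank_of(target, query, docs)
# Re-score the whole corpus, since the edit changes document frequencies and lengths.
edited = docs[:target] + [docs[target] + added_terms] + docs[target + 1:]
after = rank_of(target, query, edited)
print("rank before:", before, "rank after:", after)
```

A prediction would count as confirmed only if `after` is smaller than `before` by at least the claimed margin.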

Figures

Figures reproduced from arXiv: 2409.00860 by Bhavik Chandna, Procheta Sen.

Figure 1: Counterfactual Explanation Model Description
Figure 2: Counterfactual Classifier Performance Variance with Top-K and Counterfactual Performance Variance with variation …
Figure 3: Average Rank shift by CFIR for BM25, DRMM, …
Figure 4: Average Semantic Similarity between original doc …

Original abstract

Explainability has become a crucial concern in today's world, aiming to enhance transparency in machine learning and deep learning models. Information retrieval is no exception to this trend. In existing literature on explainability of information retrieval, the emphasis has predominantly been on illustrating the concept of relevance concerning a retrieval model. The questions addressed include why a document is relevant to a query, why one document exhibits higher relevance than another, or why a specific set of documents is deemed relevant for a query. However, limited attention has been given to understanding why a particular document is not favored (e.g., not within top-K) with respect to a query and a retrieval model. In an effort to address this gap, our work focuses on the question of what terms need to be added within a document to improve its ranking. This, in turn, answers the question of which words in the document played a role in not being favored by a retrieval model for a particular query. We use a counterfactual framework to solve the above-mentioned research problem. To the best of our knowledge, we mark the first attempt to tackle this specific counterfactual problem (i.e. examining the absence of which words can affect the ranking of a document). Our experiments show the effectiveness of our proposed approach in predicting counterfactuals for both statistical (e.g. BM25) and deep-learning-based models (e.g. DRMM, DSSM, ColBERT, MonoT5).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes a counterfactual explanation framework for retrieval models that identifies terms whose addition to a document would improve its ranking for a query, thereby explaining why the document was originally disfavored. It claims to be the first work addressing this specific problem and reports experiments demonstrating effectiveness on both statistical models (e.g., BM25) and neural models (DRMM, DSSM, ColBERT, MonoT5).

Significance. If the counterfactuals are faithful, the work would address a genuine gap in IR explainability by focusing on disfavor rather than relevance. The breadth of models tested is a strength. The significance is limited by the absence of evidence that addition-based changes provide valid inverse explanations of the original decision, especially for non-linear models.

major comments (2)
  1. [Abstract] Abstract: the central claim equates finding terms whose addition raises rank with identifying words responsible for original disfavor. This requires the counterfactual mapping to be faithful (precise inverse of disfavoring factors), but no perturbation symmetry tests, model-internal access, or external validation (human/automated) are described to support this for non-linear models where interactions are not additive.
  2. [Experiments] Experiments section: effectiveness is asserted for DRMM, ColBERT, and MonoT5, but without reported metrics on faithfulness, comparison to baselines for explanation quality, or controls for whether the added terms simply describe a different high-scoring document rather than explaining the original ranking, the results do not establish the framework's validity.
minor comments (1)
  1. The abstract would be clearer with a one-sentence outline of the algorithmic procedure used to generate the counterfactual terms.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below, clarifying the scope of our claims and indicating where revisions will strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim equates finding terms whose addition raises rank with identifying words responsible for original disfavor. This requires the counterfactual mapping to be faithful (precise inverse of disfavoring factors), but no perturbation symmetry tests, model-internal access, or external validation (human/automated) are described to support this for non-linear models where interactions are not additive.

    Authors: We agree that the mapping from addition-based counterfactuals to explanations of original disfavor is not guaranteed to be a precise inverse for non-linear models. Our framework defines the explanation explicitly as the minimal terms whose absence caused the low rank (i.e., the terms that, when added, produce a measurable rank improvement). This is a directional counterfactual rather than a symmetric perturbation. We did not claim model-internal access or perform symmetry tests because the approach is model-agnostic and applies to black-box rankers. We will revise the abstract and introduction to state the claim more precisely as “terms whose addition improves rank, thereby highlighting factors contributing to the original disfavor” and add a limitations paragraph discussing the non-invertibility issue for non-linear models. revision: partial

  2. Referee: [Experiments] Experiments section: effectiveness is asserted for DRMM, ColBERT, and MonoT5, but without reported metrics on faithfulness, comparison to baselines for explanation quality, or controls for whether the added terms simply describe a different high-scoring document rather than explaining the original ranking, the results do not establish the framework's validity.

    Authors: The primary effectiveness metric reported is the achieved rank improvement after term addition, which directly measures whether the identified terms address the original low ranking. We will add explicit faithfulness metrics (e.g., rank delta before/after addition, comparison against random term addition and against terms drawn from top-ranked documents) in the revised experiments section. We will also include a control experiment that verifies the added terms are query-specific rather than generic high-scoring document descriptors. These additions will be reported for all models, including the neural ones. revision: yes
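
The comparison against random term addition that the rebuttal promises could be set up roughly as follows. This is a sketch under stated assumptions: the overlap-count ranker, the toy corpus, the `proposed` term list, the vocabulary, and the 200-trial budget are all hypothetical stand-ins, chosen only to show the shape of the control.

```python
import random
from statistics import mean


def overlap_score(query, doc):
    """Toy stand-in ranker (assumption): number of distinct query terms in doc."""
    return len(set(query) & set(doc))


def rank_of(doc, query, competitors):
    """1-based rank of doc among competitors; ties resolved in doc's favor."""
    s = overlap_score(query, doc)
    return 1 + sum(overlap_score(query, c) > s for c in competitors)


def rank_delta(doc, terms, query, competitors):
    """Positions gained after adding terms (positive means the document moved up)."""
    before = rank_of(doc, query, competitors)
    after = rank_of(doc + list(terms), query, competitors)
    return before - after


query = ["dense", "passage", "retrieval"]
doc = ["sparse", "index", "structures"]
competitors = [
    ["dense", "passage", "retrieval"],
    ["passage", "retrieval"],
    ["retrieval", "index"],
]
vocabulary = sorted(
    {t for c in competitors for t in c} | {"graph", "cache", "token", "audio"}
)

proposed = ["dense", "passage", "retrieval"]  # hypothetical explanation terms
proposed_delta = rank_delta(doc, proposed, query, competitors)

rng = random.Random(0)  # fixed seed so the baseline is reproducible
random_deltas = [
    rank_delta(doc, rng.sample(vocabulary, len(proposed)), query, competitors)
    for _ in range(200)
]
random_mean = mean(random_deltas)
print("proposed:", proposed_delta, "random baseline mean:", random_mean)
```

If the proposed terms do not beat the random baseline by a clear margin, the rank improvement says little about the specific terms chosen.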

Circularity Check

0 steps flagged

No circularity: novel counterfactual framework with no self-referential derivations or fitted predictions

Full rationale

The paper proposes a new counterfactual method to identify terms whose addition would improve a document's rank, framing this as an explanation for original disfavor. No equations, fitted parameters, or derivation chains are present in the abstract or described approach that reduce outputs to inputs by construction. The claim of being the 'first attempt' is a novelty assertion, not a load-bearing self-citation or self-definition. The central mapping from addition-based counterfactuals to explanations is an unvalidated modeling assumption (correctness issue) rather than a circular reduction. The work remains self-contained as a methodological proposal without the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no equations, parameters, or background assumptions; ledger is empty.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 1 internal anchor

  1. [1]

    Avishek Anand, Lijun Lyu, Maximilian Idahl, Yumeng Wang, Jonas Wallat, and Zijian Zhang. 2022. Explainable Information Retrieval: A Survey

  2. [2]

    Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, and Tong Wang. 2016. MS MARCO: A Human Generated MAchine Reading COmprehension Dataset. In CoCo@NIPS

  3. [3]

    Alexander Bondarenko, Maik Fröbe, Jan Heinrich Reimer, Benno Stein, Michael Völske, and Matthias Hagen. 2022. Axiomatic Retrieval Experimentation with ir_axioms. In Proc. of SIGIR 2022. 3131–3140

  4. [4]

    Miguel Á Carreira-Perpiñán and Suryabhan Singh Hada. 2021. Counterfactual explanations for oblique decision trees: Exact, efficient algorithms. In Proceedings of the AAAI conference on artificial intelligence, Vol. 35. 6903–6911

  5. [5]

    Nick Craswell, Bhaskar Mitra, Emine Yilmaz, and Daniel Campos. 2021. Overview of the TREC 2020 deep learning track. CoRR abs/2102.07662 (2021). arXiv:2102.07662 https://arxiv.org/abs/2102.07662

  6. [6]

    Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Jimmy Lin, Ellen M. Voorhees, and Ian Soboroff. 2023. Overview of the TREC 2022 deep learning track. In Text REtrieval Conference (TREC). NIST, TREC

  7. [7]

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT

  8. [8]

    Gokhan Egri and Coskun Bayrak. 2014. The Role of Search Engine Optimization on Keeping the User on the Site. Procedia Computer Science 36 (2014), 335–

  9. [9]

    https://doi.org/10.1016/j.procs.2014.09.102 Complex Adaptive Systems Philadelphia, PA November 3-5, 2014

  10. [10]

    Anett Erdmann, Ramón Arilla, and José M. Ponzoa. 2022. Search engine optimization: The long-term strategy of keyword choice. Journal of Business Research 144 (2022), 650–662. https://doi.org/10.1016/j.jbusres.2022.01.065

  11. [11]

    Maarten Grootendorst. 2020. KeyBERT: Minimal keyword extraction with BERT. https://doi.org/10.5281/zenodo.4461265

  12. [12]

    Jiafeng Guo, Yixing Fan, Qingyao Ai, and W. Bruce Croft. 2016. A Deep Relevance Matching Model for Ad-hoc Retrieval. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management (Indianapolis, Indiana, USA) (CIKM ’16). Association for Computing Machinery, New York, NY, USA, 55–64

  13. [13]

    Jiafeng Guo, Yixing Fan, Xiang Ji, and Xueqi Cheng. 2019. MatchZoo: A Learning, Practicing, and Developing System for Neural Text Matching. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’19). 1297–1300

  14. [14]

    Faisal Hamman, Erfaun Noorani, Saumitra Mishra, Daniele Magazzeni, and Sanghamitra Dutta. 2023. Robust counterfactual explanations for neural networks with probabilistic guarantees. In Proceedings of the 40th International Conference on Machine Learning (Honolulu, Hawaii, USA) (ICML’23). JMLR.org, Article 499, 17 pages

  15. [15]

    Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. 2013. Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management (San Francisco, California, USA) (CIKM ’13). 2333–2338

  16. [16]

    Kentaro Kanamori, Takuya Takagi, Ken Kobayashi, Yuichi Ike, Kento Uemura, and Hiroki Arimura. 2021. Ordered counterfactual explanation by mixed-integer linear optimization. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 11564–11574

  17. [17]

    Amir-Hossein Karimi, Gilles Barthe, Borja Balle, and Isabel Valera. 2020. Model-agnostic counterfactual explanations for consequential decisions. In International Conference on Artificial Intelligence and Statistics. PMLR, 895–905

  18. [18]

    Omar Khattab and Matei Zaharia. 2020. ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 39–48

  19. [19]

    Jimmy Lin, Xueguang Ma, Sheng-Chieh Lin, Jheng-Hong Yang, Ronak Pradeep, and Rodrigo Nogueira. 2021. Pyserini: A Python Toolkit for Reproducible Information Retrieval Research with Sparse and Dense Representations. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2356–2362

  20. [20]

    Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Yixing Fan, and Xueqi Cheng. 2023. Black-box Adversarial Attacks against Dense Retrieval Models: A Multi-view Contrastive Learning Method. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (Birmingham, United Kingdom) (CIKM ’23). Association f...

  21. [21]

    Lijun Lyu and Avishek Anand. 2023. Listwise Explanations for Ranking Models Using Multiple Explainers. In Advances in Information Retrieval: 45th European Conference on Information Retrieval, ECIR 2023, Dublin, Ireland, April 2–6, 2023, Proceedings, Part I (Dublin, Ireland). Springer-Verlag, Berlin, Heidelberg...

  22. [22]

    Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In NeurIPS

  23. [23]

    Ramaravind K Mothilal, Amit Sharma, and Chenhao Tan. 2020. Explaining machine learning classifiers through diverse counterfactual explanations. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 607–617

  24. [24]

    Rodrigo Nogueira, Zhiying Jiang, Ronak Pradeep, and Jimmy Lin. 2020. Document Ranking with a Pretrained Sequence-to-Sequence Model. In Findings of the Association for Computational Linguistics: EMNLP 2020, Trevor Cohn, Yulan He, and Yang Liu (Eds.). Association for Computational Linguistics, Online, 708–718

  25. [25]

    Axel Parmentier and Thibaut Vidal. 2021. Optimal counterfactual explanations in tree ensembles. In International conference on machine learning. PMLR, 8422–8431

  26. [26]

    Martin Pawelczyk, Chirag Agarwal, Shalmali Joshi, Sohini Upadhyay, and Himabindu Lakkaraju. 2022. Exploring counterfactual explanations through the lens of adversarial examples: A theoretical and empirical analysis. In International Conference on Artificial Intelligence and Statistics. PMLR, 4574–4594.

  27. [27]

    Judea Pearl. 2018. Theoretical impediments to machine learning with seven sparks from the causal revolution. arXiv preprint arXiv:1801.04016 (2018)

  28. [28]

    Gustavo Penha, Eyal Krikon, and Vanessa Murdock. 2022. Pairwise review-based explanations for voice product search. In ACM SIGIR Conference on Human Information Interaction and Retrieval. 300–304

  29. [29]

    Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In Proc. of SIGKDD 2016. 1135–1144

  30. [30]

    Jaspreet Singh and Avishek Anand. 2019. EXS: Explainable Search Using Local Model Agnostic Interpretability. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (Melbourne VIC, Australia) (WSDM ’19). 770–773

  31. [31]

    Jaspreet Singh and Avishek Anand. 2020. Model agnostic interpretability of rankers via intent modelling. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 618–628

  32. [32]

    Arnaud Van Looveren and Janis Klaise. 2021. Interpretable counterfactual explanations guided by prototypes. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 650–665

  33. [33]

    Ellen Voorhees. 2005. Overview of the TREC 2004 Robust Retrieval Track. https://doi.org/10.6028/NIST.SP.500-261

  34. [34]

    Chen Wu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, and Xueqi Cheng. 2022. PRADA: Practical Black-Box Adversarial Attacks against Neural Ranking Models. ArXiv preprint abs/2204.01321 (2022). https://arxiv.org/abs/2204.01321

  35. [35]

    Chen Wu, Ruqing Zhang, Jiafeng Guo, Yixing Fan, and Xueqi Cheng. 2022. Are Neural Ranking Models Robust? ACM Trans. Inf. Syst. 41, 2, Article 29 (dec 2022), 36 pages

  36. [36]

    Zhichao Xu, Hemank Lamba, Qingyao Ai, Joel Tetreault, and Alex Jaimes. 2024. Counterfactual Editing for Search Result Explanation. arXiv:2301.10389 [cs.IR]

  37. [37]

    Puxuan Yu, Razieh Rahimi, and James Allan. 2022. Towards explainable search results: a listwise explanation generator. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 669–680. 7 APPENDIX 7.1 Retrieval Performance of IR Models We use Lin et al. [18] toolkit for implementing BM25 and MonoT...