A Counterfactual Explanation Framework for Retrieval Models

Bhavik Chandna; Procheta Sen

arxiv: 2409.00860 · v4 · submitted 2024-09-01 · 💻 cs.IR

Pith reviewed 2026-05-23 20:46 UTC · model grok-4.3

classification 💻 cs.IR

keywords counterfactual explanations · information retrieval · ranking models · explainability · retrieval models · neural ranking · BM25

The pith

A counterfactual method identifies terms to add to a document that would improve its rank for a query.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework that uses counterfactual reasoning to explain why a retrieval model ranks a given document lower than others for a specific query. Rather than asking why documents are relevant, the approach determines which absent terms, if inserted, would raise the document's position in the results. The authors read this as also answering which words played a role in the document not being favored. The method is applied to both classical models such as BM25 and neural models including DRMM, DSSM, ColBERT, and MonoT5, and the authors present it as the first attempt to solve this exact form of counterfactual question in retrieval.

Core claim

We introduce a counterfactual explanation framework for retrieval models that determines the terms that need to be added to a document to improve its ranking with respect to a given query. This identifies which absent words affect the ranking, providing an explanation for why the document was not favored by the model.
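
The abstract does not spell out how the terms are found, so the following is only a minimal sketch of one way such additions could be searched for against a black-box ranker. It is not the authors' algorithm. The scorer `overlap_score`, the greedy strategy, the `candidates` list, and the `top_k` and `max_terms` parameters are all assumptions made for illustration.

```python
import math
from collections import Counter

def overlap_score(query, doc):
    """Toy black-box scorer (assumption): log-damped counts of query terms in doc."""
    counts = Counter(doc)
    return sum(math.log1p(counts[t]) for t in dict.fromkeys(query))

def rank_of(score_fn, query, docs, target_idx):
    """1-based rank of docs[target_idx]; ties are resolved in the target's favor."""
    scores = [score_fn(query, d) for d in docs]
    return 1 + sum(1 for i, s in enumerate(scores)
                   if i != target_idx and s > scores[target_idx])

def greedy_counterfactual_terms(score_fn, query, docs, target_idx,
                                candidates, top_k=1, max_terms=5):
    """Greedily append candidate terms until the target document reaches top_k."""
    doc = list(docs[target_idx])
    added = []

    def current_rank(tokens):
        trial = list(docs)
        trial[target_idx] = tokens
        return rank_of(score_fn, query, trial, target_idx)

    while current_rank(doc) > top_k and len(added) < max_terms:
        remaining = [t for t in candidates if t not in added]
        if not remaining:
            break
        # Pick the single term whose addition raises the score the most.
        best = max(remaining, key=lambda t: score_fn(query, doc + [t]))
        if score_fn(query, doc + [best]) <= score_fn(query, doc):
            break  # no candidate helps any further
        doc.append(best)
        added.append(best)
    return added, current_rank(doc)

if __name__ == "__main__":
    query = ["solar", "panel", "cost"]
    docs = [["solar", "panel", "cost", "guide"],
            ["solar", "panel", "install"],
            ["roof", "repair", "cost"]]
    print(greedy_counterfactual_terms(
        overlap_score, query, docs, target_idx=2,
        candidates=["solar", "panel", "roof", "warranty"]))
```

The greedy loop only ever evaluates one added term at a time, which is cheap but can miss sets of terms that help only in combination.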

What carries the argument

The counterfactual framework that generates hypothetical term additions to improve ranking scores.

Load-bearing premise

Identifying terms whose addition would improve ranking gives a valid explanation of why the original document was disfavored, without direct model access or external checks on the counterfactuals.

What would settle it

Apply the generated term additions to the original document and measure whether the retrieval model actually produces the predicted higher rank.
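
A check of this kind could be sketched as follows, assuming a small in-memory collection and a textbook Okapi BM25 scorer written from scratch. The function names, the tokenised toy documents, and the default `k1` and `b` values are illustrative assumptions, not the paper's experimental setup.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every tokenised document in docs against query with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(t for d in docs for t in set(d))  # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in dict.fromkeys(query):
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores

def rank_of(scores, idx):
    """1-based rank of the document at idx; ties are resolved in its favor."""
    return 1 + sum(1 for i, s in enumerate(scores) if i != idx and s > scores[idx])

def rank_shift(query, docs, target_idx, added_terms):
    """Return (rank before, rank after) appending added_terms to the target document."""
    before = rank_of(bm25_scores(query, docs), target_idx)
    edited = [list(d) for d in docs]
    edited[target_idx] = edited[target_idx] + list(added_terms)
    # Collection statistics are recomputed on the edited collection (an assumption).
    after = rank_of(bm25_scores(query, edited), target_idx)
    return before, after

if __name__ == "__main__":
    query = ["solar", "panel", "cost"]
    docs = [["solar", "panel", "cost", "guide"],
            ["solar", "panel", "install", "tips"],
            ["roof", "repair", "cost", "estimate"]]
    print(rank_shift(query, docs, target_idx=2, added_terms=["solar", "panel"]))
```

Recomputing document frequencies and average length after the edit is one possible choice; holding the original collection statistics fixed is another, and the two can give different ranks.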

Figures

Figures reproduced from arXiv: 2409.00860 by Bhavik Chandna, Procheta Sen.

**Figure 2.** Counterfactual Classifier Performance Variance with Top-K and Counterfactual Performance Variance with variation

**Figure 3.** Average Rank shift by CFIR for BM25, DRMM,

**Figure 4.** Average Semantic Similarity between original doc

Original abstract

Explainability has become a crucial concern in today's world, aiming to enhance transparency in machine learning and deep learning models. Information retrieval is no exception to this trend. In existing literature on explainability of information retrieval, the emphasis has predominantly been on illustrating the concept of relevance concerning a retrieval model. The questions addressed include why a document is relevant to a query, why one document exhibits higher relevance than another, or why a specific set of documents is deemed relevant for a query. However, limited attention has been given to understanding why a particular document is not favored (e.g., not within top-K) with respect to a query and a retrieval model. In an effort to address this gap, our work focuses on the question of what terms need to be added within a document to improve its ranking. This, in turn, answers the question of which words in the document played a role in not being favored by a retrieval model for a particular query. We use a counterfactual framework to solve the above-mentioned research problem. To the best of our knowledge, we mark the first attempt to tackle this specific counterfactual problem (i.e. examining the absence of which words can affect the ranking of a document). Our experiments show the effectiveness of our proposed approach in predicting counterfactuals for both statistical (e.g. BM25) and deep-learning-based models (e.g. DRMM, DSSM, ColBERT, MonoT5).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

The paper offers a first attempt at counterfactual explanations for low-ranked documents in IR by identifying terms to add, but the abstract leaves the method and validation too thin to judge if it actually explains model behavior.

The core idea is to treat a low ranking as a problem of missing terms and to use counterfactuals to find which words, if added, would lift a document's position. This flips the usual explainability question from why something ranks high to why it does not, and the authors position it as the first such effort for retrieval models. That direction fills a real gap, since most prior work stays on positive relevance signals. They test the approach on both BM25 and several neural rankers, which is a reasonable spread for an initial study. They also deserve credit for trying to keep the framework model-agnostic, with no access to model internals. The main weakness is that the abstract gives no algorithm, no loss function, no perturbation details, and no quantitative results or human validation. Without those, it is impossible to check whether the generated additions are faithful explanations or simply describe any higher-scoring document. For non-linear models the additivity assumption looks especially fragile, and the stress-test concern about missing symmetry tests and external checks lands because nothing in the provided text addresses it. The work is aimed at IR explainability researchers who already care about counterfactuals; a reader outside that niche will not get much from it. It is coherent on its own terms and shows honest engagement with the literature, so it clears the bar for serious refereeing even if the experiments turn out to need heavy revision.

Referee Report

2 major / 1 minor

Summary. The paper proposes a counterfactual explanation framework for retrieval models that identifies terms whose addition to a document would improve its ranking for a query, thereby explaining why the document was originally disfavored. It claims to be the first work addressing this specific problem and reports experiments demonstrating effectiveness on both statistical models (e.g., BM25) and neural models (DRMM, DSSM, ColBERT, MonoT5).

Significance. If the counterfactuals are faithful, the work would address a genuine gap in IR explainability by focusing on disfavor rather than relevance. The breadth of models tested is a strength. The significance is limited by the absence of evidence that addition-based changes provide valid inverse explanations of the original decision, especially for non-linear models.

major comments (2)

[Abstract] The central claim equates finding terms whose addition raises rank with identifying words responsible for original disfavor. This requires the counterfactual mapping to be faithful (precise inverse of disfavoring factors), but no perturbation symmetry tests, model-internal access, or external validation (human/automated) are described to support this for non-linear models where interactions are not additive.

[Experiments] Effectiveness is asserted for DRMM, ColBERT, and MonoT5, but without reported metrics on faithfulness, comparison to baselines for explanation quality, or controls for whether the added terms simply describe a different high-scoring document rather than explaining the original ranking, the results do not establish the framework's validity.
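
The non-additivity concern can be made concrete with a toy case. The sketch below uses a deliberately artificial conjunctive scorer (an assumption, not one of the models in the paper) in which no single added term changes the score while a pair of terms does, so per-term attributions would miss the interaction.

```python
from collections import Counter
from itertools import combinations

def conjunctive_score(query, doc):
    """Toy non-linear scorer (assumption): rewards doc only if every query term occurs."""
    counts = Counter(doc)
    return float(min(counts[t] for t in query))

def marginal_gains(score_fn, query, doc, candidates):
    """Score gain from adding each single candidate and each pair of candidates."""
    base = score_fn(query, doc)
    single = {t: score_fn(query, doc + [t]) - base for t in candidates}
    pairs = {(a, b): score_fn(query, doc + [a, b]) - base
             for a, b in combinations(candidates, 2)}
    return single, pairs

if __name__ == "__main__":
    single, pairs = marginal_gains(
        conjunctive_score,
        query=["solar", "panel"],
        doc=["roof", "repair"],
        candidates=["solar", "panel"])
    print(single)  # gains from adding one term at a time
    print(pairs)   # gains from adding both terms together
```

Here the sum of the single-term gains differs from the joint gain, which is the situation in which reading added terms as independent causes of disfavor becomes unreliable.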

minor comments (1)

The abstract would be clearer with a one-sentence outline of the algorithmic procedure used to generate the counterfactual terms.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below, clarifying the scope of our claims and indicating where revisions will strengthen the manuscript.

Referee: [Abstract] The central claim equates finding terms whose addition raises rank with identifying words responsible for original disfavor. This requires the counterfactual mapping to be faithful (precise inverse of disfavoring factors), but no perturbation symmetry tests, model-internal access, or external validation (human/automated) are described to support this for non-linear models where interactions are not additive.

Authors: We agree that the mapping from addition-based counterfactuals to explanations of original disfavor is not guaranteed to be a precise inverse for non-linear models. Our framework defines the explanation explicitly as the minimal terms whose absence caused the low rank (i.e., the terms that, when added, produce a measurable rank improvement). This is a directional counterfactual rather than a symmetric perturbation. We did not claim model-internal access or perform symmetry tests because the approach is model-agnostic and applies to black-box rankers. We will revise the abstract and introduction to state the claim more precisely as “terms whose addition improves rank, thereby highlighting factors contributing to the original disfavor” and add a limitations paragraph discussing the non-invertibility issue for non-linear models.

revision: partial

Referee: [Experiments] Effectiveness is asserted for DRMM, ColBERT, and MonoT5, but without reported metrics on faithfulness, comparison to baselines for explanation quality, or controls for whether the added terms simply describe a different high-scoring document rather than explaining the original ranking, the results do not establish the framework's validity.

Authors: The primary effectiveness metric reported is the achieved rank improvement after term addition, which directly measures whether the identified terms address the original low ranking. We will add explicit faithfulness metrics (e.g., rank delta before/after addition, comparison against random term addition and against terms drawn from top-ranked documents) in the revised experiments section. We will also include a control experiment that verifies the added terms are query-specific rather than generic high-scoring document descriptors. These additions will be reported for all models, including the neural ones.

revision: yes
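
The random-addition control mentioned in this response might look roughly like the sketch below. It assumes a toy term-overlap scorer and a small hypothetical vocabulary; `overlap_score`, `gain`, `random_baseline_gain`, and the trial count are illustrative names and values, not anything reported by the authors.

```python
import math
import random
from collections import Counter

def overlap_score(query, doc):
    """Toy black-box scorer (assumption): log-damped counts of query terms in doc."""
    counts = Counter(doc)
    return sum(math.log1p(counts[t]) for t in dict.fromkeys(query))

def gain(score_fn, query, doc, terms):
    """Score change caused by appending terms to doc."""
    return score_fn(query, doc + list(terms)) - score_fn(query, doc)

def random_baseline_gain(score_fn, query, doc, vocabulary, n_terms,
                         trials=200, seed=0):
    """Mean gain from appending n_terms distinct terms sampled uniformly from vocabulary."""
    rng = random.Random(seed)  # fixed seed so the control is repeatable
    total = 0.0
    for _ in range(trials):
        total += gain(score_fn, query, doc, rng.sample(vocabulary, n_terms))
    return total / trials

if __name__ == "__main__":
    query = ["solar", "panel", "cost"]
    doc = ["roof", "repair", "cost"]
    proposed = ["solar", "panel"]
    vocabulary = ["solar", "panel", "roof", "repair",
                  "warranty", "invoice", "tile", "ladder"]
    print("proposed gain:", gain(overlap_score, query, doc, proposed))
    print("random gain:  ", random_baseline_gain(
        overlap_score, query, doc, vocabulary, n_terms=len(proposed)))
```

A proposed term set that does not clearly beat the random baseline would be weak evidence that the terms are specific to the query.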

Circularity Check

0 steps flagged

No circularity: novel counterfactual framework with no self-referential derivations or fitted predictions

The paper proposes a new counterfactual method to identify terms whose addition would improve a document's rank, framing this as an explanation for original disfavor. No equations, fitted parameters, or derivation chains are present in the abstract or described approach that reduce outputs to inputs by construction. The claim of being the 'first attempt' is a novelty assertion, not a load-bearing self-citation or self-definition. The central mapping from addition-based counterfactuals to explanations is an unvalidated modeling assumption (correctness issue) rather than a circular reduction. The work remains self-contained as a methodological proposal without the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no equations, parameters, or background assumptions; ledger is empty.

pith-pipeline@v0.9.0 · 5780 in / 1013 out tokens · 25491 ms · 2026-05-23T20:46:03.265958+00:00 · methodology

