Generating Counterfactual and Contrastive Explanations using SHAP

Shubham Rathi

arXiv: 1906.09293 · v1 · pith:LL7NLMOMnew · submitted 2019-06-21 · cs.LG · cs.AI · stat.ML

Pith reviewed 2026-05-25 18:43 UTC · model grok-4.3

classification: cs.LG · cs.AI · stat.ML

keywords: explainable AI, SHAP, counterfactual explanations, contrastive explanations, model agnostic, interpretable machine learning, GDPR

The pith

SHAP feature attributions can be transformed into contrastive explanations and then into counterfactual datapoints for any model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a model-agnostic method that uses SHAP to generate contrastive explanations and then derives counterfactual datapoints from them. A generative pipeline converts the additive feature attributions first into statements that highlight decision contrasts and then into new instances that would flip the model's output. This approach requires no access to the model's internal structure or parameters. The pipeline is applied and illustrated on the IRIS, Wine Quality, and Mobile Features datasets. The resulting explanations address needs for human-understandable accounts of automated decisions.

Core claim

A generative pipeline based on SHAP produces both contrastive explanations and valid counterfactual instances in a model-agnostic manner, as demonstrated on the IRIS, Wine Quality, and Mobile Features datasets.

What carries the argument

The generative pipeline that converts SHAP additive feature attributions first into contrastive statements and then into counterfactual datapoints.
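
To make "additive feature attributions" concrete, the following is a minimal sketch that computes exact Shapley values by brute-force enumeration of feature coalitions. It is not the paper's code and does not use the `shap` library; the function `shapley_values`, the model `toy_model`, and the single all-zero baseline are assumptions chosen for illustration. The property it demonstrates is the standard additivity of Shapley values: the attributions sum to the difference between the model output at the instance and at the baseline.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x relative to a single baseline point.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def toy_model(z):
    # Hypothetical model (not from the paper): linear terms plus one interaction.
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)

# Additivity: the attributions sum to f(x) - f(baseline).
gap = toy_model(x) - toy_model(baseline)
print(phi, sum(phi), gap)
```

In this toy case the interaction term is split evenly between the two features that take part in it, which is why a single attribution value cannot by itself say what happens when only one of them is changed.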

If this is right

Contrastive explanations can be produced directly from SHAP values for any black-box model.
Counterfactual datapoints can be derived from those contrastive statements without model-specific tuning.
The same pipeline applies across different datasets including IRIS and Wine Quality.
Explanations become available that meet requirements for human-understandable decision accounts.
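
One way the first implication could look in practice is sketched below. The abstract does not say how the paper turns attributions into contrastive statements, so everything here is an assumption: the idea of ranking features by the difference between their attribution for the predicted class (the fact) and for an alternative class (the foil), the helper names `contrastive_ranking` and `contrastive_statement`, and the attribution numbers, which are made up for illustration and are not results from the paper.

```python
def contrastive_ranking(phi_fact, phi_foil, feature_names):
    """Rank features by how much more they support the fact class than the foil."""
    diffs = [(name, pf - pq)
             for name, pf, pq in zip(feature_names, phi_fact, phi_foil)]
    return sorted(diffs, key=lambda item: item[1], reverse=True)


def contrastive_statement(ranking, fact, foil, top_k=2):
    """Build a short 'fact rather than foil' sentence from the top-ranked features."""
    top = [name for name, diff in ranking[:top_k] if diff > 0]
    if not top:
        return f"No feature favours '{fact}' over '{foil}'."
    return (f"Predicted '{fact}' rather than '{foil}' mainly because of: "
            + ", ".join(top) + ".")


# Made-up attribution values, for illustration only.
features = ["petal_length", "petal_width", "sepal_length"]
phi_fact = [0.40, 0.25, 0.05]    # attributions toward the predicted class
phi_foil = [-0.30, -0.10, 0.10]  # attributions toward the foil class

ranking = contrastive_ranking(phi_fact, phi_foil, features)
statement = contrastive_statement(ranking, "setosa", "versicolor")
print(statement)
```

The top-ranked features are the natural candidates to perturb when searching for a counterfactual, though a ranking alone does not say how far each feature must move.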

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could be combined with other attribution techniques to cross-check generated counterfactuals.
Testing on high-stakes domains would reveal whether the produced instances remain realistic.
Integration into automated compliance tools might follow if the counterfactuals prove stable across repeated runs.

Load-bearing premise

SHAP feature attributions can be directly turned into contrastive statements and valid counterfactual instances without extra model calibration or human checks.
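
A small worked case suggests why this premise carries weight. The sketch below uses a hypothetical two-feature model with an interaction (`score` takes the maximum of its inputs) and a hypothetical decision threshold; neither comes from the paper. Reading the Shapley value of a feature as "the amount the score drops if that feature is reset to its baseline" predicts a class flip here, but the model's actual output does not move.

```python
def shapley_two_features(f, x, b):
    """Exact Shapley values for a two-feature function relative to baseline b."""
    phi0 = 0.5 * (f([x[0], b[1]]) - f(b)) + 0.5 * (f(x) - f([b[0], x[1]]))
    phi1 = 0.5 * (f([b[0], x[1]]) - f(b)) + 0.5 * (f(x) - f([x[0], b[1]]))
    return [phi0, phi1]


def score(z):
    # Hypothetical model with a feature interaction (not from the paper).
    return max(z[0], z[1])


THRESHOLD = 0.6  # assumed rule: positive class when score > THRESHOLD

x = [1.0, 1.0]
baseline = [0.0, 0.0]
phi = shapley_two_features(score, x, baseline)

# Naive additive reading: resetting feature 0 removes phi[0] from the score.
additive_guess = score(x) - phi[0]
guess_flips = additive_guess <= THRESHOLD

# Actual behaviour: re-query the model on the modified point.
candidate = [baseline[0], x[1]]
actually_flips = score(candidate) <= THRESHOLD

print(phi, additive_guess, score(candidate), guess_flips, actually_flips)
```

The additive estimate falls below the threshold while the re-queried score stays above it, so a candidate built from attributions alone would be reported as a counterfactual without being one.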

What would settle it

Feeding the generated counterfactual datapoints back into the original model and verifying whether the prediction actually changes to the expected class.
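
The check described above is cheap to express for any black-box predictor. The following is a sketch under stated assumptions: `verify_flip` is a hypothetical helper, `predict` stands for whatever callable wraps the original model, and `toy_predict` is a stand-in threshold classifier rather than any model from the paper.

```python
def verify_flip(predict, original, candidate, target_class):
    """Re-query the model and report whether the candidate changes the prediction."""
    before = predict(original)
    after = predict(candidate)
    return {
        "before": before,
        "after": after,
        "flipped": after != before,
        "reached_target": after == target_class,
    }


def toy_predict(z):
    # Hypothetical stand-in for the black-box model.
    return "B" if z[0] + z[1] > 3.0 else "A"


original = [1.0, 1.0]
candidate = [2.5, 1.0]  # assumed output of some counterfactual generator
report = verify_flip(toy_predict, original, candidate, target_class="B")
print(report)
```

Separating "flipped" from "reached_target" matters for multi-class problems, where a candidate may leave the original class without landing in the intended one.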

Original abstract

With the advent of GDPR, the domain of explainable AI and model interpretability has gained added impetus. Methods to extract and communicate visibility into decision-making models have become legal requirement. Two specific types of explanations, contrastive and counterfactual have been identified as suitable for human understanding. In this paper, we propose a model agnostic method and its systemic implementation to generate these explanations using shapely additive explanations (SHAP). We discuss a generative pipeline to create contrastive explanations and use it to further to generate counterfactual datapoints. This pipeline is tested and discussed on the IRIS, Wine Quality & Mobile Features dataset. Analysis of the results obtained follows.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper applies SHAP to generate contrastive and counterfactual explanations but provides no check that the new points actually flip the model's output.

The main takeaway is that this paper outlines a pipeline to turn SHAP attributions into contrastive statements and then into counterfactual datapoints, tested on IRIS, Wine Quality, and Mobile Features. It presents the steps as model-agnostic and straightforward to implement. That is the extent of what is new: a practical mapping from existing SHAP values to these explanation types, with some discussion of results on the three datasets.

The work is clear on the high-level flow and shows the author has thought through how to operationalize it for small tabular cases. Credit is due for making the pipeline explicit rather than leaving it at the level of an idea. The datasets chosen are standard for this kind of demonstration and the description stays within what SHAP can supply.

The soft spot is exactly the one flagged in the stress test. SHAP gives local additive contributions, yet the paper does not describe re-querying the original model on the generated points to confirm a prediction change, nor any calibration when interactions or non-monotonic behavior are present. Without that step the counterfactuals remain unverified. Evaluation appears to stop at generation and qualitative discussion rather than reporting flip rates, distance metrics, or realism checks. This limits how much weight the results can carry.

The paper is aimed at practitioners who already use SHAP and want a quick route to contrastive or counterfactual output on similar data. Readers seeking formal guarantees or comparisons against dedicated counterfactual methods will not find them here. I would not bring it to a reading group and would not cite it. It does not look ready for peer review because the central mapping lacks the necessary verification step.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a model-agnostic generative pipeline that converts SHAP feature attributions into contrastive explanations and then into counterfactual data points. The pipeline is implemented and tested on the IRIS, Wine Quality, and Mobile Features datasets, with the goal of producing human-understandable explanations compliant with requirements such as those in GDPR.

Significance. If the pipeline produces counterfactual instances that are minimal, realistic, and guaranteed to flip the original black-box prediction, the work would offer a practical, SHAP-based route to contrastive and counterfactual explanations. The model-agnostic framing and use of an established attribution method are strengths, but the abstract and described approach provide no quantitative evidence on validity, minimality, or fidelity, limiting the assessed contribution.

major comments (2)

[Abstract (pipeline description)] The central claim rests on a direct mapping from SHAP attributions to contrastive statements and then to counterfactual instances. No description is given of any re-query step on the original model to confirm that the generated points actually change the prediction; this is load-bearing because SHAP supplies local additive attributions that do not automatically guarantee a flip when features are altered, especially under interactions or non-monotonicity.
[Abstract (evaluation statement)] The abstract states that the pipeline is 'tested and discussed' on three datasets but reports no metrics (e.g., success rate of prediction flips, distance to original instance, or comparison to baselines such as Wachter et al.). Without such validation, the claim that the generated counterfactuals are valid cannot be assessed.
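
The metrics requested in the second major comment can be computed with very little machinery. The sketch below is an assumption-laden illustration, not the paper's evaluation: `evaluate_counterfactuals` is a hypothetical helper, the distance is a plain unweighted L1 norm (other choices are possible), and the classifier and data points are toy values.

```python
def evaluate_counterfactuals(predict, pairs):
    """Compute flip rate and mean L1 distance over (original, candidate, target) triples."""
    if not pairs:
        raise ValueError("need at least one (original, candidate, target) triple")
    flips = 0
    total_l1 = 0.0
    for original, candidate, target in pairs:
        # A success requires leaving the original class and reaching the target.
        if predict(original) != target and predict(candidate) == target:
            flips += 1
        total_l1 += sum(abs(a - b) for a, b in zip(original, candidate))
    return {"flip_rate": flips / len(pairs), "mean_l1": total_l1 / len(pairs)}


def toy_predict(z):
    # Hypothetical stand-in for the black-box model.
    return "B" if z[0] + z[1] > 3.0 else "A"


pairs = [
    ([1.0, 1.0], [2.5, 1.0], "B"),  # crosses the decision boundary
    ([1.0, 1.0], [1.5, 1.0], "B"),  # stays on the original side
]
metrics = evaluate_counterfactuals(toy_predict, pairs)
print(metrics)
```

Reporting both numbers together guards against a generator that reaches a high flip rate only by moving candidates far from the original instance.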

minor comments (2)

[Abstract] Dataset names are inconsistently capitalized ('IRIS' vs. standard 'Iris'); use conventional nomenclature throughout.
[Abstract] The phrase 'shapely additive explanations' should be corrected to 'Shapley additive explanations' and the acronym SHAP should be defined on first use.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the manuscript. The comments correctly identify areas where the presentation of the pipeline and its evaluation can be strengthened. We address each point below and will revise the manuscript accordingly.

Referee: [Abstract (pipeline description)] The central claim rests on a direct mapping from SHAP attributions to contrastive statements and then to counterfactual instances. No description is given of any re-query step on the original model to confirm that the generated points actually change the prediction; this is load-bearing because SHAP supplies local additive attributions that do not automatically guarantee a flip when features are altered, especially under interactions or non-monotonicity.

Authors: We agree that confirming the prediction flip via re-query is essential and that the abstract (and methods) should explicitly describe this step. The current manuscript does not include such a verification procedure. We will revise the paper to add a description of re-querying the black-box model after feature modification to verify that the counterfactual instance changes the prediction. This will be incorporated into the abstract and the pipeline description.
Revision: yes

Referee: [Abstract (evaluation statement)] The abstract states that the pipeline is 'tested and discussed' on three datasets but reports no metrics (e.g., success rate of prediction flips, distance to original instance, or comparison to baselines such as Wachter et al.). Without such validation, the claim that the generated counterfactuals are valid cannot be assessed.

Authors: We agree that quantitative metrics are required to substantiate the validity, minimality, and fidelity of the generated counterfactuals. The manuscript currently offers only qualitative discussion of results on the IRIS, Wine Quality, and Mobile Features datasets. We will revise the manuscript to include quantitative metrics such as prediction-flip success rate, average distance to the original instance, and comparisons against baselines including Wachter et al.
Revision: yes

Circularity Check

0 steps flagged

No circularity: pipeline is a constructive implementation, not a self-referential derivation

The paper presents a practical, model-agnostic generative pipeline that converts SHAP attributions into contrastive statements and then into candidate counterfactual points, tested on three datasets. No equations, fitting procedures, or first-principles derivations are described that reduce outputs to inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems, and no parameters are fitted on a subset then relabeled as predictions. The central claim is an engineering method whose validity rests on empirical testing rather than tautological redefinition.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no visible free parameters, axioms, or invented entities; ledger left empty.

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 4 internal anchors

[1] [Castelvecchi, 2016] Davide Castelvecchi. Can we open the black box of AI? Nature News, 538(7623):20, 2016.

[2] [Edwards and Veale, 2017] Lilian Edwards and Michael Veale. Slave to the algorithm: Why a right to an explanation is probably not the remedy you are looking for. Duke L. & Tech. Rev., 16:18, 2017.

[3] [Gunning, 2017] David Gunning. Explainable artificial intelligence (XAI). Defense Advanced Research Projects Agency (DARPA), nd Web, 2017.

[4] [Hendricks et al., 2018] Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, and Zeynep Akata. Generating counterfactual explanations with natural language. arXiv preprint arXiv:1806.09809, 2018.

[5] [Lipton, 1990] Peter Lipton. Contrastive explanation. Royal Institute of Philosophy Supplements, 27:247–266, 1990.

[6] [Lipton, 2016] Zachary C Lipton. The mythos of model interpretability. arXiv preprint arXiv:1606.03490, 2016.

[7] [Lundberg and Lee, 2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pages 4765–4774, 2017.

[8] [Miller, 2018] Tim Miller. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 2018.

[9] [Mittelstadt et al., 2019] Brent Mittelstadt, Chris Russell, and Sandra Wachter. Explaining explanations in AI. In Proceedings of the conference on fairness, accountability, and transparency, pages 279–288. ACM, 2019.

[10] [Molnar, 2018] Christoph Molnar. Interpretable machine learning. A Guide for Making Black Box Models Explainable, 2018.

[11] [Rathi and Alam, ] Shubham Rathi and Aniket Alam. Eso-5w1h framework: Ontological model for sitl paradigm. [Ribeiro et al., 2016] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data m...

[12] [Robeer, 2018] MJ Robeer. Contrastive explanation for machine learning. Master’s thesis, 2018.

[13] [Ruben, 2015] David-Hillel Ruben. Explaining explanation. Routledge, 2015.

[14] [Sokol and Flach, 2018] Kacper Sokol and Peter A Flach. Conversational explanations of machine learning predictions through class-contrastive counterfactual statements. In IJCAI, pages 5785–5786, 2018.

[15] [Van Bouwel and Weber, 2002] Jeroen Van Bouwel and Erik Weber. Remote causes, bad explanations? Journal for the Theory of Social Behaviour, 32(4):437–449, 2002.

[16] [van der Waa et al., 2018a] Jasper van der Waa, Marcel Robeer, Jurriaan van Diggelen, Matthieu Brinkhuis, and Mark Neerincx. Contrastive explanations with local foil trees. arXiv preprint arXiv:1806.07470, 2018.

[17] [van der Waa et al., 2018b] Jasper van der Waa, Jurriaan van Diggelen, Karel van den Bosch, and Mark Neerincx. Contrastive explanations for reinforcement learning in terms of expected consequences. arXiv preprint arXiv:1807.08706, 2018.

[18] [Wachter et al., 2017] Sandra Wachter, Brent Mittelstadt, and Chris Russell. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harvard Journal of Law & Technology, 31(2):2018, 2017.
