Generating Counterfactual and Contrastive Explanations using SHAP
Pith reviewed 2026-05-25 18:43 UTC · model grok-4.3
The pith
SHAP feature attributions can be transformed into contrastive explanations and then into counterfactual datapoints for any model.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A generative pipeline based on SHAP produces both contrastive explanations and valid counterfactual instances in a model-agnostic manner, as demonstrated on the IRIS, Wine Quality, and Mobile Features datasets.
What carries the argument
The generative pipeline that converts SHAP additive feature attributions first into contrastive statements and then into counterfactual datapoints.
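The first stage can be illustrated with a minimal sketch. It rests on stated assumptions and is not the paper's code. The model is assumed linear per class (score_c(x) = b_c + Σ_i w_{c,i} x_i) with independent features, so the exact SHAP value of feature i is w_{c,i}(x_i − μ_i), where μ is the background mean. A contrastive statement "fact rather than foil" then keeps the features whose attribution to the fact class exceeds their attribution to the foil. The weights, means, feature names, and the conversion rule itself are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def linear_shap(weights: Sequence[float], x: Sequence[float],
                background_mean: Sequence[float]) -> List[float]:
    """Exact SHAP values of a linear score b + w.x, assuming independent features."""
    return [w * (xi - mu) for w, xi, mu in zip(weights, x, background_mean)]


def contrastive_features(class_weights: Dict[str, Sequence[float]],
                         feature_names: Sequence[str], x: Sequence[float],
                         background_mean: Sequence[float],
                         fact: str, foil: str) -> List[Tuple[str, float]]:
    """Rank the features that favour `fact` over `foil` (hypothetical conversion rule)."""
    phi_fact = linear_shap(class_weights[fact], x, background_mean)
    phi_foil = linear_shap(class_weights[foil], x, background_mean)
    # By linearity of Shapley values, this difference is the SHAP value of the
    # margin score_fact - score_foil.
    diffs = [(name, pf - pq) for name, pf, pq in zip(feature_names, phi_fact, phi_foil)]
    supporting = [(name, d) for name, d in diffs if d > 0]
    return sorted(supporting, key=lambda item: item[1], reverse=True)


# Hypothetical toy model and instance (not fitted to any dataset).
names = ["petal_length", "petal_width", "sepal_length"]
weights = {"setosa": [-2.0, -1.5, 0.5], "versicolor": [1.0, 0.5, 0.0]}
mean = [3.8, 1.2, 5.8]
x = [1.4, 0.2, 5.1]

ranked = contrastive_features(weights, names, x, mean, fact="setosa", foil="versicolor")
print("setosa rather than versicolor because of: "
      + ", ".join(name for name, _ in ranked))
# prints: setosa rather than versicolor because of: petal_length, petal_width
```

The additive property is what makes this reading possible: the attributions for one class sum to that class's score at x minus its score at the background mean. For non-linear models, the difference-of-attributions step is only an approximation.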
If this is right
- Contrastive explanations can be produced directly from SHAP values for any black-box model.
- Counterfactual datapoints can be derived from those contrastive statements without model-specific tuning.
- The same pipeline applies across different datasets including IRIS and Wine Quality.
- The resulting explanations would meet the requirement for human-understandable accounts of decisions.
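The second stage could plausibly work as sketched below. This is an assumption, not the paper's procedure. The search walks down the ranked contrastive features and overwrites each one with a typical value for the foil class, supplied through a hypothetical `foil_prototype` such as the foil's per-feature mean. After every change it re-queries the black box and stops at the first candidate assigned to the foil. Because the search needs only `predict` calls, the step stays model-agnostic.

```python
from typing import Callable, List, Optional, Sequence


def greedy_counterfactual(predict: Callable[[List[float]], str],
                          x: Sequence[float], ranked_features: Sequence[int],
                          foil_prototype: Sequence[float],
                          foil: str) -> Optional[List[float]]:
    """Hypothetical greedy search: change ranked features one by one until the label flips."""
    candidate = list(x)                     # copy, so the original instance is untouched
    for i in ranked_features:
        candidate[i] = foil_prototype[i]    # move feature i to the foil's typical value
        if predict(candidate) == foil:      # black-box re-query after each change
            return candidate
    return None                             # no valid counterfactual along this path


# Hypothetical black-box rule standing in for a trained classifier.
def predict(z: List[float]) -> str:
    return "versicolor" if z[0] > 2.5 and z[1] > 0.8 else "setosa"


x = [1.4, 0.2, 5.1]
prototype = [4.3, 1.3, 5.9]   # hypothetical foil-class means
cf = greedy_counterfactual(predict, x, ranked_features=[0, 1],
                           foil_prototype=prototype, foil="versicolor")
print(cf)  # [4.3, 1.3, 5.1]
```

The search stops at the first flip, so it tends to favour sparse changes. It does not, however, guarantee the closest possible counterfactual, and it can return `None` when the ranked features alone cannot move the instance across the decision boundary.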
Where Pith is reading between the lines
- The method could be combined with other attribution techniques to cross-check generated counterfactuals.
- Testing on high-stakes domains would reveal whether the produced instances remain realistic.
- Integration into automated compliance tools might follow if the counterfactuals prove stable across repeated runs.
Load-bearing premise
SHAP feature attributions can be directly turned into contrastive statements and valid counterfactual instances without extra model calibration or human checks.
What would settle it
Feeding the generated counterfactual datapoints back into the original model and verifying whether the prediction actually changes to the expected class.
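The check itself is a single re-query. The sketch below also shows why the check matters, using a hypothetical two-feature model with an interaction term. Exact Shapley values are computed against a single reference point, a simplification of SHAP, which averages over a background distribution. Reading the attributions additively suggests that resetting the top-attributed feature should flip the label, but the re-queried model disagrees.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def baseline_shapley(f: Callable[[List[float]], float], x: Sequence[float],
                     reference: Sequence[float]) -> List[float]:
    """Exact Shapley values where absent features take their reference value."""
    n = len(x)

    def value(subset: set) -> float:
        return f([x[j] if j in subset else reference[j] for j in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                phi[i] += weight * (value(set(s) | {i}) - value(set(s)))
    return phi


def is_valid_counterfactual(predict: Callable[[List[float]], str],
                            candidate: Sequence[float], foil: str) -> bool:
    """Re-query the black box: a counterfactual is valid only if it gets the foil label."""
    return predict(list(candidate)) == foil


# Hypothetical score with an interaction term; the label is "positive" when score > 0.
def score(z: List[float]) -> float:
    return 5 * z[0] + z[1] - 5 * z[0] * z[1]


def predict(z: List[float]) -> str:
    return "positive" if score(z) > 0 else "negative"


x, reference = [1.0, 1.0], [0.0, 0.0]
phi = baseline_shapley(score, x, reference)          # [2.5, -1.5]
top = max(range(len(x)), key=lambda i: phi[i])       # feature 0
additive_guess = score(x) - phi[top]                 # -1.5, which suggests a flip
candidate = list(x)
candidate[top] = reference[top]                      # reset feature 0
print(additive_guess, is_valid_counterfactual(predict, candidate, "negative"))
# prints: -1.5 False
```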
Original abstract
With the advent of GDPR, the domain of explainable AI and model interpretability has gained added impetus. Methods to extract and communicate visibility into decision-making models have become a legal requirement. Two specific types of explanations, contrastive and counterfactual, have been identified as suitable for human understanding. In this paper, we propose a model-agnostic method and its systemic implementation to generate these explanations using shapely additive explanations (SHAP). We discuss a generative pipeline to create contrastive explanations and use it further to generate counterfactual datapoints. This pipeline is tested and discussed on the IRIS, Wine Quality & Mobile Features datasets. An analysis of the results follows.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a model-agnostic generative pipeline that converts SHAP feature attributions into contrastive explanations and then into counterfactual data points. The pipeline is implemented and tested on the IRIS, Wine Quality, and Mobile Features datasets, with the goal of producing human-understandable explanations compliant with requirements such as those in GDPR.
Significance. If the pipeline produces counterfactual instances that are minimal, realistic, and guaranteed to flip the original black-box prediction, the work would offer a practical, SHAP-based route to contrastive and counterfactual explanations. The model-agnostic framing and use of an established attribution method are strengths, but the abstract and described approach provide no quantitative evidence on validity, minimality, or fidelity, limiting the assessed contribution.
major comments (2)
- [Abstract (pipeline description)] The central claim rests on a direct mapping from SHAP attributions to contrastive statements and then to counterfactual instances. No description is given of any re-query step on the original model to confirm that the generated points actually change the prediction; this is load-bearing because SHAP supplies local additive attributions that do not automatically guarantee a flip when features are altered, especially under interactions or non-monotonicity.
- [Abstract (evaluation statement)] The abstract states that the pipeline is 'tested and discussed' on three datasets but reports no metrics (e.g., success rate of prediction flips, distance to original instance, or comparison to baselines such as Wachter et al.). Without such validation, the claim that the generated counterfactuals are valid cannot be assessed.
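The metrics named in the comments above are cheap to compute once counterfactuals exist. The sketch below is an assumption about how they might be reported, not the paper's evaluation. The flip (validity) rate re-queries the model on every candidate, and L1 distance and sparsity are averaged over valid candidates only. The toy rule and instances are hypothetical. In practice, features should be put on a common scale before L1 distances are compared.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# (original instance, counterfactual or None if the search failed, foil label)
Pair = Tuple[Sequence[float], Optional[Sequence[float]], str]


def evaluate_counterfactuals(predict: Callable[[List[float]], str],
                             pairs: Sequence[Pair]) -> Dict[str, Optional[float]]:
    distances: List[float] = []
    changed: List[int] = []
    for x, cf, foil in pairs:
        if cf is None or predict(list(cf)) != foil:   # missing or not flipped: invalid
            continue
        distances.append(sum(abs(a - b) for a, b in zip(x, cf)))   # L1 distance
        changed.append(sum(1 for a, b in zip(x, cf) if a != b))    # sparsity
    n_valid = len(distances)
    return {
        "flip_rate": n_valid / len(pairs) if pairs else None,
        "mean_l1": sum(distances) / n_valid if n_valid else None,
        "mean_features_changed": sum(changed) / n_valid if n_valid else None,
    }


def predict(z: List[float]) -> str:   # hypothetical black-box rule
    return "versicolor" if z[0] > 2.5 and z[1] > 0.8 else "setosa"


pairs = [
    ([1.4, 0.2], [4.3, 1.3], "versicolor"),   # flips
    ([1.3, 0.3], [4.3, 0.3], "versicolor"),   # does not flip
    ([1.5, 0.2], None, "versicolor"),         # search failed
]
print(evaluate_counterfactuals(predict, pairs))
```

A baseline such as Wachter et al. could be scored with the same function, which makes the comparison requested above straightforward.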
minor comments (2)
- [Abstract] Dataset names are inconsistently capitalized ('IRIS' vs. standard 'Iris'); use conventional nomenclature throughout.
- [Abstract] The phrase 'shapely additive explanations' should be corrected to 'Shapley additive explanations' and the acronym SHAP should be defined on first use.
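For reference, here is the quantity behind the acronym. SHAP (Lundberg and Lee, 2017) attributes a prediction using Shapley values from cooperative game theory. Let F be the feature set and f_S the model evaluated on the feature subset S. The attribution of feature i, and the additive decomposition it satisfies with base value φ_0 = E[f(X)], are:

```latex
\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!}
         \bigl[ f_{S \cup \{i\}}(x_{S \cup \{i\}}) - f_S(x_S) \bigr],
\qquad
f(x) = \phi_0 + \sum_{i=1}^{|F|} \phi_i .
```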
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the manuscript. The comments correctly identify areas where the presentation of the pipeline and its evaluation can be strengthened. We address each point below and will revise the manuscript accordingly.
Point-by-point responses
- Referee: [Abstract (pipeline description)] The central claim rests on a direct mapping from SHAP attributions to contrastive statements and then to counterfactual instances. No description is given of any re-query step on the original model to confirm that the generated points actually change the prediction; this is load-bearing because SHAP supplies local additive attributions that do not automatically guarantee a flip when features are altered, especially under interactions or non-monotonicity.
  Authors: We agree that confirming the prediction flip via re-query is essential and that the abstract (and methods) should explicitly describe this step. The current manuscript does not include such a verification procedure. We will revise the paper to add a description of re-querying the black-box model after feature modification to verify that the counterfactual instance changes the prediction. This will be incorporated into the abstract and the pipeline description.
  Revision: yes
- Referee: [Abstract (evaluation statement)] The abstract states that the pipeline is 'tested and discussed' on three datasets but reports no metrics (e.g., success rate of prediction flips, distance to original instance, or comparison to baselines such as Wachter et al.). Without such validation, the claim that the generated counterfactuals are valid cannot be assessed.
  Authors: We agree that quantitative metrics are required to substantiate the validity, minimality, and fidelity of the generated counterfactuals. The manuscript currently offers only qualitative discussion of results on the IRIS, Wine Quality, and Mobile Features datasets. We will revise the manuscript to include quantitative metrics such as prediction-flip success rate, average distance to the original instance, and comparisons against baselines including Wachter et al.
  Revision: yes
Circularity Check
No circularity: pipeline is a constructive implementation, not a self-referential derivation
Full rationale
The paper presents a practical, model-agnostic generative pipeline that converts SHAP attributions into contrastive statements and then into candidate counterfactual points, tested on three datasets. No equations, fitting procedures, or first-principles derivations are described that reduce outputs to inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems, and no parameters are fitted on a subset then relabeled as predictions. The central claim is an engineering method whose validity rests on empirical testing rather than tautological redefinition.
Reference graph
Works this paper leans on
[1] [Castelvecchi, 2016] Davide Castelvecchi. Can we open the black box of AI? Nature News, 538(7623):20, 2016.
[2] [Edwards and Veale, 2017] Lilian Edwards and Michael Veale. Slave to the algorithm: Why a right to an explanation is probably not the remedy you are looking for. Duke L. & Tech. Rev., 16:18, 2017.
[3] [Gunning, 2017] David Gunning. Explainable artificial intelligence (XAI). Defense Advanced Research Projects Agency (DARPA), nd Web, 2017.
[4] [Hendricks et al., 2018] Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, and Zeynep Akata. Generating counterfactual explanations with natural language. arXiv preprint arXiv:1806.09809, 2018.
[5] [Lipton, 1990] Peter Lipton. Contrastive explanation. Royal Institute of Philosophy Supplements, 27:247–266, 1990.
[6] [Lipton, 2016] Zachary C Lipton. The mythos of model interpretability. arXiv preprint arXiv:1606.03490, 2016.
[7] [Lundberg and Lee, 2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pages 4765–4774, 2017.
[8] [Miller, 2018] Tim Miller. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 2018.
[9] [Mittelstadt et al., 2019] Brent Mittelstadt, Chris Russell, and Sandra Wachter. Explaining explanations in AI. In Proceedings of the conference on fairness, accountability, and transparency, pages 279–288. ACM, 2019.
[10] [Molnar, 2018] Christoph Molnar. Interpretable machine learning. A Guide for Making Black Box Models Explainable, 2018.
[11] [Rathi and Alam, n.d.] Shubham Rathi and Aniket Alam. ESO-5W1H framework: Ontological model for SITL paradigm.
[Ribeiro et al., 2016] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data m...
[12] [Robeer, 2018] MJ Robeer. Contrastive explanation for machine learning. Master's thesis, 2018.
[13] [Ruben, 2015] David-Hillel Ruben. Explaining explanation. Routledge, 2015.
[14] [Sokol and Flach, 2018] Kacper Sokol and Peter A Flach. Conversational explanations of machine learning predictions through class-contrastive counterfactual statements. In IJCAI, pages 5785–5786, 2018.
[15] [Van Bouwel and Weber, 2002] Jeroen Van Bouwel and Erik Weber. Remote causes, bad explanations? Journal for the Theory of Social Behaviour, 32(4):437–449, 2002.
[16] [van der Waa et al., 2018a] Jasper van der Waa, Marcel Robeer, Jurriaan van Diggelen, Matthieu Brinkhuis, and Mark Neerincx. Contrastive explanations with local foil trees. arXiv preprint arXiv:1806.07470, 2018.
[17] [van der Waa et al., 2018b] Jasper van der Waa, Jurriaan van Diggelen, Karel van den Bosch, and Mark Neerincx. Contrastive explanations for reinforcement learning in terms of expected consequences. arXiv preprint arXiv:1807.08706, 2018.
[18] [Wachter et al., 2017] Sandra Wachter, Brent Mittelstadt, and Chris Russell. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harvard Journal of Law & Technology, 31(2):2018, 2017.