arxiv: 1906.09293 · v1 · pith:LL7NLMOMnew · submitted 2019-06-21 · 💻 cs.LG · cs.AI · stat.ML

Generating Counterfactual and Contrastive Explanations using SHAP

Pith reviewed 2026-05-25 18:43 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · stat.ML
keywords explainable AI · SHAP · counterfactual explanations · contrastive explanations · model agnostic · interpretable machine learning · GDPR

The pith

SHAP feature attributions can be transformed into contrastive explanations and then into counterfactual datapoints for any model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a model-agnostic method that uses SHAP to generate contrastive explanations and then derives counterfactual datapoints from them. A generative pipeline converts the additive feature attributions first into statements that highlight decision contrasts and then into new instances that would flip the model's output. This approach requires no access to the model's internal structure or parameters. The pipeline is applied and illustrated on the IRIS, Wine Quality, and Mobile Features datasets. The resulting explanations address needs for human-understandable accounts of automated decisions.

Core claim

A generative pipeline based on SHAP produces both contrastive explanations and valid counterfactual instances in a model-agnostic manner, as demonstrated on the IRIS, Wine Quality, and Mobile Features datasets.

What carries the argument

The generative pipeline that converts SHAP additive feature attributions first into contrastive statements and then into counterfactual datapoints.
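For reference, the "additive" property this pipeline leans on is the local accuracy property of SHAP (Lundberg and Lee, 2017): at a given instance, the model output decomposes into a base value plus one attribution per feature. The notation below ($f$, $\phi_i$, $M$) is chosen for this note and is an assumption, since the abstract itself gives no formulas.

```latex
% Assumed notation: f is the model output for one class, x the instance,
% M the number of features, \phi_0 the base value (typically the expected
% model output over a background dataset).
f(x) \;=\; \phi_0 + \sum_{i=1}^{M} \phi_i(x)
```

The decomposition holds at $x$ only. Moving a feature with a large $\phi_i(x)$ changes the other attributions as well, so the sum at the new point need not cross the decision boundary.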

If this is right

  • Contrastive explanations can be produced directly from SHAP values for any black-box model.
  • Counterfactual datapoints can be derived from those contrastive statements without model-specific tuning.
  • The same pipeline applies across different datasets including IRIS and Wine Quality.
  • Explanations become available that meet requirements for human-understandable decision accounts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could be combined with other attribution techniques to cross-check generated counterfactuals.
  • Testing on high-stakes domains would reveal whether the produced instances remain realistic.
  • Integration into automated compliance tools might follow if the counterfactuals prove stable across repeated runs.

Load-bearing premise

SHAP feature attributions can be directly turned into contrastive statements and valid counterfactual instances without extra model calibration or human checks.

What would settle it

Feeding the generated counterfactual datapoints back into the original model and verifying whether the prediction actually changes to the expected class.
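The check described above can be sketched in a few lines. This is a minimal illustration and not the paper's code: `toy_model`, its threshold, and the example points are hypothetical stand-ins for a real black-box classifier and for the instances the pipeline would generate.

```python
from typing import Callable, Sequence

Predictor = Callable[[Sequence[float]], int]


def toy_model(x: Sequence[float]) -> int:
    """Hypothetical black box: class 1 when the feature sum exceeds 5.0."""
    return 1 if sum(x) > 5.0 else 0


def is_valid_counterfactual(
    predict: Predictor,
    original: Sequence[float],
    candidate: Sequence[float],
    target_class: int,
) -> bool:
    """Re-query the model: the candidate must leave the original class
    and land in the requested target class."""
    before = predict(original)
    after = predict(candidate)
    return after != before and after == target_class


# Assumed example data, not taken from the paper's datasets.
original = [1.0, 2.0, 1.5]        # sum = 4.5, predicted class 0
good_candidate = [1.0, 2.0, 2.5]  # sum = 5.5, crosses the threshold
bad_candidate = [1.0, 2.0, 1.9]   # sum = 4.9, stays in class 0

print(is_valid_counterfactual(toy_model, original, good_candidate, 1))
print(is_valid_counterfactual(toy_model, original, bad_candidate, 1))
```

Only the prediction function is called, so the same check applies to any model that exposes predictions, in keeping with the model-agnostic framing.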

Original abstract

With the advent of GDPR, the domain of explainable AI and model interpretability has gained added impetus. Methods to extract and communicate visibility into decision-making models have become legal requirement. Two specific types of explanations, contrastive and counterfactual have been identified as suitable for human understanding. In this paper, we propose a model agnostic method and its systemic implementation to generate these explanations using shapely additive explanations (SHAP). We discuss a generative pipeline to create contrastive explanations and use it to further to generate counterfactual datapoints. This pipeline is tested and discussed on the IRIS, Wine Quality & Mobile Features dataset. Analysis of the results obtained follows.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a model-agnostic generative pipeline that converts SHAP feature attributions into contrastive explanations and then into counterfactual data points. The pipeline is implemented and tested on the IRIS, Wine Quality, and Mobile Features datasets, with the goal of producing human-understandable explanations compliant with requirements such as those in GDPR.

Significance. If the pipeline produces counterfactual instances that are minimal, realistic, and guaranteed to flip the original black-box prediction, the work would offer a practical, SHAP-based route to contrastive and counterfactual explanations. The model-agnostic framing and use of an established attribution method are strengths, but the abstract and described approach provide no quantitative evidence on validity, minimality, or fidelity, limiting the assessed contribution.

major comments (2)
  1. [Abstract (pipeline description)] The central claim rests on a direct mapping from SHAP attributions to contrastive statements and then to counterfactual instances. No description is given of any re-query step on the original model to confirm that the generated points actually change the prediction; this is load-bearing because SHAP supplies local additive attributions that do not automatically guarantee a flip when features are altered, especially under interactions or non-monotonicity.
  2. [Abstract (evaluation statement)] The abstract states that the pipeline is 'tested and discussed' on three datasets but reports no metrics (e.g., success rate of prediction flips, distance to original instance, or comparison to baselines such as Wachter et al.). Without such validation, the claim that the generated counterfactuals are valid cannot be assessed.
minor comments (2)
  1. [Abstract] Dataset names are inconsistently capitalized ('IRIS' vs. standard 'Iris'); use conventional nomenclature throughout.
  2. [Abstract] The phrase 'shapely additive explanations' should be corrected to 'Shapley additive explanations' and the acronym SHAP should be defined on first use.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the manuscript. The comments correctly identify areas where the presentation of the pipeline and its evaluation can be strengthened. We address each point below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract (pipeline description)] The central claim rests on a direct mapping from SHAP attributions to contrastive statements and then to counterfactual instances. No description is given of any re-query step on the original model to confirm that the generated points actually change the prediction; this is load-bearing because SHAP supplies local additive attributions that do not automatically guarantee a flip when features are altered, especially under interactions or non-monotonicity.

    Authors: We agree that confirming the prediction flip via re-query is essential and that the abstract (and methods) should explicitly describe this step. The current manuscript does not include such a verification procedure. We will revise the paper to add a description of re-querying the black-box model after feature modification to verify that the counterfactual instance changes the prediction. This will be incorporated into the abstract and the pipeline description.
    Revision: yes

  2. Referee: [Abstract (evaluation statement)] The abstract states that the pipeline is 'tested and discussed' on three datasets but reports no metrics (e.g., success rate of prediction flips, distance to original instance, or comparison to baselines such as Wachter et al.). Without such validation, the claim that the generated counterfactuals are valid cannot be assessed.

    Authors: We agree that quantitative metrics are required to substantiate the validity, minimality, and fidelity of the generated counterfactuals. The manuscript currently offers only qualitative discussion of results on the IRIS, Wine Quality, and Mobile Features datasets. We will revise the manuscript to include quantitative metrics such as prediction-flip success rate, average distance to the original instance, and comparisons against baselines including Wachter et al.
    Revision: yes
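Two of the metrics named in this exchange, prediction-flip success rate and average distance to the original instance, could be computed roughly as follows. This is a sketch under stated assumptions: the threshold model, the L1 distance, and the sample pairs are all hypothetical choices made for illustration, and the manuscript does not specify which distance it would use.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Predictor = Callable[[Sequence[float]], int]
Pair = Tuple[Sequence[float], Sequence[float], int]  # (original, candidate, target)


def threshold_model(x: Sequence[float]) -> int:
    """Hypothetical black box: class 1 when the first feature is >= 3.0."""
    return 1 if x[0] >= 3.0 else 0


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute per-feature differences (an assumed distance choice)."""
    return sum(abs(p - q) for p, q in zip(a, b))


def evaluate(predict: Predictor, pairs: List[Pair]) -> Dict[str, float]:
    """Return the flip success rate over all pairs and the mean L1 distance
    over the candidates that actually reached their target class."""
    distances = [
        l1_distance(original, candidate)
        for original, candidate, target in pairs
        if predict(original) != target and predict(candidate) == target
    ]
    flip_rate = len(distances) / len(pairs) if pairs else 0.0
    mean_l1 = sum(distances) / len(distances) if distances else float("nan")
    return {"flip_rate": flip_rate, "mean_l1": mean_l1}


# Assumed example pairs, not results from the paper.
pairs: List[Pair] = [
    ([2.0, 1.0], [3.5, 1.0], 1),  # flips, distance 1.5
    ([2.0, 1.0], [2.5, 1.0], 1),  # does not flip
    ([4.0, 0.0], [2.0, 0.5], 0),  # flips, distance 2.5
    ([1.0, 1.0], [3.0, 1.0], 1),  # flips, distance 2.0
]

result = evaluate(threshold_model, pairs)
print(result)  # {'flip_rate': 0.75, 'mean_l1': 2.0}
```

Averaging the distance over successful flips only keeps failed candidates from making the method look artificially "close"; a baseline comparison would run the same function on counterfactuals from another generator.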

Circularity Check

0 steps flagged

No circularity: pipeline is a constructive implementation, not a self-referential derivation

Full rationale

The paper presents a practical, model-agnostic generative pipeline that converts SHAP attributions into contrastive statements and then into candidate counterfactual points, tested on three datasets. No equations, fitting procedures, or first-principles derivations are described that reduce outputs to inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems, and no parameters are fitted on a subset then relabeled as predictions. The central claim is an engineering method whose validity rests on empirical testing rather than tautological redefinition.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no visible free parameters, axioms, or invented entities; ledger left empty.

pith-pipeline@v0.9.0 · 5627 in / 1015 out tokens · 21531 ms · 2026-05-25T18:43:59.630957+00:00 · methodology


Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 4 internal anchors

  1. [1]

    Can we open the black box of ai?

    [Castelvecchi, 2016] Davide Castelvecchi. Can we open the black box of ai? Nature News, 538(7623):20,

  2. [2]

    Slave to the algorithm: Why a right to an explanation is probably not the remedy you are looking for

    [Edwards and Veale, 2017] Lilian Edwards and Michael Veale. Slave to the algorithm: Why a right to an explanation is probably not the remedy you are looking for. Duke L. & Tech. Rev., 16:18,

  3. [3]

    Explainable artificial intelligence (xai)

    [Gunning, 2017] David Gunning. Explainable artificial intelligence (xai). Defense Advanced Research Projects Agency (DARPA), nd Web,

  4. [4]

    Generating Counterfactual Explanations with Natural Language

    [Hendricks et al., 2018] Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, and Zeynep Akata. Generating counterfactual explanations with natural language. arXiv preprint arXiv:1806.09809,

  5. [5]

    Contrastive explanation

    [Lipton, 1990] Peter Lipton. Contrastive explanation. Royal Institute of Philosophy Supplements, 27:247–266,

  6. [6]

    The Mythos of Model Interpretability

    [Lipton, 2016] Zachary C Lipton. The mythos of model interpretability. arXiv preprint arXiv:1606.03490,

  7. [7]

    A unified approach to interpreting model predictions

    [Lundberg and Lee, 2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pages 4765–4774,

  8. [8]

    Explanation in artificial intelligence: Insights from the social sciences

    [Miller, 2018] Tim Miller. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence,

  9. [9]

    Explaining explanations in ai

    [Mittelstadt et al., 2019] Brent Mittelstadt, Chris Russell, and Sandra Wachter. Explaining explanations in ai. In Proceedings of the conference on fairness, accountability, and transparency, pages 279–288. ACM,

  10. [10]

    Interpretable machine learning

    [Molnar, 2018] Christoph Molnar. Interpretable machine learning. A Guide for Making Black Box Models Explainable,

  11. [11]

    Eso-5w1h framework: Ontological model for sitl paradigm

    [Rathi and Alam, ] Shubham Rathi and Aniket Alam. Eso-5w1h framework: Ontological model for sitl paradigm.
    [Ribeiro et al., 2016] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Why should i trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data m...

  12. [12]

    Contrastive explanation for machine learning

    [Robeer, 2018] MJ Robeer. Contrastive explanation for machine learning. Master’s thesis,

  13. [13]

    Explaining explanation

    [Ruben, 2015] David-Hillel Ruben. Explaining explanation. Routledge,

  14. [14]

    Conversational explanations of machine learning predictions through class-contrastive counterfactual statements

    [Sokol and Flach, 2018] Kacper Sokol and Peter A Flach. Conversational explanations of machine learning predictions through class-contrastive counterfactual statements. In IJCAI, pages 5785–5786,

  15. [15]

    Remote causes, bad explanations?

    [Van Bouwel and Weber, 2002] Jeroen Van Bouwel and Erik Weber. Remote causes, bad explanations? Journal for the Theory of Social Behaviour, 32(4):437–449,

  16. [16]

    Contrastive Explanations with Local Foil Trees

    [van der Waa et al., 2018a] Jasper van der Waa, Marcel Robeer, Jurriaan van Diggelen, Matthieu Brinkhuis, and Mark Neerincx. Contrastive explanations with local foil trees. arXiv preprint arXiv:1806.07470,

  17. [17]

    Contrastive Explanations for Reinforcement Learning in terms of Expected Consequences

    [van der Waa et al., 2018b] Jasper van der Waa, Jurriaan van Diggelen, Karel van den Bosch, and Mark Neerincx. Contrastive explanations for reinforcement learning in terms of expected consequences. arXiv preprint arXiv:1807.08706,

  18. [18]

    Counterfactual explanations without opening the black box: Automated decisions and the gdpr

    [Wachter et al., 2017] Sandra Wachter, Brent Mittelstadt, and Chris Russell. Counterfactual explanations without opening the black box: Automated decisions and the gdpr. Harvard Journal of Law & Technology, 31(2):2018, 2017