pith.

arxiv: 2512.07178 · v1 · submitted 2025-12-08 · 💻 cs.AI · cs.HC · cs.LG

ContextualSHAP: Enhancing SHAP Explanations Through Contextual Language Generation

Pith reviewed 2026-05-17 01:29 UTC · model grok-4.3

classification 💻 cs.AI · cs.HC · cs.LG
keywords explainable AI · SHAP · contextual text generation · large language models · user studies · healthcare AI · model interpretability · Python package

The pith

ContextualSHAP pairs SHAP values with GPT-generated text to explain model predictions in plain language tailored to the user.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a Python package that takes standard SHAP feature attributions and feeds them to GPT along with user-provided details such as feature aliases, descriptions, and background information. The goal is to produce accompanying sentences that place the numerical importance scores in a meaningful context for non-technical readers. In a healthcare case study, Likert-scale surveys and follow-up interviews indicated that participants perceived the text-augmented outputs as clearer and more contextually appropriate than SHAP plots alone. If the approach works, it would let end users in high-stakes settings understand why a model reached a particular conclusion without needing to interpret charts or code.

Core claim

Integrating SHAP with GPT under user-defined parameters for aliases, descriptions, and extra context produces textual explanations that users in a healthcare evaluation judged more understandable and contextually suitable than visual SHAP outputs by themselves.

What carries the argument

The ContextualSHAP package, which converts SHAP attributions into prompts for GPT and inserts user-supplied parameters to generate customized natural-language explanations that sit alongside the usual plots.
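The conversion step can be pictured as rendering one prediction's SHAP attributions as a structured prompt. The sketch below is hypothetical: only the `feature_aliases` idea comes from the paper (Figure 2), while the function name `build_explanation_prompt`, the `feature_descriptions` and `background` parameters, the prompt wording, and the example data are illustrative assumptions, not the ContextualSHAP API. It stops before the GPT call, so no particular OpenAI client is assumed.

```python
from typing import Mapping, Optional


def build_explanation_prompt(
    shap_values: Mapping[str, float],
    prediction: float,
    feature_aliases: Optional[Mapping[str, str]] = None,
    feature_descriptions: Optional[Mapping[str, str]] = None,
    background: Optional[str] = None,
    top_k: int = 5,
) -> str:
    """Hypothetical helper: render one prediction's SHAP values as an LLM prompt."""
    aliases = feature_aliases or {}
    descriptions = feature_descriptions or {}

    # Rank features by the magnitude of their contribution, largest first.
    ranked = sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_k]

    lines = []
    if background:
        lines.append(f"Context: {background}")
    lines.append(f"Model output: {prediction:.3f}")
    lines.append("Feature contributions (SHAP values, largest magnitude first):")
    for rank, (name, value) in enumerate(ranked, start=1):
        label = aliases.get(name, name)  # fall back to the raw column name
        if value > 0:
            effect = "raises the output"
        elif value < 0:
            effect = "lowers the output"
        else:
            effect = "has no effect on the output"
        entry = f"{rank}. {label}: {value:+.3f} ({effect})"
        if name in descriptions:
            entry += f"; {descriptions[name]}"
        lines.append(entry)

    # Instructions that try to keep the generated text descriptive rather than causal.
    lines.append(
        "Explain these contributions in plain language for a non-technical reader. "
        "Describe them as patterns the model uses, not as causes, "
        "and do not mention features that are not listed above."
    )
    return "\n".join(lines)


# Made-up feature names and values, for illustration only.
prompt = build_explanation_prompt(
    {"glu": 0.42, "age": -0.10, "bmi": 0.05},
    prediction=0.81,
    feature_aliases={"glu": "Blood glucose"},
    feature_descriptions={"glu": "fasting plasma glucose level"},
    background="Screening model for diabetes risk.",
    top_k=2,
)
print(prompt)
```

Because the SHAP numbers pass through unchanged, reframing the same attributions for a different audience only means changing the aliases, descriptions, or background string.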

If this is right

  • Explanations become usable by doctors, patients, and other non-experts who must act on model outputs.
  • The same SHAP numbers can be reframed for different audiences simply by changing the user parameters.
  • Combining text with visuals may increase perceived trustworthiness of the model in regulated domains.
  • The package works with any model already supported by SHAP, so the textual layer can be added without retraining.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If the generated text remains strictly descriptive of the SHAP values, it could reduce the chance that users infer causal links the numbers do not support.
  • The method could be extended to other post-hoc explanation techniques such as LIME or counterfactuals with only minor prompt changes.
  • Real-time parameter adjustment by the user might allow iterative refinement of the explanation until it matches their information needs.

Load-bearing premise

That the GPT text will stay faithful to the actual SHAP numbers and will not add misleading causal claims or omit important details that users might then accept as true.
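The premise has a concrete reference point. SHAP attributions satisfy local accuracy (Lundberg and Lee [12]): for an input x with M features, the attributions φ_i add up exactly to the model output, starting from the expected output φ_0 over the background data.

```latex
f(x) = \phi_0 + \sum_{i=1}^{M} \phi_i(x), \qquad \phi_0 = \mathbb{E}\left[ f(X) \right]
```

A sentence in the generated text can therefore be checked against two properties of this decomposition: the sign of each φ_i it mentions (whether the feature pushes this prediction up or down) and the ordering of features by |φ_i|. The decomposition describes how the model's output is apportioned, not what would happen if x_i changed in the real world, so causal wording goes beyond what the numbers support.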

What would settle it

A follow-up test in which users are asked to list the top three features driving a prediction after reading only the generated text; if their lists diverge systematically from the underlying SHAP rankings, the claim of improved understanding would be weakened.
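Scoring such a test could be as simple as comparing each participant's list with the SHAP top three. The sketch below is a hypothetical scoring rule, not something the paper describes: it assumes participants' free-text answers have already been coded to the model's column names (a manual step not shown) and that everyone read explanations of the same prediction; the function names and example data are illustrative.

```python
from typing import Mapping, Sequence


def top_k_features(shap_values: Mapping[str, float], k: int = 3) -> list:
    """Feature names with the k largest absolute SHAP values (ties keep input order)."""
    ranked = sorted(shap_values, key=lambda name: abs(shap_values[name]), reverse=True)
    return ranked[:k]


def top_k_overlap(user_list: Sequence[str], shap_values: Mapping[str, float], k: int = 3) -> float:
    """Share of the SHAP top-k features that a participant named, ignoring order."""
    reference = set(top_k_features(shap_values, k))
    if not reference:
        return 0.0
    return len(reference & set(user_list[:k])) / len(reference)


def mean_overlap(responses: Sequence[Sequence[str]], shap_values: Mapping[str, float], k: int = 3) -> float:
    """Average overlap across participants; a low value signals systematic divergence."""
    scores = [top_k_overlap(r, shap_values, k) for r in responses]
    return sum(scores) / len(scores) if scores else 0.0


# Made-up SHAP row and participant answers, for illustration only.
shap_row = {"glu": 0.42, "age": -0.30, "bmi": 0.05, "bp": 0.20}
answers = [["glu", "age", "bp"], ["bmi", "glu", "sex"]]
print(mean_overlap(answers, shap_row))
```

Set overlap deliberately ignores order; if the ranking itself matters, a rank correlation between the participant's order and the SHAP order would be the stricter alternative.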

Figures

Figures reproduced from arXiv: 2512.07178 by Hidetaka Nambo, Latifa Dwiyanti, Sergio Ryan Wibisono.

Figure 1: XAI Growing Papers
"…emphasized in all studies is informativeness, the idea that explanations should provide users with sufficient information to support decision making [5]. Arun Rai beautifully captures the essence of XAI by likening it to transforming a ‘black box’ into a ‘glass box’, allowing users to understand the rationale behind AI predictions [6]. XAI has been applied in a wide range of domains, inc…"
Figure 2: ContextualSHAP
"To provide richer contextual information and enhance the quality of the ChatGPT-generated explanations, the contextualSHAP functions are equipped with five optional parameters. These inputs serve as complementary additions to the SHAP-generated visualizations and help make the generated explanations more comprehensive, accurate, and user-relevant. (1) Feature renaming (feature_aliases): Feature names …"
Original abstract

Explainable Artificial Intelligence (XAI) has become an increasingly important area of research, particularly as machine learning models are deployed in high-stakes domains. Among various XAI approaches, SHAP (SHapley Additive exPlanations) has gained prominence due to its ability to provide both global and local explanations across different machine learning models. While SHAP effectively visualizes feature importance, it often lacks contextual explanations that are meaningful for end-users, especially those without technical backgrounds. To address this gap, we propose a Python package that extends SHAP by integrating it with a large language model (LLM), specifically OpenAI's GPT, to generate contextualized textual explanations. This integration is guided by user-defined parameters (such as feature aliases, descriptions, and additional background) to tailor the explanation to both the model context and the user perspective. We hypothesize that this enhancement can improve the perceived understandability of SHAP explanations. To evaluate the effectiveness of the proposed package, we applied it in a healthcare-related case study and conducted user evaluations involving real end-users. The results, based on Likert-scale surveys and follow-up interviews, indicate that the generated explanations were perceived as more understandable and contextually appropriate compared to visual-only outputs. While the findings are preliminary, they suggest that combining visualization with contextualized text may support more user-friendly and trustworthy model explanations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces ContextualSHAP, a Python package that augments standard SHAP visualizations with GPT-generated natural-language explanations. User-supplied parameters (feature aliases, descriptions, background context) guide the prompting. The central claim is that the resulting hybrid outputs are perceived as more understandable and contextually appropriate than visual SHAP alone, supported by a healthcare case study that collected Likert-scale ratings and follow-up interviews from real end-users.

Significance. If the generated text can be shown to faithfully reflect SHAP values without introducing hallucinations or unsupported causal language, the approach could meaningfully improve accessibility of model explanations for non-technical stakeholders in regulated domains. The work supplies a practical, parameter-driven integration rather than a new theoretical derivation, so its primary contribution is engineering and user-facing utility rather than a methodological advance.

major comments (2)
  1. [Evaluation / healthcare case study] Evaluation section (healthcare case study): the manuscript states that Likert-scale surveys and interviews showed higher perceived understandability, yet reports neither sample size, response rate, statistical tests, control condition, nor inter-rater reliability. Without these quantities the central claim that ContextualSHAP improves explanations rests on unquantified subjective preference and cannot be evaluated for robustness.
  2. [Method / prompting and generation] Generation pipeline (parameter-guided prompting): no post-generation audit is described that checks whether the GPT outputs preserve the ranked feature importances from the underlying SHAP values, avoid omissions of top features, or refrain from causal language not licensed by the additive SHAP decomposition. Subjective preference can increase from fluent but inaccurate text, leaving the load-bearing assumption that the method enhances rather than merely decorates SHAP untested.
minor comments (2)
  1. [Abstract / Introduction] The abstract and introduction would benefit from an explicit statement of the package's public API and installation instructions so readers can reproduce the reported healthcare example.
  2. [Figures / Tables] Figure captions and table headings should clarify whether the displayed SHAP plots are the standard library output or modified by ContextualSHAP parameters.
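Major comment 1 asks which statistical tests were applied to the Likert ratings. Likert responses are ordinal, so a paired nonparametric test is a natural fit if each participant rated both conditions; the paper does not report its design, so that within-subject setup is an assumption here. As a standard-library-only sketch, the exact two-sided sign test below counts pairs favoring each condition and drops ties. `scipy.stats.wilcoxon` (the signed-rank test) is the more common and usually more powerful choice when SciPy is available. The ratings in the usage lines are invented for illustration and are not data from the study.

```python
from math import comb
from typing import Sequence, Tuple


def sign_test(with_text: Sequence[int], plot_only: Sequence[int]) -> Tuple[int, int, float]:
    """Exact two-sided sign test for paired ordinal ratings.

    Returns (pairs favoring text, pairs favoring plot-only, p-value).
    Tied pairs are dropped, as is standard for the sign test.
    """
    if len(with_text) != len(plot_only):
        raise ValueError("ratings must be paired")
    diffs = [a - b for a, b in zip(with_text, plot_only) if a != b]
    n = len(diffs)
    if n == 0:
        return 0, 0, 1.0
    pos = sum(1 for d in diffs if d > 0)
    neg = n - pos
    k = min(pos, neg)
    # P(X <= k) under Binomial(n, 0.5), doubled for a two-sided test and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return pos, neg, min(1.0, 2 * tail)


# Illustrative 1-5 Likert ratings only, not data from the paper.
text_and_plot = [5, 4, 5, 4, 4, 5, 3, 4]
plot_only = [3, 4, 3, 2, 3, 4, 3, 3]
print(sign_test(text_and_plot, plot_only))  # → (6, 0, 0.03125)
```

Reporting the counts alongside the p-value, together with the sample size and response rate the referee also requests, lets readers judge how much weight a small pilot study can bear.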

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight important aspects of evaluation rigor and methodological fidelity. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Evaluation / healthcare case study] Evaluation section (healthcare case study): the manuscript states that Likert-scale surveys and interviews showed higher perceived understandability, yet reports neither sample size, response rate, statistical tests, control condition, nor inter-rater reliability. Without these quantities the central claim that ContextualSHAP improves explanations rests on unquantified subjective preference and cannot be evaluated for robustness.

    Authors: We agree that the current manuscript provides insufficient detail on the user study design and results. The healthcare case study was intended as a preliminary evaluation with domain experts to gather initial feedback on perceived understandability. In the revised manuscript we will expand the evaluation section to report the sample size, response rate, explicit description of the control condition (standard SHAP visualizations), any statistical tests performed on the Likert ratings, and inter-rater reliability measures for the interview data. We will also add a dedicated limitations subsection that explicitly notes the preliminary character of the findings. revision: yes

  2. Referee: [Method / prompting and generation] Generation pipeline (parameter-guided prompting): no post-generation audit is described that checks whether the GPT outputs preserve the ranked feature importances from the underlying SHAP values, avoid omissions of top features, or refrain from causal language not licensed by the additive SHAP decomposition. Subjective preference can increase from fluent but inaccurate text, leaving the load-bearing assumption that the method enhances rather than merely decorates SHAP untested.

    Authors: We acknowledge the importance of verifying that generated text remains faithful to the SHAP decomposition. The prompting strategy incorporates ranked feature importances and user-supplied context to reduce the risk of major omissions or unsupported causal claims, yet the manuscript does not describe any systematic post-generation audit. In the revision we will add an explicit discussion of these risks, including examples of potential hallucinations or causal language, and we will provide user guidelines for manual verification against the original SHAP values. We will also outline an optional lightweight audit step that could be added to the package in a future release. This keeps the primary contribution focused on the practical integration while directly addressing the fidelity concern. revision: partial
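The rebuttal promises an optional lightweight audit step. One hypothetical shape for it, checking the three things the referee lists (preserved ranking, no omitted top features, no unlicensed causal language), is sketched below. The function names, the small causal-phrase lexicon, and the first-mention ordering rule are assumptions, not part of ContextualSHAP. String matching cannot catch paraphrases or synonyms, so a check like this complements the manual verification the authors propose rather than replacing it.

```python
import re
from typing import Mapping

# Hypothetical, deliberately small lexicon of causal wording; a real audit would need more.
CAUSAL_PATTERNS = (r"\bcause[sd]?\b", r"\bleads? to\b", r"\bresults? in\b", r"\bbecause of\b")


def first_mention(label: str, text: str) -> int:
    """Index of the first whole-word mention of `label` in `text`, or -1.

    Assumes labels start and end with word characters, so `\\b` anchors apply.
    """
    match = re.search(r"\b" + re.escape(label.lower()) + r"\b", text.lower())
    return match.start() if match else -1


def audit_explanation(
    text: str,
    shap_values: Mapping[str, float],
    feature_aliases: Mapping[str, str],
    k: int = 3,
) -> dict:
    """Heuristic check of generated text against the top-k SHAP features."""
    ranked = sorted(shap_values, key=lambda name: abs(shap_values[name]), reverse=True)[:k]
    labels = [feature_aliases.get(name, name) for name in ranked]
    positions = [first_mention(label, text) for label in labels]

    missing = [label for label, pos in zip(labels, positions) if pos < 0]
    mentioned = [pos for pos in positions if pos >= 0]
    lowered = text.lower()
    causal = [m.group(0) for pattern in CAUSAL_PATTERNS for m in re.finditer(pattern, lowered)]

    return {
        "missing_top_features": missing,
        # True when the features that are mentioned appear in SHAP rank order.
        "order_preserved": mentioned == sorted(mentioned),
        "causal_phrases": causal,
    }


# Made-up SHAP row and generated text, for illustration only.
report = audit_explanation(
    "Blood glucose had the largest effect, followed by age. High glucose causes risk.",
    {"glu": 0.42, "age": -0.30, "bmi": 0.05},
    {"glu": "blood glucose"},
    k=2,
)
print(report)  # → {'missing_top_features': [], 'order_preserved': True, 'causal_phrases': ['causes']}
```

Whole-word matching keeps short aliases such as "age" from matching inside words like "average", which a plain substring search would not.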

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper proposes a Python package integrating SHAP with GPT for generating contextual textual explanations guided by user parameters, then evaluates perceived understandability via Likert surveys and interviews in a healthcare case study. No equations, fitted parameters, predictions, or self-citations appear in the derivation chain. The central claim rests on external user feedback rather than any internal reduction to inputs by construction. The evaluation is independent of the package implementation itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim depends on the untested premise that an off-the-shelf LLM can translate SHAP values into accurate, non-misleading natural language when supplied with domain context; no free parameters or new entities are introduced.

axioms (1)
  • domain assumption: Large language models can produce coherent, contextually appropriate textual summaries of numeric feature attributions without systematic hallucination or bias.
    The method invokes this capability to justify the added text layer.

pith-pipeline@v0.9.0 · 5552 in / 1258 out tokens · 90298 ms · 2026-05-17T01:29:01.617048+00:00 · methodology


Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages

[1] Christoph Molnar. Interpretable Machine Learning. Christoph Molnar, 2022.
[2] David Gunning, Eric Vorm, Jerry Y. Wang, and Mark Turek. DARPA's explainable AI (XAI) program: A retrospective. Applied AI Letters, 2(4), 2021.
[3] Robert M. Kinney, Christos Anastasiades, Richard Authur, Iz Beltagy, Jonathan Bragg, Adam Buraczynski, et al. The Semantic Scholar open data platform, 2023.
[4] Amina Adadi and Mohammed Berrada. Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access, 6, 2018.
[5] Alejandro Barredo Arrieta et al. Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 2020.
[6] Arun Rai. Explainable AI: from black box to glass box. Journal of the Academy of Marketing Science, 48(1):137–141, 2020.
[7] G. P. Reddy and Y. V. P. Kumar. Explainable AI (XAI): Explained. In IEEE Open Conference of Electrical, Electronic and Information Sciences (EStream), 2023.
[8] Wei Yang et al. Survey on explainable AI: From approaches, limitations and applications aspects. Human-Centric Intelligent Systems, 3(3), 2023.
[9] Shuai Ma. Towards human-centered design of explainable artificial intelligence (XAI): A survey of empirical studies, 2024.
[10] Yunfeng Zhang et al. Effect of confidence and explanation on accuracy and trust calibration in AI-assisted decision making. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 2020.
[11] P. Linardatos, V. Papastefanopoulos, and S. Kotsiantis. Explainable AI: A review of machine learning interpretability methods. Entropy, 23(1):1–45, 2021.
[12] Scott Lundberg and Su-In Lee. A unified approach to interpreting model predictions, 2017.
[13] Scott M. Lundberg, Gabriel Erion, Hugh Chen, Alex DeGrave, Jordan M. Prutkin, Bala Nair, Rich Caruana, Jonathan Himmelfarb, Nisha Bansal, and Su-In Lee. From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2(1), 2020.
[14] Kartik S. Kalyan, Ajit Rajasekharan, and Sangeetha S. AMMUS: A survey of transformer-based pretrained models in natural language processing, 2021.
[15] Tom B. Brown et al. Language models are few-shot learners, 2020.
[16] Kartik S. Kalyan. A survey of GPT-3 family large language models including ChatGPT and GPT-4, 2023.
[17] Long Ouyang et al. Training language models to follow instructions with human feedback, 2022.
[18] Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Dino Pedreschi, and Fosca Giannotti. A survey of methods for explaining black box models, 2018.
[19] Josua Krause, Adam Perer, and Kenney Ng. Interacting with predictions: Visual inspection of black-box machine learning models. In Proceedings of the CHI Conference on Human Factors in Computing Systems, pages 5686–5697, 2016.
[20] Tim Miller, Paul Howe, and Leon Sterling. Explainable AI: Beware of inmates running the asylum. In IJCAI Workshop on Explainable Artificial Intelligence (XAI), 2017.
[21] Tim Miller. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267, 2019.
[22] Andrés Páez. The pragmatic turn in explainable artificial intelligence (XAI). Minds and Machines, 29(3):441–459, 2019.
[23] Sana Naveed, Gregor Stevens, and David Robin-Kern. An overview of the empirical evaluation of explainable AI (XAI): A comprehensive guideline for user-centered evaluation in XAI. Applied Sciences, 14(23), 2024.
[24] Latifa Dwiyanti, Hidenori Nambo, and Nur Hamid. Leveraging explainable artificial intelligence (XAI) for expert interpretability in predicting rapid kidney enlargement risks in autosomal dominant polycystic kidney disease (ADPKD). AI, 5(4):2037–2065, 2024.
[25] Richard E. Mayer. Multimedia Learning. Cambridge University Press, 2002.
[26] SHAP documentation. Accessed: 2025-05-02.
[27] Kacper Sokol and Peter Flach. Explainability fact sheets: A framework for systematic assessment of explainable approaches. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pages 56–67, 2020.
[28] Monica Melby-Lervåg and Arne Lervåg. Reading comprehension and its underlying components in second-language learners: A meta-analysis of studies comparing first- and second-language learners. Psychological Bulletin, 140(2):409–433, 2014.
[29] Markelle Kelly, Rachel Longjohn, and Kolby Nottingham. The UCI machine learning repository, 2025. Accessed: 2025-05-12.