Pith · machine review for the scientific record

arxiv: 2605.10533 · v1 · submitted 2026-05-11 · 💻 cs.LG

Recognition: 2 theorem links · Lean Theorem

ConfoundingSHAP: Quantifying confounding strength in causal inference

Dennis Frauen, Eyke Hüllermeier, Maresa Schröder, Marie Brockschmidt, Maximilian Muschalik, Santo M.A.R. Thies, Stefan Feuerriegel, Valentyn Melnychuk


Pith reviewed 2026-05-12 03:19 UTC · model grok-4.3

classification 💻 cs.LG
keywords causal inference, confounding, Shapley values, adjustment sets, observational data, explainable AI

The pith

ConfoundingSHAP attributes confounding strength to individual covariates through a targeted Shapley game over adjustment sets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method to identify which observed covariates act as confounders when treatment assignment is unknown in observational data. It defines a custom Shapley game whose value function measures how much an adjustment set reduces bias from confounding, then attributes that value back to each covariate. This produces rankings of confounding strength that differ from standard SHAP uses focused on treatment-effect heterogeneity. A TabPFN-based estimator evaluates the many required adjustment sets without repeated model refits. The resulting attributions are shown to highlight driving covariates across several datasets.

Core claim

ConfoundingSHAP is a Shapley-based method for attributing confounding strength to individual covariates. The paper proposes a Shapley game targeted to infer the confounding strength of the covariates; the resulting Shapley values differ from standard applications of SHAP explanations to causal targets, such as understanding treatment effect heterogeneity, which are ill-suited for this task. Because the task requires evaluating the value function over many adjustment sets, the paper also provides a scalable TabPFN-based estimation that avoids exhaustive refitting. Practical value is demonstrated across various datasets, where ConfoundingSHAP provides informative explanations of which observed covariates drive confounding.

What carries the argument

The custom Shapley value game whose value function is defined on adjustment sets to isolate the confounding bias removed by each covariate.
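The attribution step itself is the standard Shapley formula applied to a game over adjustment sets. A minimal sketch of that machinery, with an illustrative toy value function (not the paper's ν, which is estimated from data):

```python
from itertools import combinations
from math import factorial

def shapley_values(players, nu):
    """Exact Shapley values phi[j] for a cooperative game nu: frozenset -> float."""
    n = len(players)
    phi = {}
    for j in players:
        others = [p for p in players if p != j]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                S = frozenset(S)
                # Classic Shapley weight for a coalition of size k
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (nu(S | {j}) - nu(S))
        phi[j] = total
    return phi

# Illustrative game: the value of an adjustment set is how many of the
# (here known) confounders {0, 1} it contains; covariate 2 is a non-confounder.
confounders = {0, 1}
nu = lambda S: float(len(S & confounders))
phi = shapley_values([0, 1, 2], nu)
# The confounders get ~1.0 each, the non-confounder ~0.0, and by efficiency
# the attributions sum to the grand-coalition value nu({0, 1, 2}).
```

The exponential loop over coalitions is exactly why the paper needs a scalable estimator: each ν(S) evaluation requires a causal-effect estimate under adjustment set S.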

If this is right

  • Analysts gain an explicit ranking of which covariates most influence the treatment-outcome dependence.
  • Adjustment decisions can be prioritized by the magnitude of attributed confounding strength.
  • The method supplies an explanation of the observed treatment assignment mechanism without assuming it is known.
  • Scalable evaluation makes the attributions feasible even when the number of covariates is moderately large.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same game structure could be adapted to quantify the strength of other bias sources such as selection bias.
  • Rankings from ConfoundingSHAP might serve as input features for automated confounder-selection procedures in high-dimensional settings.
  • In longitudinal data the method could be extended to time-varying confounders by defining value functions over time-indexed adjustment sets.

Load-bearing premise

The value function over adjustment sets isolates confounding strength rather than other forms of statistical dependence or model misspecification.

What would settle it

A controlled simulation with known true confounders and non-confounders would settle it: if the method assigns low strength to the true confounders, or high strength to the non-confounders, the approach is falsified.
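That settling experiment can be sketched end to end. Below is a deliberately minimal version under strong simplifying assumptions: linear data, one known confounder, and the residual-bias value function replaced by the shift in an OLS-adjusted treatment coefficient (a stand-in for the paper's TabPFN-based ν, not its implementation; the negative sign mirrors the paper's definition in the Figure 3 caption):

```python
from itertools import combinations
from math import factorial
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# X0 confounds (drives treatment and outcome); X1 only predicts the outcome;
# X2 is pure noise. A sound attribution should single out X0.
X = rng.normal(size=(n, 3))
T = (X[:, 0] + rng.normal(size=n) > 0).astype(float)
Y = 2.0 * T + 1.5 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=n)

def ate_adjusted(S):
    """OLS treatment coefficient after adjusting for the covariates in S."""
    Z = np.column_stack([T, np.ones(n)] + [X[:, j] for j in sorted(S)])
    return np.linalg.lstsq(Z, Y, rcond=None)[0][0]

tau_full = ate_adjusted({0, 1, 2})               # fully adjusted estimate
nu = lambda S: -abs(ate_adjusted(S) - tau_full)  # residual-bias game (negative sign)

players = (0, 1, 2)
phi = {}
for j in players:
    others = [p for p in players if p != j]
    phi[j] = sum(
        factorial(k) * factorial(len(players) - k - 1) / factorial(len(players))
        * (nu(set(S) | {j}) - nu(set(S)))
        for k in range(len(players))
        for S in combinations(others, k)
    )
# The confounder X0 should absorb essentially all the attributed bias;
# the pure outcome predictor X1 and the noise X2 should get ~0.
```

If phi ranked X1 or X2 above X0 here, the value function would be capturing something other than confounding, which is precisely the load-bearing premise under test.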

Figures

Figures reproduced from arXiv: 2605.10533 by Dennis Frauen, Eyke Hüllermeier, Maresa Schröder, Marie Brockschmidt, Maximilian Muschalik, Santo M.A.R. Thies, Stefan Feuerriegel, Valentyn Melnychuk.

Figure 1
Figure 1: Shown is the difference between variable importance obtained by applying SHAP to a CATE estimator [38] vs. of confounding strength (ours). In causal inference, treatment assignment is often not random, so treated and untreated groups can differ systematically in their characteristics [37, 58, 64]. This gives rise to observed confounding, where variables influence both treatment assignment and the outcom… view at source ↗
Figure 2
Figure 2: Overview of our ConfoundingSHAP method. Therein, residual confounding bias is defined as the gap between observational and causal contrasts and is attributed to individual variates locally and globally. Several explanation methods have been developed for causal machine learning [56, 59], but with different target objects. Examples are methods for variable importance of propensity scores (to explain the tre… view at source ↗
Figure 3
Figure 3: Overview of ConfoundingSHAP. Our method defines confounding bias under restricted adjustment (A), introduces a Shapley game over this bias functional (B) and then estimates coalition values with tabular foundation models (C). ν(S) := −E[b_S(X_S)]. (7) Why the negative sign? We define ν(S) = −E[b_S(X_S)] with a negative sign, so that, when comparing a smaller subset S to the larger subset S ∪ {j}, a posi… view at source ↗
Figure 5
Figure 5: Medium-dimensional covariate setting. Synthetic 17-covariate dataset with five confounders. ⇒ The confounders have the largest Shapley values and are recovered in the top five ranks across seeds. not require retraining. TabPFN is used only to obtain the coalition-value estimates ν̂(S) efficiently (3). The confounding-bias game itself remains unchanged and is the one defined in Section 4. Estimating Shap… view at source ↗
Figure 6
Figure 6: ⇒ Global confounding strength disappears under randomly assigned treatment. view at source ↗
Figure 8
Figure 8: SUPPORT right heart catheterization study. Confounding strength averaged over 10 runs. ⇒ Several covariates are clinically plausible confounders. Aim. Finally, we demonstrate the clinical value of ConfoundingSHAP by using the SUPPORT right heart catheterization study [10], where the treatment is right-heart catheterization (RHC) within the first 24 hours in the intensive care unit (ICU). The outcome is … view at source ↗
Figure 9
Figure 9: Approximation quality at fixed budget B = 1024 across increasing numbers of covariates. We next vary the coalition budget for synthetic problems across 25, 50, 75, and 100 covariates, each containing 40% true confounders. Figures 10 and 11 show that increasing the budget improves both attribution mass and confounder recovery. This confirms the expected accuracy-compute tradeoff: larger budgets provide more… view at source ↗
Figure 10
Figure 10: Absolute Shapley mass assigned to true confounders as a function of coalition budget. view at source ↗
Figure 11
Figure 11: Fraction of true confounders recovered among the top 40% of covariates as a function of… view at source ↗
Figure 12
Figure 12: Approximation quality for 200 covariates across increasing budgets. view at source ↗
Figure 13
Figure 13: CATE explanation versus confounding bias attribution on a synthetic dataset with 4… view at source ↗
Figure 14
Figure 14: PEHE after removing top-ranked, random, or lowest-ranked covariates on the ACIC 2016… view at source ↗
Figure 15
Figure 15: Global Shapley values for synthetic datasets with 11 covariates: exact versus approximate… view at source ↗
Figure 16
Figure 16: Local Shapley values for the 11-covariate synthetic dataset with exact computation. view at source ↗
Figure 17
Figure 17: Global Shapley values for the 17-covariate synthetic dataset with RegressionMSR ap… view at source ↗
Figure 18
Figure 18: Stability of approximate ConfoundingSHAP values over 10 runs in the 11-covariate synthetic setting. F.5 Stability over repeated synthetic datasets. We additionally test stability across independently generated synthetic datasets. This evaluates whether the method recovers the intended covariate roles across different samples from the same data-generating process, rather than relying on a favorable random d… view at source ↗
Figure 19
Figure 19: Dataset-level stability for the four-covariate synthetic setting with exact computation. view at source ↗
Figure 20
Figure 20: Dataset-level stability for the 11-covariate synthetic setting with exact computation. view at source ↗
Figure 21
Figure 21: Dataset-level stability for the 11-covariate synthetic setting with RegressionMSR approx… view at source ↗
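Figures 9 through 12 frame the estimator by its coalition budget B. One standard way to spend such a budget is permutation sampling; this is a generic sketch of that accuracy-compute tradeoff, not the paper's specific estimator (e.g. RegressionMSR), which may allocate evaluations differently:

```python
import random

def shapley_permutation(players, nu, budget, seed=0):
    """Monte Carlo Shapley estimate under a coalition-evaluation budget.

    Each sampled permutation costs roughly len(players) evaluations of nu,
    so about budget // len(players) permutations fit into the budget."""
    rng = random.Random(seed)
    phi = {j: 0.0 for j in players}
    n_perms = max(1, budget // len(players))
    for _ in range(n_perms):
        order = list(players)
        rng.shuffle(order)
        coalition, prev = set(), nu(frozenset())
        for j in order:
            coalition.add(j)
            cur = nu(frozenset(coalition))
            phi[j] += cur - prev   # marginal contribution of j in this order
            prev = cur
    return {j: v / n_perms for j, v in phi.items()}

# Additive toy game, where the true Shapley value of j is exactly weights[j].
weights = {0: 3.0, 1: 1.0, 2: 0.0, 3: 2.0}
nu = lambda S: sum(weights[j] for j in S)
est = shapley_permutation(list(weights), nu, budget=1024)
```

For a non-additive game the estimate only converges as the budget grows, which is the tradeoff Figures 10 and 11 quantify.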
Original abstract

In causal inference, confounders are variables that influence both treatment decisions and outcomes. However, unlike in randomized clinical trials, the treatment assignment mechanism in observational studies is not known, and it is thus unclear which covariates act as confounders. Here, we aim to generate insight for causal inference and answer: which of the observed covariates act as confounders? We introduce ConfoundingSHAP, a Shapley-based method for attributing confounding strength to individual covariates. Our contributions are twofold. First, we propose a Shapley game targeted to infer the confounding strength of the covariates. Our resulting Shapley values differ from the standard applications of SHAP explanations on causal targets, such as understanding treatment effect heterogeneity, which are ill-suited for our task. Second, as our task requires evaluating the value function over many adjustment sets, we provide a scalable TabPFN-based estimation that avoids exhaustive refitting. We demonstrate the practical value across various datasets, where ConfoundingSHAP provides informative explanations of which observed covariates drive confounding and thereby helps to provide more insight for causal inference in practice.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces ConfoundingSHAP, a Shapley-value method that attributes confounding strength to individual covariates in observational causal inference. It defines a game whose value function is evaluated over adjustment sets (distinct from standard SHAP applications to treatment-effect heterogeneity), and substitutes TabPFN for exhaustive refitting to achieve scalability. The authors claim that the resulting attributions identify which observed covariates drive confounding and thereby supply practical insight for causal analysis, with demonstrations on various datasets.

Significance. If the value function successfully isolates confounding strength rather than general dependence, the approach would supply a useful diagnostic for high-dimensional observational studies, helping analysts decide which covariates to adjust for and interpret the sources of bias. The TabPFN-based estimator is a pragmatic engineering contribution that removes the computational barrier to evaluating the exponential number of adjustment sets.

major comments (3)
  1. [demonstration / experiments] The empirical evaluation (demonstration section) reports no quantitative metrics, error bars, or comparisons against ground-truth confounding strengths or alternative methods. Without these, the claim that ConfoundingSHAP 'provides informative explanations' cannot be assessed and the central practical contribution remains unsupported.
  2. [§3 (Shapley game definition)] The value function of the proposed Shapley game is defined over adjustment sets but contains no explicit mechanism (contrast, bias term, or causal contrast) that subtracts direct predictive effects, mediation, or collider bias. Consequently, the attributions may reflect any form of statistical dependence rather than confounding strength alone; this is load-bearing for the method's validity.
  3. [§2 / §3] The manuscript asserts that the resulting Shapley values 'differ from the standard applications of SHAP explanations on causal targets' yet provides no formal comparison or counter-example showing that standard SHAP on the outcome or CATE would fail to recover the same ranking of confounders.
minor comments (2)
  1. [§3] Notation for the value function v(S) and the adjustment-set game should be introduced with a single, self-contained definition rather than scattered across paragraphs.
  2. [experiments] The abstract states that the method was 'demonstrated on various datasets' but the main text should include a table summarizing dataset characteristics, number of covariates, and sample sizes.
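Major comment 2 worries that an adjustment-set value function may not separate confounding from other dependence, and that adjusting can itself inject bias, for instance through colliders. A minimal illustration of the collider failure mode (an illustrative data-generating process, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Randomized treatment: there is no confounding to remove.
T = (rng.normal(size=n) > 0).astype(float)
Y = 1.0 * T + rng.normal(size=n)
C = T + Y + rng.normal(size=n)   # collider: caused by both treatment and outcome

def tau_hat(adjust):
    """OLS treatment coefficient, optionally adjusting for extra columns."""
    Z = np.column_stack([T, np.ones(n)] + adjust)
    return np.linalg.lstsq(Z, Y, rcond=None)[0][0]

tau_plain = tau_hat([])    # close to the true effect 1.0
tau_coll = tau_hat([C])    # adjusting for the collider drags the estimate toward 0
```

A value function that scores adjustment sets purely by how much they move the effect estimate would assign C large "strength" here, even though conditioning on it creates bias rather than removing it; that is the failure mode the comment asks the authors to rule out.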

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of ConfoundingSHAP's potential utility. We address each major comment below, indicating where revisions will be made.

read point-by-point responses
  1. Referee: The empirical evaluation (demonstration section) reports no quantitative metrics, error bars, or comparisons against ground-truth confounding strengths or alternative methods. Without these, the claim that ConfoundingSHAP 'provides informative explanations' cannot be assessed and the central practical contribution remains unsupported.

    Authors: We agree that the demonstration section would be strengthened by quantitative support. The current version uses qualitative illustrations on real datasets to highlight interpretability where ground truth is unavailable. In revision we will add a new subsection with synthetic experiments that have known confounding structures, reporting metrics such as ranking correlation with true confounders and precision at identifying them. We will also compare against baselines (e.g., outcome-model feature importance) and include error bars from repeated simulations. revision: yes

  2. Referee: The value function of the proposed Shapley game is defined over adjustment sets but contains no explicit mechanism (contrast, bias term, or causal contrast) that subtracts direct predictive effects, mediation, or collider bias. Consequently, the attributions may reflect any form of statistical dependence rather than confounding strength alone; this is load-bearing for the method's validity.

    Authors: The value function is explicitly the absolute change in the estimated average treatment effect when the covariate is added to the adjustment set: v(S) = |τ̂ − τ̂_S|, where τ̂ is the unadjusted estimate and τ̂_S is the estimate after adjusting for S. This focuses on the impact on the causal quantity rather than on predictive power. We will revise §3 to state this definition more prominently and add a limitations paragraph discussing mediation and collider effects under the maintained no-unmeasured-confounding assumption for the observed covariates. revision: yes

  3. Referee: The manuscript asserts that the resulting Shapley values 'differ from the standard applications of SHAP explanations on causal targets' yet provides no formal comparison or counter-example showing that standard SHAP on the outcome or CATE would fail to recover the same ranking of confounders.

    Authors: We will add a new comparison subsection (and appendix counter-example) showing that standard SHAP applied to an outcome model or CATE model ranks variables by predictive strength, including non-confounders such as pure outcome predictors. The counter-example uses a simple DAG in which a non-confounder receives high standard SHAP but low ConfoundingSHAP; we will also report empirical ranking differences on one of the real datasets. revision: yes
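The contrast promised in response 3 can be made concrete with a two-covariate example. This sketch uses OLS outcome coefficients as a crude stand-in for predictive SHAP importance and the shift in the adjusted treatment coefficient as the confounding game's signal; both are illustrative simplifications, not the paper's estimators:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

X0 = rng.normal(size=n)   # confounder: drives both treatment and outcome
X1 = rng.normal(size=n)   # pure outcome predictor: strong for Y, irrelevant to T
T = (X0 + rng.normal(size=n) > 0).astype(float)
Y = 2.0 * T + 1.0 * X0 + 3.0 * X1 + rng.normal(size=n)

def tau_hat(*covs):
    """OLS treatment coefficient after adjusting for the given covariates."""
    Z = np.column_stack([T, np.ones(n), *covs])
    return np.linalg.lstsq(Z, Y, rcond=None)[0][0]

# Predictive importance ranks X1 first: its outcome coefficient (~3.0) dominates.
coef = np.linalg.lstsq(np.column_stack([np.ones(n), T, X0, X1]), Y, rcond=None)[0]

# Confounding attribution ranks X0 first: only X0 shifts the effect estimate.
shift_X0 = abs(tau_hat() - tau_hat(X0))
shift_X1 = abs(tau_hat() - tau_hat(X1))
```

The rankings invert: X1 dominates the predictive view while contributing essentially nothing to confounding bias, which is the qualitative behavior the proposed counter-example is meant to exhibit.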

Circularity Check

0 steps flagged

No significant circularity in ConfoundingSHAP derivation

full rationale

The paper defines a Shapley game over adjustment sets whose value function measures changes in causal effect estimates, then computes attributions via the standard Shapley formula and approximates the many evaluations with TabPFN. This construction does not reduce any claimed prediction to a fitted parameter by definition, nor does it rely on a self-citation chain or imported uniqueness theorem to justify the central result. The value function is an explicit modeling choice whose correctness is an external assumption rather than a tautology, and the method is presented as a novel targeting of Shapley values rather than a renaming of a known pattern. The derivation therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The method assumes that a well-defined value function over adjustment sets can isolate confounding strength and that TabPFN provides an accurate enough surrogate for the many required evaluations.

axioms (2)
  • domain assumption A value function over adjustment sets exists that isolates confounding strength
    Stated in the abstract as the target of the Shapley game.
  • domain assumption TabPFN can approximate the required value-function evaluations without refitting
    Presented as the scalability solution.

pith-pipeline@v0.9.0 · 5519 in / 1145 out tokens · 35245 ms · 2026-05-12T03:19:48.711583+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: the paper's claim is directly supported by a theorem in the formal canon.
supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: the paper appears to rely on the theorem as machinery.
contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

87 extracted references · 87 canonical work pages · 1 internal anchor

  1. [1]

    C.BénardandJ.Josse.Variableimportanceforcausalforests: Breakingdowntheheterogeneity of treatment effects.Journal of Causal Inference, 13(1), 2025

  2. [2]

    Learningoptimaldynamictreatmentregimesusing causal tree methods in medicine

    T.Blümlein,J.Persson,andS.Feuerriegel. Learningoptimaldynamictreatmentregimesusing causal tree methods in medicine. InMachine Learning for Healthcare Conference, 2022

  3. [3]

    Butler, A

    L. Butler, A. Agarwal, J. S. Kang, Y. E. Erginbas, B. Yu, and K. Ramchandran. ProxySPEX: Inference-efficient interpretability via sparse feature interactions in LLMs. InConference on Neural Information Processing Systems (NeurIPS), 2025

  4. [4]

    Castro, D

    J. Castro, D. Gómez, and J. Tejada. Polynomial calculation of the Shapley value based on sampling.Computers & Operations Research, 36(5):1726–1730, 2009

  5. [5]

    S. L. Chau, R. Hu, J. Gonzalez, and D. Sejdinovic. RKHS-SHAP: Shapley values for kernel methods. InConference on Neural Information Processing Systems (NeurIPS), 2022

  6. [6]

    arXiv preprint arXiv:2002.11631 , year=

    H.Chen,T.Harinen,J.-Y.Lee,MikeYung,andZ.Zhao. CausalML:Pythonpackageforcausal machine learning.arXiv preprint, arXiv:2002.11631, 2020

  7. [7]

    Racialandethnicdisparities inhealthcareaccessandutilizationundertheaffordablecareact.MedicalCare,54(2):140–146, 2016

    J.Chen, A.Vargas-Bustamante, K.Mortensen, andA.N.Ortega. Racialandethnicdisparities inhealthcareaccessandutilizationundertheaffordablecareact.MedicalCare,54(2):140–146, 2016

  8. [8]

    Chen and C

    T. Chen and C. Guestrin. XGBoost: A scalable tree boosting system. InInternational Confer- ence on Knowledge Discovery and Data Mining (KDD), 2016

  9. [9]

    Makingsenseofsensitivity: Extendingomittedvariablebias.Journal of the Royal Statistical Society Series B: Statistical Methodology, 82(1):39–67, 2020

    C.CinelliandC.Hazlett. Makingsenseofsensitivity: Extendingomittedvariablebias.Journal of the Royal Statistical Society Series B: Statistical Methodology, 82(1):39–67, 2020

  10. [10]

    TheeffectivenessofrightheartcatheterizationintheinitialcareofcriticallyIII patients.JAMA: The Journal of the American Medical Association, 276(11):889, 1996

    A.F.Connors. TheeffectivenessofrightheartcatheterizationintheinitialcareofcriticallyIII patients.JAMA: The Journal of the American Medical Association, 276(11):889, 1996

  11. [11]

    Covert and S.-I

    I. Covert and S.-I. Lee. Improving KernelSHAP: Practical Shapley value estimation via linear regression. InInternational Conference on Artificial Intelligence and Statistics (AISTATS), 2021

  12. [12]

    Understandingglobalfeaturecontributionswithadditive importance measures

    I.Covert,S.Lundberg,andS.-I.Lee. Understandingglobalfeaturecontributionswithadditive importance measures. InConference on Neural Information Processing Systems (NeurIPS), 2020

  13. [13]

    Nonparametricestimationofheterogeneoustreatmenteffects: From theory to learning algorithms

    A.CurthandM.vanderSchaar. Nonparametricestimationofheterogeneoustreatmenteffects: From theory to learning algorithms. InInternational Conference on Artificial Intelligence and Statistics (AISTATS), 2021. 10

  14. [14]

    Automatedversusdo-it-yourselfmethods for causal inference: Lessons learned from a data analysis competition.Statistical Science, 34 (1), 2019

    V.Dorie,J.Hill,U.Shalit,M.Scott,andD.Cervone. Automatedversusdo-it-yourselfmethods for causal inference: Lessons learned from a data analysis competition.Statistical Science, 34 (1), 2019

  15. [15]

    J. Dorn, K. Guo, and N. Kallus. Doubly-valid/Doubly-sharp sensitivity analysis for causal inference with unmeasured confounding.Journal of the American Statistical Association, 120 (549):331–342, 2025

  16. [16]

    Feuerriegel, D

    S. Feuerriegel, D. Frauen, V. Melnychuk, J. Schweisthal, K. Hess, A. Curth, S. Bauer, N. Kil- bertus, I. S. Kohane, and M. van der Schaar. Causal machine learning for predicting treatment outcomes.Nature Medicine, 30(4):958–968, 2024

  17. [17]

    Fonseca and J

    J. Fonseca and J. Stoyanovich. ExplainerPFN: Towards tabular foundation models for model- free zero-shot feature importance estimations.arXiv preprint, arXiv:2601.23068, 2026

  18. [18]

    Frauen, V

    D. Frauen, V. Melnychuk, and S. Feuerriegel. Sharp bounds for generalized causal sensitivity analysis. InConference on Neural Information Processing Systems (NeurIPS), 2023

  19. [19]

    Frauen, F

    D. Frauen, F. Imrie, A. Curth, V. Melnychuk, S. Feuerriegel, and M. van der Schaar. A neural frameworkforgeneralizedcausalsensitivityanalysis. InInternationalConferenceonLearning Representations (ICLR), 2024

  20. [20]

    C. Frye, C. Rowat, and I. Feige. Asymmetric Shapley values: Incorporating causal knowledge into model-agnostic explainability. InConference on Neural Information Processing Systems (NeurIPS), 2020

  21. [21]

    Fumagalli, M

    F. Fumagalli, M. Muschalik, P. Kolpaczki, E. Hüllermeier, and B. Hammer. SHAP-IQ: Uni- fied approximation of any-order Shapley interactions. InConference on Neural Information Processing Systems (NeurIPS), 2023

  22. [22]

    Fumagalli, M

    F. Fumagalli, M. Muschalik, P. Kolpaczki, E. Hüllermeier, and B. Hammer. KernelSHAP-IQ: Weighted least-square optimization for Shapley interactions. InInternational Conference on Machine Learning (ICML), 2024

  23. [23]

    Fumagalli, L

    F. Fumagalli, L. Butler, J. S. Kang, K. Ramchandran, and R. T. Witter. An odd estima- tor for Shapley values. InInternational Conference on Machine Learning (ICML), volume arXiv:2602.01399, 2026

  24. [24]

    PolySHAP:ExtendingKernelSHAPwithinteraction- informed polynomial regression

    F.Fumagalli,R.T.Witter,andC.Musco. PolySHAP:ExtendingKernelSHAPwithinteraction- informed polynomial regression. InInternational Conference on Learning Representations (ICLR), 2026

  25. [25]

    Grinsztajn, K

    L. Grinsztajn, K. Flöge, O. Key, F. Birkel, P. Jund, B. Roof, B. Jäger, D. Safaric, S. Alessi, A. Hayler, M. Manium, R. Yu, F. Jablonski, S. B. Hoo, A. Garg, J. Robertson, M. Bühler, V. Moroshan, L. Purucker, C. Cornu, L. C. Wehrhahn, A. Bonetto, B. Schölkopf, S. Gambhir, N. Hollmann, and F. Hutter. TabPFN-2.5: Advancing the state of the art in tabular fo...

  26. [26]

    From observational data to clinical recommendations: A causal framework for estimating patient-level treatment effects and learning policies.arXiv preprint, arXiv:2507.11381, 2025

    R.Gutman,S.Sheiba,O.N.Klein,N.D.Bird,A.Gruber,D.Aronson,O.Caspi,andU.Shalit. From observational data to clinical recommendations: A causal framework for estimating patient-level treatment effects and learning policies.arXiv preprint, arXiv:2507.11381, 2025

  27. [27]

    S. M. Hammer, D. A. Katzenstein, M. D. Hughes, H. Gundacker, R. T. Schooley, R. H. Haubrich,W.K.Henry,M.M.Lederman,J.P.Phair,M.Niu,M.S.Hirsch,andT.C.Merigan. A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults withCD4cellcountsfrom200to500percubicmillimeter.NewEnglandJournalofMedicine, 335(15):1081–1090, 1996

  28. [28]

    In- terpretationofepidemiologicstudiesveryoftenlackedadequateconsiderationofconfounding

    L.G.Hemkens,H.Ewald,F.Naudet,A.Ladanie,J.G.Shaw,G.Sajeev,andJ.P.Ioannidis. In- terpretationofepidemiologicstudiesveryoftenlackedadequateconsiderationofconfounding. Journal of Clinical Epidemiology, 93:94–102, 2018

  29. [29]

    M. A. Hernán and J. M. Robins. Using big data to emulate a target trial when a randomized trial is not available.American Journal of Epidemiology, 183(8):758–764, 2016

  30. [30]

    Heskes, E

    T. Heskes, E. Sijben, I. G. Bucur, and T. Claassen. Causal Shapley values: Exploiting causal knowledge to explain individual predictions of complex models. InConference on Neural Information Processing Systems (NeurIPS), 2020. 11

  31. [31]

    EfficientandSharpOff-PolicyLearning under Unobserved Confounding

    K.Hess,D.Frauen,V.Melnychuk,andS.Feuerriegel. EfficientandSharpOff-PolicyLearning under Unobserved Confounding. InInternational Conference on Learning Representations (ICLR), 2026

  32. [32]

    O. J. Hines, K. Diaz-Ordaz, and S. Vansteelandt. Variable importance measures for heteroge- neous treatment effects.Biometrics, 81(4):ujaf140, 2025

  33. [33]

    Hollmann, S

    N. Hollmann, S. Müller, K. Eggensperger, and F. Hutter. TabPFN: A transformer that solves small tabular classification problems in a second. InInternational Conference on Learning Representations (ICLR), 2023

  34. [34]

    Accuratepredictionsonsmalldatawithatabularfoundationmodel.Nature, 637(8045):319–326, Jan

    N.Hollmann,S.Müller,L.Purucker,A.Krishnakumar,M.Körfer,S.B.Hoo,R.T.Schirrmeis- ter, andF.Hutter. Accuratepredictionsonsmalldatawithatabularfoundationmodel.Nature, 637(8045):319–326, Jan. 2025

  35. [35]

    C. Hu, L. Li, W. Huang, T. Wu, Q. Xu, J. Liu, and B. Hu. Interpretable machine learning for early prediction of prognosis in sepsis: A discovery and validation study.Infectious Diseases and Therapy, 11(3):1117–1132, 2022

  36. [36]

    Hu, M.-Y

    S.-H. Hu, M.-Y. Huang, C.-Y. Chen, and H.-M. Hsieh. Treatment patterns of targeted and nontargeted therapies and survival effects in patients with locally advanced head and neck cancer in Taiwan.BMC Cancer, 23(1):567, June 2023

  37. [37]

    Cambridge University Press, 1 edition, Apr

    G.W.ImbensandD.B.Rubin.CausalInferenceforStatistics,Social,andBiomedicalSciences: An Introduction. Cambridge University Press, 1 edition, Apr. 2015

  38. [38]

    E. H. Kennedy. Towards optimal doubly robust estimation of heterogeneous causal effects. Electronic Journal of Statistics, 17(2), 2023

  39. [39]

    Kolpaczki and E

    P. Kolpaczki and E. Hüllermeier. Approximation algorithms for the Shapley value: Taxonomy and properties. InInternational Joint Conference on Artificial Intelligence (IJCAI), 2026

  40. [40]

    Kolpaczki, V

    P. Kolpaczki, V. Bengs, M. Muschalik, and E. Hüllermeier. Approximating the Shapley value without marginal contributions. InConference on Artificial Intelligence (AAAI), 2024

  41. [41]

    Kolpaczki, M

    P. Kolpaczki, M. Muschalik, F. Fumagalli, B. Hammer, and E. Hüllermeier. SVARM-IQ: Efficientapproximationofany-orderShapleyinteractionsthroughstratification.InInternational Conference on Artificial Intelligence and Statistics (AISTATS), 2024

  42. [42]

    Explainingindividualizedtreatmentrules: IntegratingLIMEandSHAP with xgboost in precision medicine.Statistics in Medicine, 44(28-30):e70322, 2025

    Z.LiuandX.Huang. Explainingindividualizedtreatmentrules: IntegratingLIMEandSHAP with xgboost in precision medicine.Statistics in Medicine, 44(28-30):e70322, 2025

  43. [43]

    A. R. Ludtke, I. Diaz, and M. J. van der Laan. The statistics of sensitivity analyses.U.C. Berkeley Division of Biostatistics Working Paper Series, page Working Paper 341, 2015

  44. [44]

    Aunifiedapproachtointerpretingmodelpredictions

    S.LundbergandS.-I.Lee. Aunifiedapproachtointerpretingmodelpredictions. InConference on Neural Information Processing Systems (NeurIPS), 2017

  45. [45]

    Bansal, and S.-I

    S.M.Lundberg,G.Erion,H.Chen,A.DeGrave,J.M.Prutkin,B.Nair,R.Katz,J.Himmelfarb, N. Bansal, and S.-I. Lee. From local explanations to global understanding with explainable AI for trees.Nature Machine Intelligence, 2(1):56–67, 2020

  46. [46]

    Explanations

    D. Martens, G. Shmueli, T. Evgeniou, K. Bauer, C. Janiesch, S. Feuerriegel, S. Gabel, S. Goethals, T. Greene, N. Klein, M. Kraus, N. Kühl, C. Perlich, W. Verbeke, A. Zharova, P. Zschech, and F. Provost. Beware of "Explanations" of AI.Business \& Information Systems Engineering (BISE), 2026

  47. [47]

    M. L. Martini, S. N. Neifert, W. H. Shuman, E. K. Chapman, A. J. Schüpper, E. K. Oermann, J. Mocco, M. Todd, J. C. Torner, A. Molyneux, S. Mayer, P. L. Roux, M. D. I. Vergouwen, G. J. E. Rinkel, G. K. C. Wong, P. Kirkpatrick, A. Quinn, D. Hänggi, N. Etminan, W. M. Van DenBergh,B.N.R.Jaja,M.Cusimano,T.A.Schweizer,J.I.Suarez,H.Fukuda,S.Yamagata, B.Lo,A.Leon...

  48. [48]

    Mcclean, Z

    A. Mcclean, Z. Branson, and E. H. Kennedy. Calibrated sensitivity models.Biometrika, page asag001, 2026

  49. [49]

    Melnychuk, D

    V. Melnychuk, D. Frauen, and S. Feuerriegel. Partial Counterfactual Identification of Con- tinuous Outcomes with a Curvature Sensitivity Model. InConference on Neural Information Processing Systems (NeurIPS), 2023. 12

  50. [50]

    Inference on Variable Importance for Treatment Effect Heterogeneity: Shapley Values and Beyond

    P.Morzywolek,P.B.Gilbert,andA.Luedtke. Inferenceonlocalvariableimportancemeasures for heterogeneous treatment effects.arXiv preprint, arXiv:2510.18843, 2025

  51. [51]

    Muschalik, H

    M. Muschalik, H. Baniecki, F. Fumagalli, P. Kolpaczki, B. Hammer, and E. Hüllermeier. Shapiq: Shapley interactions for machine learning. InConference on Neural Information Processing Systems (NeurIPS), 2024

  52. [52]

    M. Muschalik, F. Fumagalli, B. Hammer, and E. Hüllermeier. Beyond TreeSHAP: Efficient computation of any-order Shapley interactions for tree ensembles. In Conference on Artificial Intelligence (AAAI), volume 38, pages 14388–14396, 2024

  53. [53]

    M. Muschalik, F. Fumagalli, P. Frazzetto, J. Strotherm, L. Hermes, A. Sperduti, E. Hüllermeier, and B. Hammer. Exact computation of any-order Shapley interactions for graph neural networks. In International Conference on Learning Representations (ICLR), 2025

  54. [54]

    C. Musco and R. T. Witter. Provably accurate Shapley value estimation via leverage score sampling. In International Conference on Learning Representations (ICLR), 2025

  55. [55]

    A. Nadel and R. Wettenstein. From decision trees to boolean logic: A fast and unified SHAP algorithm. In Conference on Artificial Intelligence (AAAI), volume 40, pages 24476–24485, 2026

  56. [56]

    J. Paillard, A. R. Lobo, V. Kolodyazhniy, B. Thirion, and D. A. Engemann. Measuring variable importance in heterogeneous treatment effects with confidence. In International Conference on Machine Learning (ICML), 2025

  57. [57]

    Á. Parafita, T. Garriga, A. Brando, and F. J. Cazorla. Practical Do-Shapley explanations with estimand-agnostic causal inference. In Conference on Neural Information Processing Systems (NeurIPS), 2025

  58. [58]

    J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2nd edition, 2009

  59. [59]

    P. Rehill and N. Biddle. Transparency challenges in policy evaluation with causal machine learning: Improving usability and accountability. Data & Policy, 6:e43, 2024

  60. [60]

    M. T. Ribeiro, S. Singh, and C. Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In International Conference on Knowledge Discovery and Data Mining (KDD), 2016

  61. [61]

    J. M. Robins, A. Rotnitzky, and D. O. Scharfstein. Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models. In W. Miller, M. E. Halloran, and D. Berry, editors, Statistical Models in Epidemiology, the Environment, and Clinical Trials, volume 116, pages 1–94. Springer New York, New York, NY, 2000

  62. [62]

    P. R. Rosenbaum and D. B. Rubin. Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. Journal of the Royal Statistical Society Series B: Statistical Methodology, 45(2):212–218, Jan. 1983

  63. [63]

    P. R. Rosenbaum and D. B. Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55, 1983

  64. [64]

    D. B. Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5):688–701, 1974

  65. [65]

    D. B. Rubin. Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association, 100(469):322–331, 2005

  66. [66]

    S. Schrod, A. Schäfer, S. Solbrig, R. Lohmayer, W. Gronwald, P. J. Oefner, T. Beißbarth, R. Spang, H. U. Zacharias, and M. Altenbuchinger. BITES: Balanced individual treatment effect for survival data. Bioinformatics, 38(Supplement_1):i60–i67, June 2022

  67. [67]

    K. Sechidis, K. Papangelou, P. D. Metcalfe, D. Svensson, J. Weatherall, and G. Brown. Distinguishing prognostic and predictive biomarkers: An information theoretic approach. Bioinformatics, 34(19):3365–3376, 2018

  68. [68]

    K. Sechidis, S. Sun, Y. Chen, J. Lu, C. Zhang, M. Baillie, D. Ohlssen, M. Vandemeulebroecke, R. Hemmings, S. Ruberg, and B. Bornkamp. WATCH: A workflow to assess treatment effect heterogeneity in drug development for clinical trial sponsors. Pharmaceutical Statistics, 24(2):e2463, 2025

  69. [69]

    K. Sechidis, C. Zhang, S. Sun, Y. Chen, A. Spector, and B. Bornkamp. Using individualized treatment effects to assess treatment effect heterogeneity. Statistics in Medicine, 44(28-30):e70324, 2025

  70. [70]

    L. S. Shapley. A value for n-person games. In H. W. Kuhn and A. W. Tucker, editors, Contributions to the Theory of Games (AM-28), Volume II, pages 307–318. Princeton University Press, 1953

  71. [71]

    Y. Shi, Y. Zou, J. Liu, Y. Wang, Y. Chen, F. Sun, Z. Yang, G. Cui, X. Zhu, X. Cui, and F. Liu. Ultrasound-based radiomics XGBoost model to assess the risk of central cervical lymph node metastasis in patients with papillary thyroid carcinoma: Individual application of SHAP. Frontiers in Oncology, 12:897596, 2022

  72. [72]

    Y. Shimoni, E. Karavani, S. Ravid, P. Bak, T. H. Ng, S. H. Alford, D. Meade, and Y. Goldschmidt. An evaluation toolkit to guide model selection and cohort definition in causal inference, June 2019

  73. [73]

    E. Štrumbelj and I. Kononenko. Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems, 41(3):647–665, 2014

  74. [74]

    Q. Sun, K. Zhang, Y. Xu, M. Luo, Z. Yang, Q. Liu, S. Liu, and A. Liu. Explainable machine learning for predicting clinical outcomes in HIV/TB co-infection: A comparative retrospective study. BMC Infectious Diseases, 25(1):1589, 2025

  75. [75]

    D. Svensson, E. Hermansson, N. Nikolaou, K. Sechidis, and I. Lipkovich. Overview and practical recommendations on using Shapley values for identifying predictive biomarkers via CATE modeling. arXiv preprint, arXiv:2505.01145, 2025

  76. [76]

    Z. Tan. A distributional approach for causal inference using propensity scores. Journal of the American Statistical Association, 101(476):1619–1637, 2006

  77. [77]

    C. Tarabanis, E. Kalampokis, M. Khalil, C. L. Alviar, L. A. Chinitz, and L. Jankelson. Explainable SHAP-XGBoost models for in-hospital mortality after myocardial infarction. Cardiovascular Digital Health Journal, 4(4):126–132, 2023

  78. [78]

    J. Textor, B. Van Der Zander, M. S. Gilthorpe, M. Liśkiewicz, and G. T. Ellison. Robust causal inference using directed acyclic graphs: The R package ‘dagitty’. International Journal of Epidemiology, page dyw341, 2017

  79. [79]

    M. J. Van Der Laan and D. Rubin. Targeted maximum likelihood learning. The International Journal of Biostatistics, 2(1), 2006

  80. [80]

    T. J. VanderWeele and P. Ding. Sensitivity analysis in observational research: Introducing the E-value. Annals of Internal Medicine, 167(4):268–274, 2017

Showing first 80 references.