The Representational Limit of Scalar Interactions: An Interventional Decomposition

Potito Aghilar; Sabino Roccotelli; Sebastiano Stramaglia; Stanislao Fidanza; Tommaso Di Noia; Vito Walter Anelli

arxiv: 2606.19410 · v1 · pith:3IHHFFPPnew · submitted 2026-06-17 · 📊 stat.ML · cs.LG

Pith reviewed 2026-06-26 18:52 UTC · model grok-4.3

keywords: interaction decomposition · uniqueness redundancy synergy · Shapley interaction · structural causal model · masked inference · post-hoc interpretability · XOR model

The pith

Scalar pairwise interaction scores mix uniqueness, redundancy, and synergy that cannot be separated from pairs alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that common pairwise interaction measures in machine learning cannot separate three different ways features can interact: unique contributions from one feature, redundant information shared across features, and synergistic effects that only appear when features combine. It demonstrates the mixing on a simple three-variable XOR causal model where some standard indices miss the interaction entirely while others smear the third-order effect across incorrect pair values. To fix this, the authors introduce Stochastic Hi-Fi, a retraining-free method that applies random feature masking and intervention to recover separate per-feature profiles for each of the three mechanisms. The estimator supplies exact interventional meaning, finite-sample bounds, and variance reduction, and it recovers the true structure on synthetic causal models while also separating interaction types inside GPT-2 and improving deletion metrics on chest X-ray classification.
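The synergy-only structure of the XOR example can be checked by brute force. The following is a minimal sketch, not the paper's estimator: it assumes predictability is measured as mutual information over uniform binary inputs (the paper's predictability measure may differ), and it adds a fourth independent bit as a spectator. The helper `mutual_information` and the variable names are hypothetical.

```python
import itertools
import math
from collections import Counter

def mutual_information(rows, subset, target_index):
    """I(X_subset; Y) in bits, treating `rows` as a uniform distribution."""
    n = len(rows)
    joint = Counter((tuple(r[i] for i in subset), r[target_index]) for r in rows)
    px = Counter(tuple(r[i] for i in subset) for r in rows)
    py = Counter(r[target_index] for r in rows)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

# All 16 settings of four bits; Y = x0 XOR x1 XOR x2, and x3 is a spectator.
rows = [bits + (bits[0] ^ bits[1] ^ bits[2],)
        for bits in itertools.product((0, 1), repeat=4)]
Y = 4  # column index of the target

# Assumed value function: information about Y carried by each feature subset.
value = {S: mutual_information(rows, S, Y)
         for k in range(5) for S in itertools.combinations(range(4), k)}

# Second-order difference for the pair (0, 1) with no other feature present.
pair_diff = value[(0, 1)] - value[(0,)] - value[(1,)] + value[()]
triple_value = value[(0, 1, 2)]
print(pair_diff, triple_value)
```

Under these assumptions every single feature and every pair carries no information about Y, while the triple (0, 1, 2) carries one full bit, so a score computed from singles and pairs alone has nothing to attribute to the pair itself.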

What carries the argument

Stochastic Hi-Fi, a post-hoc predictability decomposition that uses interventional masked inference to produce separate per-feature uniqueness, redundancy, and synergy profiles.

If this is right

Recovers structure missed by scalar baselines with up to 411 times larger interaction-magnitude recovery ratios on tabular structural causal models.
Separates redundant and synergistic heads inside the GPT-2 indirect-object-identification circuit.
Matches GradCAM performance on the Pointing Game while improving Deletion AUC on the NIH ChestX-ray14 dataset.
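For readers unfamiliar with the Deletion AUC metric mentioned above, here is a minimal sketch of how such a score is commonly computed: features are removed in order of decreasing attribution, the model score is recorded after each removal, and the area under that curve is reported (lower is better). The step size, the baseline value of 0.0, and the toy logistic model are assumptions for illustration; the paper's exact protocol is not given on this page.

```python
import numpy as np

def deletion_auc(model, x, attribution, baseline=0.0):
    """Area under the score curve as features are removed, most important first."""
    order = np.argsort(-attribution)      # indices by descending attribution
    x_work = x.astype(float).copy()
    scores = [model(x_work)]              # score with nothing removed
    for idx in order:
        x_work[idx] = baseline            # "delete" one feature
        scores.append(model(x_work))
    scores = np.asarray(scores)
    fractions = np.linspace(0.0, 1.0, len(scores))
    # Trapezoidal rule written out to avoid depending on a specific NumPy version.
    return float(np.sum((scores[1:] + scores[:-1]) / 2.0 * np.diff(fractions)))

# Hypothetical toy model: a logistic score over a fixed weight vector.
weights = np.array([3.0, 2.0, 1.0, 0.0])

def toy_model(v):
    return 1.0 / (1.0 + np.exp(-(v @ weights - 3.0)))

x = np.ones(4)
good = deletion_auc(toy_model, x, attribution=weights)    # true importance order
bad = deletion_auc(toy_model, x, attribution=-weights)    # reversed order
print(good, bad)
```

An attribution that ranks the truly influential features first makes the score collapse early, which yields the smaller area.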

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same masking approach could be inserted into other post-hoc explanation pipelines to test whether their reported pairwise scores are actually conflating the three mechanisms.
In domains where synergy is expected, such as multi-modal fusion, the decomposition supplies a concrete way to quantify when joint effects exceed what any single feature supplies.
The finite-sample bounds open the possibility of statistical hypothesis tests that decide whether a detected interaction is unique, redundant, or synergistic at a chosen confidence level.

Load-bearing premise

Interventional masked inference can isolate uniqueness, redundancy, and synergy profiles in a manner faithful to the underlying data-generating process without requiring model-specific assumptions beyond the post-hoc predictability decomposition.

What would settle it

On the known 3-way XOR structural causal model, compute the true U/R/S contributions of each variable and check whether Stochastic Hi-Fi estimates deviate from those values by more than the stated Monte Carlo error bounds.
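The check described above can be sketched as a simple comparison against a concentration bound. This sketch assumes a two-sided Hoeffding bound for the mean of bounded i.i.d. samples; the paper's stated Monte Carlo bound may take a different form, and the function names and the toy estimator are hypothetical.

```python
import math
import random

def hoeffding_half_width(n_samples, value_range, delta):
    """Half-width t such that P(|mean - E[mean]| > t) <= delta for bounded i.i.d. samples."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n_samples))

def within_bound(estimate, truth, n_samples, value_range, delta=0.05):
    """True when the estimate is consistent with the truth at confidence 1 - delta."""
    return abs(estimate - truth) <= hoeffding_half_width(n_samples, value_range, delta)

# Toy stand-in for an estimator: the mean of n samples in [0, 1] with known truth 0.5.
rng = random.Random(0)
n = 2000
estimate = sum(rng.random() for _ in range(n)) / n
print(within_bound(estimate, truth=0.5, n_samples=n, value_range=1.0))
```

On a model with known ground truth, a deviation larger than the half-width at the chosen confidence level would count as evidence against the claimed bound or against the estimator's unbiasedness.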

Figures

Figures reproduced from arXiv: 2606.19410 by Potito Aghilar, Sabino Roccotelli, Sebastiano Stramaglia, Stanislao Fidanza, Tommaso Di Noia, Vito Walter Anelli.

**Figure 1.** Left: Pair-level scalar indices on the 3-way XOR SCM: faithful indices collapse higher-order signal to near-zero pair scores, while projective indices redistribute the triplet effect into small signed pair coefficients. Right: Per-feature U/R/S profile from Stochastic Hi-Fi (mean ± std over 5 seeds): the three active features are synergy-dominated, while the spectator is separated. Hi-Fi recovers both role…

**Figure 2.** Qualitative ChestX-ray comparison on a Cardiomegaly case: original image with ground-truth box (white), Stochastic Hi-Fi, GradCAM, Integrated Gradients, Vanilla Gradient, and Random. This panel is descriptive and complements the quantitative localization/deletion metrics. … rather than uniformly positive: Pointing Game is non-significant (paired Wilcoxon on n = 220 image pairs after tie-removal; 660 ties …

**Figure 3.** IOI attention-head decomposition on the 10-head circuit.

**Figure 4.** Full 26-head IOI extension. Left: Pair interaction map $S_{ij} - R_{ij}$ over $\binom{26}{2} = 325$ pairs (lower triangle), showing mixed synergistic and redundant structure across role families. Right: Per-head synergy S versus singleton LOCO π (activation-patching identity), highlighting heads with low first-order effect but substantial cooperative contribution. Spearman(π, patch) = 1.000 ± 0.000 holds by construction. H…

Original abstract

Signed pairwise interaction scores fundamentally conflate uniqueness (U), redundancy (R), and synergy (S). We prove this on a minimal 3-way XOR structural causal model: faithful indices such as Shapley-Taylor return zero per pair, whereas projective indices such as Shapley Interaction spread the third-order effect into pair scalars that conflate the three mechanisms. We introduce Stochastic Hi-Fi, a post-hoc, retraining-free predictability decomposition that estimates per-feature U/R/S profiles by interventional masked inference. The estimator provides exact interventional semantics, finite-sample Monte Carlo bounds, strict variance reduction from coupled diamond sampling, and uniform finite-vocabulary convergence. Across tabular SCMs, Stochastic Hi-Fi recovers structure missed by scalar baselines (up to 411x larger interaction-magnitude recovery ratios). It also separates redundant and synergistic heads in the GPT-2 IOI circuit. On NIH ChestX-ray14, Stochastic Hi-Fi matches GradCAM on Pointing Game and improves substantially on Deletion AUC.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Pairwise scores mix U/R/S on the XOR example and Stochastic Hi-Fi separates them via masking on synthetics, but the real-data claims rest on unverified isolation.

The core contribution is a clean proof that faithful pairwise indices like Shapley-Taylor return zero on a minimal 3-way XOR SCM while projective ones like Shapley Interaction fold the third-order term into pair scalars that mix uniqueness, redundancy, and synergy. Stochastic Hi-Fi then estimates per-feature U/R/S profiles with interventional masking, Monte Carlo bounds, and diamond sampling for variance reduction.

The synthetic results look solid: the recovery ratios reach 411x over baselines on tabular SCMs, which directly illustrates the conflation the proof identifies. The estimator is post-hoc and retraining-free, which is a practical plus for existing models.

The applications to GPT-2 IOI heads and ChestX-ray14 are thinner. They report circuit separation and better Deletion AUC than GradCAM, but without ground-truth labels for the true U/R/S structure in those models, the numbers only show difference, not correctness. The stress-test concern about whether masking preserves exact interventional distributions and avoids leakage of higher-order effects is worth checking in the full derivations; the abstract asserts exact semantics but does not show the commutation argument.

Finite-sample bounds are claimed, yet the abstract gives no derivation sketch, so a referee would need to see that step. No free parameters or invented quantities appear in the method description.

This paper is for interpretability researchers who already work with interaction indices and want a decomposition that respects the three mechanisms. It is worth sending to peer review because the minimal-model proof and the estimator itself are new and falsifiable on SCMs; the real-data sections can be revised or caveated without sinking the central claim.

Referee Report

3 major / 2 minor

Summary. The paper claims that signed pairwise interaction scores conflate uniqueness (U), redundancy (R), and synergy (S), as shown by a proof on a minimal 3-way XOR structural causal model where faithful indices (e.g., Shapley-Taylor) return zero per pair while projective indices spread third-order effects. It introduces Stochastic Hi-Fi, a post-hoc retraining-free method that estimates per-feature U/R/S profiles via interventional masked inference, asserting exact interventional semantics, finite-sample Monte Carlo bounds, variance reduction via coupled diamond sampling, and uniform convergence. Empirical results show up to 411x larger interaction-magnitude recovery on tabular SCMs, separation of redundant/synergistic heads in the GPT-2 IOI circuit, and competitive performance with GradCAM on NIH ChestX-ray14 (Pointing Game and Deletion AUC).

Significance. If the decomposition is faithful, the work provides a concrete advance over scalar interaction indices by separating mechanisms that are otherwise mixed, with direct relevance to feature attribution and circuit analysis in ML. Strengths include the minimal-model proof establishing the conflation phenomenon, the explicit Monte Carlo estimator with variance-reduction technique, and the reproducible empirical comparisons on controlled SCMs.

major comments (3)

[§3] §3 (Stochastic Hi-Fi definition and estimator): The central claim that interventional masked inference isolates U/R/S profiles with 'exact interventional semantics' and no model-specific assumptions beyond post-hoc predictability decomposition is load-bearing. No explicit argument is given that the masking operator commutes with the SCM's causal structure or prevents higher-order leakage under finite masking, which directly affects whether the recovered profiles on the 3-way XOR are guaranteed to match the structural mechanisms.
[§4.2] §4.2 (finite-sample bounds): The abstract and method assert finite-sample Monte Carlo bounds and uniform finite-vocabulary convergence, yet the derivation is not shown; this is required to support the variance-reduction and convergence claims that underwrite the empirical recovery ratios.
[§6.3] §6.3 and §7 (GPT-2 IOI and ChestX-ray14 experiments): The separation of redundant/synergistic heads and the medical imaging metrics are presented without ground-truth validation details for the assigned U/R/S labels, weakening the claim that the method recovers structure missed by scalar baselines in real models.

minor comments (2)

[§3.3] Notation for the diamond sampling procedure could be clarified with an explicit pseudocode block to make the variance-reduction step reproducible from the text alone.
[Abstract] The abstract states 'up to 411x larger interaction-magnitude recovery ratios' without specifying the exact baseline and metric in the summary sentence; a parenthetical reference to the relevant table would improve readability.
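In the spirit of the pseudocode requested in the first minor comment, the sketch below shows one way a coupled sampler can reduce variance. It rests on an assumption about what "diamond" means: the four corners ∅, {i}, {j}, {i, j} of a pairwise second difference are evaluated on the same input and the same random mask of the remaining features (common random numbers), rather than on independent draws. The paper's actual procedure is not shown on this page and may differ; the model `f`, the baseline of 0.0, and all function names are hypothetical.

```python
import numpy as np

def f(z):
    """Hypothetical model: one pairwise product plus an additive term."""
    return z[:, 0] * z[:, 1] + z[:, 2]

def corner(x, keep_others, keep_i, keep_j, i=0, j=1, baseline=0.0):
    """Evaluate f with features i and j kept or masked; others follow keep_others."""
    keep = keep_others.copy()
    keep[:, i] = keep_i
    keep[:, j] = keep_j
    return f(np.where(keep, x, baseline))

def second_difference(rng, n, coupled):
    """Per-sample estimates of v(ij) - v(i) - v(j) + v(none) for the pair (0, 1)."""
    def draw():
        x = rng.integers(0, 2, size=(n, 3)).astype(float)
        keep = rng.random((n, 3)) < 0.5
        return x, keep
    if coupled:
        shared = draw()
        draws = [shared] * 4                  # one input and one mask for all corners
    else:
        draws = [draw() for _ in range(4)]    # fresh input and mask per corner
    (x11, k11), (x10, k10), (x01, k01), (x00, k00) = draws
    return (corner(x11, k11, True, True) - corner(x10, k10, True, False)
            - corner(x01, k01, False, True) + corner(x00, k00, False, False))

rng = np.random.default_rng(0)
coupled = second_difference(rng, 20000, coupled=True)
independent = second_difference(rng, 20000, coupled=False)
print(coupled.mean(), coupled.var(), independent.mean(), independent.var())
```

Both samplers target the same expected second difference, but in the coupled version the additive term cancels exactly within each sample, so only the variation of the interaction itself remains.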

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough review and valuable suggestions. Below we address each of the major comments in detail, indicating the revisions we plan to make to strengthen the manuscript.

Referee: [§3] §3 (Stochastic Hi-Fi definition and estimator): The central claim that interventional masked inference isolates U/R/S profiles with 'exact interventional semantics' and no model-specific assumptions beyond post-hoc predictability decomposition is load-bearing. No explicit argument is given that the masking operator commutes with the SCM's causal structure or prevents higher-order leakage under finite masking, which directly affects whether the recovered profiles on the 3-way XOR are guaranteed to match the structural mechanisms.

Authors: We agree that an explicit argument for the commutation of the masking operator with the SCM structure would strengthen the central claim. In the revised manuscript, we will add a dedicated subsection in §3 providing a formal argument that the interventional masking isolates U, R, and S without higher-order leakage, leveraging the definition of the predictability decomposition and the finite masking sets used in the estimator. This will directly address the 3-way XOR case.
revision: yes

Referee: [§4.2] §4.2 (finite-sample bounds): The abstract and method assert finite-sample Monte Carlo bounds and uniform finite-vocabulary convergence, yet the derivation is not shown; this is required to support the variance-reduction and convergence claims that underwrite the empirical recovery ratios.

Authors: The derivations for the finite-sample Monte Carlo bounds and uniform convergence are provided in the supplementary material. To make this more accessible, we will include a high-level sketch of the proof in §4.2 of the main text, highlighting the role of coupled diamond sampling in variance reduction and the conditions for uniform convergence over finite vocabularies.
revision: yes

Referee: [§6.3] §6.3 and §7 (GPT-2 IOI and ChestX-ray14 experiments): The separation of redundant/synergistic heads and the medical imaging metrics are presented without ground-truth validation details for the assigned U/R/S labels, weakening the claim that the method recovers structure missed by scalar baselines in real models.

Authors: For the GPT-2 experiments, the U/R/S assignments are validated by their consistency with the established IOI circuit analysis in the literature, where certain heads are known to exhibit redundant or synergistic behavior based on ablation studies. For ChestX-ray14, we rely on the standard evaluation protocols using Pointing Game and Deletion AUC, showing competitive or improved performance. We acknowledge that direct ground-truth labels for U/R/S are inherently unavailable in these complex models without full causal specification. We will add a discussion of this limitation and the reliance on comparative and literature-based validation in the revised §6.3 and §7.
revision: partial

Circularity Check

0 steps flagged

No circularity: derivation relies on interventional definitions and SCM example, not self-referential reductions

The paper defines Stochastic Hi-Fi directly via interventional masked inference and Monte Carlo estimation on a 3-way XOR SCM to separate U/R/S, with properties (exact semantics, variance bounds) following from the sampling procedure itself. No equations reduce a claimed prediction to a fitted input by construction, no uniqueness theorems are imported via self-citation, and the central decomposition is not equivalent to its inputs. The method is presented as post-hoc and retraining-free without parameter fitting that would force the reported profiles.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the representativeness of the 3-way XOR SCM for general conflation and on the interventional semantics of masked inference being sufficient to separate the three mechanisms.

axioms (2)

domain assumption: The 3-way XOR structural causal model is a faithful minimal example that exposes the conflation in scalar indices. (Invoked as the setting for the proof in the abstract.)
domain assumption: Interventional masked inference yields exact semantics for U/R/S decomposition. (Stated as a property of the estimator.)

Reference graph

Works this paper leans on

63 extracted references · 20 canonical work pages · 1 internal anchor

[1]

Openxai: Towards a transparent evaluation of model explanations

Chirag Agarwal, Satyapriya Krishna, Eshika Saxena, Martin Pawelczyk, Nari Johnson, Isha Puri, Marinka Zitnik, and Himabindu Lakkaraju. Openxai: Towards a transparent evaluation of model explanations. In Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, and A. Oh, editors,Advances in Neural Information Processing Systems 35: Annual Conferenc...

2022
[2]

Quantifying unique information.Entropy, 16(4):2161–2183, 2014

Nils Bertschinger, Johannes Rauh, Eckehard Olbrich, Jürgen Jost, and Nihat Ay. Quantifying unique information.Entropy, 16(4):2161–2183, 2014. doi: 10.3390/E16042161. URL https: //doi.org/10.3390/e16042161

work page doi:10.3390/e16042161 2014
[3]

Explaining graph neural net- works via structure-aware interaction index

Ngoc Bui, Hieu Trung Nguyen, Viet Anh Nguyen, and Rex Ying. Explaining graph neural net- works via structure-aware interaction index. In Ruslan Salakhutdinov, Zico Kolter, Katherine A. Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors,Forty- first International Conference on Machine Learning, ICML 2024, Vienna, Austria,...

2024
[4]

URLhttps://proceedings.mlr.press/v235/bui24b.html
[5]

Polynomial calculation of the Shapley value based on sampling.Computers & Operations Research, 36(5):1726–1730, 2009

Javier Castro, Daniel Gómez, and Juan Tejada. Polynomial calculation of the shapley value based on sampling.Comput. Oper. Res., 36(5):1726–1730, 2009. doi: 10.1016/J.COR.2008.04.004. URLhttps://doi.org/10.1016/j.cor.2008.04.004

work page doi:10.1016/j.cor.2008.04.004 2009
[6]

Mavor-Parker, Aengus Lynch, Stefan Heimersheim, and Adrià Garriga-Alonso

Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, and Adrià Garriga-Alonso. Towards automated circuit discovery for mechanistic interpretability. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine, editors,Advances in Neural Information Processing Systems 36: Annual Conference on Neural Info...

2023
[7]

Lundberg, and Su-In Lee

Ian Covert, Scott M. Lundberg, and Su-In Lee. Understanding global feature contributions with additive importance measures. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors,Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS ...

2020
[8]

Diffusionpid: interpreting diffusion via partial information decomposition

Shaurya Dewan, Rushikesh Zawar, Prakanshul Saxena, Yingshan Chang, Andrew Luo, and Yonatan Bisk. Diffusionpid: interpreting diffusion via partial information decomposition. In Proceedings of the 38th International Conference on Neural Information Processing Systems, NIPS ’24, Red Hook, NY , USA, 2024. Curran Associates Inc. ISBN 9798331314385

2024
[9]

Mahyar Fazlyab, Alexander Robey, Hamed Hassani, Manfred Morari, and George J. Pappas. Efficient and accurate estimation of lipschitz constants for deep neural networks. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman Garnett, editors,Advances in Neural Information Processing Systems 32: Annual Confere...

2019
[11]

Kernelshap-iq: Weighted least square optimization for shapley interactions

Fabian Fumagalli, Maximilian Muschalik, Patrick Kolpaczki, Eyke Hüllermeier, and Barbara Hammer. Kernelshap-iq: Weighted least square optimization for shapley interactions. In Ruslan Salakhutdinov, Zico Kolter, Katherine A. Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors,Forty-first International Conference on Machine...

2024
[12]

Grabisch, M

Michel Grabisch and Marc Roubens. An axiomatic approach to the concept of interaction among players in cooperative games.Int. J. Game Theory, 28(4):547–565, 1999. doi: 10.1007/ S001820050125. URLhttps://doi.org/10.1007/s001820050125

work page doi:10.1007/s001820050125 1999
[13]

Anna Hedström, Leander Weber, Daniel Krakowczyk, Dilyara Bareeva, Franz Motzkus, Woj- ciech Samek, Sebastian Lapuschkin, and Marina M.-C. Höhne. Quantus: An explainable AI toolkit for responsible evaluation of neural network explanations and beyond.J. Mach. Learn. Res., 24:34:1–34:11, 2023. URLhttps://jmlr.org/papers/v24/22-0142.html

2023
[14]

Causal shapley val- ues: Exploiting causal knowledge to explain individual predictions of complex models

Tom Heskes, Evi Sijben, Ioan Gabriel Bucur, and Tom Claassen. Causal shapley val- ues: Exploiting causal knowledge to explain individual predictions of complex models. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan- Tien Lin, editors,Advances in Neural Information Processing Systems 33: Annual Con- ference on Neura...

2020
[15]

A class of statistics with asymptotically normal distribution.The Annals of Mathematical Statistics, 19(3):293–325, 1948

Wassily Hoeffding. A class of statistics with asymptotically normal distribution.The Annals of Mathematical Statistics, 19(3):293–325, 1948. ISSN 00034851. URL http://www.jstor. org/stable/2235637

arXiv 1948
[16]

Probability inequalities for sums of bounded random variables.Journal of the American Statistical Association, 58(301):13–30, 1963

Wassily Hoeffding. Probability inequalities for sums of bounded random variables.Journal of the American Statistical Association, 58(301):13–30, 1963. ISSN 01621459, 1537274X

1963
[17]

Discovering additive structure in black box functions

Giles Hooker. Discovering additive structure in black box functions. In Won Kim, Ron Kohavi, Johannes Gehrke, and William DuMouchel, editors,Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, Washington, USA, August 22-25, 2004, pages 575–580. ACM, 2004. doi: 10.1145/1014052.1014122. URL https://d...

work page doi:10.1145/1014052.1014122 2004
[18]

A benchmark for inter- pretability methods in deep neural networks

Sara Hooker, Dumitru Erhan, Pieter-Jan Kindermans, and Been Kim. A benchmark for inter- pretability methods in deep neural networks. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman Garnett, editors,Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing...

2019
[19]

Weinberger

Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks. In2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 2261–2269. IEEE Computer Society,

2017
[20]

, year = 2017, month = jul, pages =

doi: 10.1109/CVPR.2017.243. URLhttps://doi.org/10.1109/CVPR.2017.243

work page doi:10.1109/cvpr.2017.243 2017
[21]

James, Jeffrey Emenheiser, and James P

Ryan G. James, Jeffrey Emenheiser, and James P. Crutchfield. Unique information and secret key agreement.Entropy, 21(1):12, 2019. doi: 10.3390/E21010012. URL https://doi.org/ 10.3390/e21010012

work page doi:10.3390/e21010012 2019
[22]

Janizek, Pascal Sturmfels, and Su-In Lee

Joseph D. Janizek, Pascal Sturmfels, and Su-In Lee. Explaining explanations: Axiomatic feature interactions for deep networks.J. Mach. Learn. Res., 22:104:1–104:54, 2021. URL https://jmlr.org/papers/v22/20-1223.html

2021
[23]

Feature relevance quantification in explainable AI: A causal problem

Dominik Janzing, Lenon Minorics, and Patrick Blöbaum. Feature relevance quantification in explainable AI: A causal problem. In Silvia Chiappa and Roberto Calandra, editors,The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020, 26-28 August 2020, Online [Palermo, Sicily, Italy], Proceedings of Machine Learning Research, ...

2020
[24]

URLhttp://proceedings.mlr.press/v108/janzing20a.html

PMLR, 2020. URLhttp://proceedings.mlr.press/v108/janzing20a.html

2020
[25]

Zaletel, and Joel E

Chaeyun Ko. STRIDE: subset-free functional decomposition for XAI in tabular settings.CoRR, abs/2509.09070, 2025. doi: 10.48550/ARXIV .2509.09070. URL https://doi.org/10. 48550/arXiv.2509.09070. 12

work page internal anchor Pith review doi:10.48550/arxiv 2025
[26]

A novel approach to the partial information decomposition.Entropy, 24 (3):403, 2022

Artemy Kolchinsky. A novel approach to the partial information decomposition.Entropy, 24 (3):403, 2022. doi: 10.3390/E24030403. URLhttps://doi.org/10.3390/e24030403

work page doi:10.3390/e24030403 2022
[27]

M., Kucukelbir, A

Jing Lei, Max G’Sell, Alessandro Rinaldo, Ryan J. Tibshirani, and Larry Wasserman. Distribution-free predictive inference for regression.Journal of the American Statistical Association, 113(523):1094–1111, 2018. doi: 10.1080/01621459.2017.1307116. URL https://doi.org/10.1080/01621459.2017.1307116

work page doi:10.1080/01621459.2017.1307116 2018
[28]

Kautz, and Chenliang Xu

Samuel Lerman, Charles Venuto, Henry A. Kautz, and Chenliang Xu. Explaining local, global, and higher-order interactions in deep learning. In2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 1204–

2021
[29]

Proceed- ings of the IEEE International Conference on Computer Vision, 99 92–10002 (2021) https://doi.org/10.1109/ICCV48922.2021.00986

IEEE, 2021. doi: 10.1109/ICCV48922.2021.00126. URL https://doi.org/10.1109/ ICCV48922.2021.00126

work page doi:10.1109/iccv48922.2021.00126 2021
[30]

Lundberg and Su-In Lee

Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V . N. Vishwanathan, and Roman Garnett, editors,Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017...

2017
[31]

M., Erion, G., Chen, H., DeGrave, A., Prutkin, J

Scott M. Lundberg, Gabriel G. Erion, Hugh Chen, Alex J. DeGrave, Jordan M. Prutkin, Bala Nair, Ronit Katz, Jonathan Himmelfarb, Nisha Bansal, and Su-In Lee. From local explanations to global understanding with explainable AI for trees.Nat. Mach. Intell., 2(1):56–67, 2020. doi: 10.1038/S42256-019-0138-9. URLhttps://doi.org/10.1038/s42256-019-0138-9

work page doi:10.1038/s42256-019-0138-9 2020
[32]

SHAP meets tensor networks: Provably tractable explanations with parallelism

Reda Marzouk, Shahaf Bassan, and Guy Katz. SHAP meets tensor networks: Provably tractable explanations with parallelism. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. URLhttps://openreview.net/forum?id=FfccSikDfZ

2026
[33]

H-sets: Hessian- guided discovery of set-level feature interactions in image classifiers

Ayushi Mehrotra, Dipkamal Bhusal, Michael Clifford, and Nidhi Rastogi. H-sets: Hessian- guided discovery of set-level feature interactions in image classifiers. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026. URL https://arxiv.org/abs/2604.22045. Accepted

Pith/arXiv arXiv 2026
[34]

Locating and editing factual associations in GPT

Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and editing factual associations in GPT. In Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, and A. Oh, editors,Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, Novem...

2022
[35]

shapiq: Shapley interactions for machine learning

Maximilian Muschalik, Hubert Baniecki, Fabian Fumagalli, Patrick Kolpaczki, Barbara Hammer, and Eyke Hüllermeier. shapiq: Shapley interactions for machine learning. In Amir Globersons, Lester Mackey, Danielle Belgrave, Angela Fan, Ulrich Paquet, Jakub M. Tomczak, and Cheng Zhang, editors,Advances in Neural Information Processing Systems 38: Annual Confere...

2024
[36]

Roger B. Myerson. Graphs and cooperation in games.Math. Oper. Res., 2(3):225–229, 1977. doi: 10.1287/MOOR.2.3.225. URLhttps://doi.org/10.1287/moor.2.3.225

work page doi:10.1287/moor.2.3.225 1977
[37]

Cortes, Daniele Marinazzo, and Sebastiano Stramaglia

Marlis Ontivero-Ortega, Luca Faes, Jesus M. Cortes, Daniele Marinazzo, and Sebastiano Stramaglia. Assessing high-order effects in feature importance via predictability decomposition. Phys. Rev. E, 111:L033301, Mar 2025. doi: 10.1103/PhysRevE.111.L033301. URL https: //link.aps.org/doi/10.1103/PhysRevE.111.L033301

work page doi:10.1103/physreve.111.l033301 2025
[38]

Estimating the unique information of continuous variables

Ari Pakman, Amin Nejatbakhsh, Dar Gilboa, Abdullah Makkeh, Luca Mazzucato, Michael Wibral, and Elad Schneidman. Estimating the unique information of continuous variables. In 13 Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wort- man Vaughan, editors,Advances in Neural Information Processing Systems 34: Annual Confer- ...

2021
[39]

RISE: randomized input sampling for explanation of black-box models

Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: randomized input sampling for explanation of black-box models. InBritish Machine Vision Conference 2018, BMVC 2018, Newcastle, UK, September 3-6, 2018, page 151. BMV A Press, 2018. URL http://bmvc2018.org/ contents/papers/1064.pdf

2018
[40]

why should i trust you?

Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "why should I trust you?": Explaining the predictions of any classifier. In Balaji Krishnapuram, Mohak Shah, Alexander J. Smola, Charu C. Aggarwal, Dou Shen, and Rajeev Rastogi, editors,Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, ...

work page doi:10.1145/2939672.2939778 2016
[41]

Lawrence Zitnick, and Devi Parikh

Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. InIEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pages 618–626. IEEE Computer Society, 2017. doi: 10.1109/ICCV .2017

work page doi:10.1109/iccv 2017
[42]

URLhttps://doi.org/10.1109/ICCV.2017.74

work page doi:10.1109/iccv.2017.74 2017
[43]

Math and Comput in Simulation , year =

I.M Sobol’. Global sensitivity indices for nonlinear mathematical models and their monte carlo estimates.Mathematics and Computers in Simulation, 55(1):271–280, 2001. ISSN 0378-4754. doi: https://doi.org/10.1016/S0378-4754(00)00270-6. The Second IMACS Seminar on Monte Carlo Methods

work page doi:10.1016/s0378-4754(00)00270-6 2001
[44]

The many shapley values for model explanation

Mukund Sundararajan and Amir Najmi. The many shapley values for model explanation. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, Proceedings of Machine Learning Research, pages 9269–9278. PMLR,

2020
[45]

URLhttp://proceedings.mlr.press/v119/sundararajan20b.html
[46]

Axiomatic attribution for deep networks

Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In Doina Precup and Yee Whye Teh, editors,Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, Proceedings of Machine Learning Research, pages 3319–3328. PMLR, 2017. URL http://proceedings. mlr.press...

2017
[47]

Mukund Sundararajan, Kedar Dhamdhere, and Ashish Agarwal. The Shapley Taylor interaction index. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, Proceedings of Machine Learning Research, pages 9259–9268. PMLR, 2020. URL http://proceedings.mlr.press/v119/sundararajan20a.html
[48]

Che-Ping Tsai, Chih-Kuan Yeh, and Pradeep Ravikumar. Faith-Shap: The faithful Shapley interaction index. J. Mach. Learn. Res., 24:94:1–94:42, 2023. URL https://jmlr.org/papers/v24/22-0202.html
[49]

Michael Tsang, Sirisha Rambhatla, and Yan Liu. How does this interaction affect me? Interpretable attribution for feature interactions. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems ... 2020.
[50]

Jesse Vig, Sebastian Gehrmann, Yonatan Belinkov, Sharon Qian, Daniel Nevo, Yaron Singer, and Stuart M. Shieber. Investigating gender bias in language models using causal mediation analysis. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 33: Annual ... 2020.
[51]

Kevin Ro Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, and Jacob Steinhardt. Interpretability in the wild: a circuit for indirect object identification in GPT-2 small. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023. URL https://openreview.net/forum?id=NpsVSN6o4ul
[52]

Xiaosong Wang, Yifan Peng, Le Lu, Zhiyong Lu, Mohammadhadi Bagheri, and Ronald M. Summers. ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages ... doi: 10.1109/CVPR.2017.369
[53]

B. P. Welford. Note on a method for calculating corrected sums of squares and products. Technometrics, 4(3):419–420, 1962. doi: 10.1080/00401706.1962.10490022

[54]

Paul L. Williams and Randall D. Beer. Nonnegative decomposition of multivariate information. CoRR, abs/1004.2515, 2010. URL http://arxiv.org/abs/1004.2515
[55]

Matthew D. Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In David J. Fleet, Tomás Pajdla, Bernt Schiele, and Tinne Tuytelaars, editors, Computer Vision - ECCV 2014 - 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I, Lecture Notes in Computer Science, pages 818–833. Springer, 2014. doi: ...
[56]

Spectators contribute nothing, since $Y$ does not depend on them. Therefore

$$v_x(S) = \begin{cases} \tfrac{1}{2} & \text{if } \{1,2,3\} \not\subseteq S, \\ \oplus(x_1, x_2, x_3) & \text{if } \{1,2,3\} \subseteq S. \end{cases} \tag{10}$$

Step 2: Möbius coefficients. From (10):
• Empty set. $v_x(\emptyset) = \tfrac{1}{2}$, so $m_x(\emptyset) = \tfrac{1}{2}$.
• Singletons. For any $i \in [n]$, $v_x(\{i\}) = \tfrac{1}{2}$, so $m_x(\{i\}) = \tfrac{1}{2} - \tfrac{1}{2} = 0$.
• Pairs. For any $\{i, j\} \subseteq [n]$, $v_x(\{i, j\}) = v_x(\{i\}) = v_x(\{j\}) = v_x(\emptyset) =$ 1 ...
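The Möbius coefficients of (10) can be checked numerically. The following is a minimal sketch, assuming the standard Möbius inversion formula $m(S) = \sum_{T \subseteq S} (-1)^{|S|-|T|} v(T)$; the function names, the example input, and the choice of one spectator feature are illustrative assumptions. The source text is truncated after the pair case, so the triple coefficient printed here follows from the inversion formula applied to (10), not from the elided text.

```python
from itertools import combinations

def xor_value(S, x, core=(1, 2, 3)):
    """v_x(S) from Eq. (10): 1/2 unless S contains the whole XOR triple."""
    if set(core) <= set(S):
        return float(x[core[0]] ^ x[core[1]] ^ x[core[2]])
    return 0.5

def mobius(v, players):
    """Moebius transform: m(S) = sum over T subset of S of (-1)^(|S|-|T|) * v(T)."""
    coeffs = {}
    for r in range(len(players) + 1):
        for S in combinations(players, r):
            total = 0.0
            for k in range(len(S) + 1):
                for T in combinations(S, k):
                    total += (-1) ** (len(S) - len(T)) * v(T)
            coeffs[S] = total
    return coeffs

# Hypothetical input: features 1-3 form the XOR triple, feature 4 is a spectator.
x = {1: 1, 2: 0, 3: 0, 4: 1}
m = mobius(lambda S: xor_value(S, x), players=(1, 2, 3, 4))

for S, coeff in m.items():
    if coeff != 0.0:
        print(S, coeff)  # only the empty set and the triple (1, 2, 3) survive
```

Every singleton and pair coefficient vanishes, so any report restricted to subsets of size at most two sees nothing beyond the baseline $\tfrac{1}{2}$.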
[57]

The standalone LOCO $\pi(X_i) = 0$ (any single triplet feature alone yields no information gain), so $R(X_i) = 0$ and $S(X_i) = \tfrac{1}{2}$. Step 6: scalar conflation of the U/R/S components. The per-feature triple $(U, R, S) = (0, 0, \tfrac{1}{2}) \in \mathbb{R}^3$ is identical for each $i \in \{1,2,3\}$ but lives in a 3-dimensional output space. Notice that scalar indices project this interaction into ± ...
[58]

because they inherently average over contexts rather than isolating the synergistic extremum. The faithful family produces the zero scalar in $\mathbb{R}$ per pair; the projective family produces $\pm\tfrac{1}{4}$ in $\mathbb{R}$ per pair. These scalar reports do not carry the named decomposition into uniqueness, redundancy, and synergy. In the faithful case, the pair-level report erases ...
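The $\pm\tfrac{1}{4}$ figure can be reproduced by hand for one averaging-type index. The sketch below assumes the three-player XOR game of (10) without spectators and uses the Shapley interaction index of Grabisch and Roubens as a stand-in for the projective family; the function names and the choice of this particular index are illustrative assumptions, not the paper's code.

```python
from itertools import combinations
from math import factorial

def v(S, x):
    """Value function of Eq. (10) with n = 3: parity only when all three features are present."""
    if set(S) == {0, 1, 2}:
        return float(x[0] ^ x[1] ^ x[2])
    return 0.5

def shapley_interaction(i, j, x, n=3):
    """Pairwise Shapley interaction index: weighted average of discrete second differences."""
    rest = [k for k in range(n) if k not in (i, j)]
    total = 0.0
    for r in range(len(rest) + 1):
        for T in combinations(rest, r):
            weight = factorial(n - len(T) - 2) * factorial(len(T)) / factorial(n - 1)
            T = set(T)
            delta = v(T | {i, j}, x) - v(T | {i}, x) - v(T | {j}, x) + v(T, x)
            total += weight * delta
    return total

print(shapley_interaction(0, 1, (1, 0, 0)))  # parity 1 -> 0.25
print(shapley_interaction(0, 1, (1, 1, 0)))  # parity 0 -> -0.25
```

The only nonzero second difference occurs in the context where the third feature is already present, and the index averages it with the empty context, which is how a purely third-order effect ends up reported as a pair score.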
[59]

Runtime checks: Continuously evaluate adjacency-dominance conditions during deployment, flagging violations in real time.
[60]

Fallback policy: In case of A3 violations, revert to a conservative estimator that does not rely on adjacency-dominance. 3. Logging: Record all flagged violations and fallback activations for offline analysis. Section F.3 verifies that the boundary case (XOR with uniform background) violates this and that the synthetic third-order dataset satisfies it with a...
[61]

Selection: Choose $p_{\mathrm{bg}}$ based on domain-specific priors, ensuring it reflects the expected data-generating process.
[62]

Justification: Provide a rationale for the choice of $p_{\mathrm{bg}}$, supported by empirical or theoretical evidence.
[63]

Diagnostics: Evaluate sensitivity to $p_{\mathrm{bg}}$ by comparing results across multiple plausible background distributions.
[64]

Reporting: Explicitly document the chosen $p_{\mathrm{bg}}$ and any observed sensitivity in the experimental results. This protocol aims to improve transparency and reproducibility for interventional estimands. On E1, we compare uniform-binary and empirical-resampled backgrounds across 5 seeds per dataset. Across XOR3, XOR+AND, and Synth3, pooled absolute drift remain...
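The diagnostics step can be sketched as a direct comparison of masked values under two backgrounds. This is a minimal illustration, not the paper's experimental code: the toy parity model, the 500-row dataset, the sample sizes, and the helper names `masked_value` and `background_drift` are all assumptions introduced here.

```python
import itertools
import numpy as np

def xor3(X):
    """Toy model: parity of the first three binary columns (the rest are spectators)."""
    return (X[:, 0] ^ X[:, 1] ^ X[:, 2]).astype(float)

def masked_value(f, x, S, background):
    """Estimate v_x(S): clamp features in S to x, draw the others from background rows."""
    Z = background.copy()
    idx = np.array(S, dtype=int)
    Z[:, idx] = x[idx]
    return float(f(Z).mean())

def background_drift(f, x, bg_a, bg_b):
    """Max and mean absolute difference of v_x(S) between two backgrounds, over all S."""
    n = x.shape[0]
    subsets = [S for r in range(n + 1) for S in itertools.combinations(range(n), r)]
    diffs = [abs(masked_value(f, x, S, bg_a) - masked_value(f, x, S, bg_b)) for S in subsets]
    return max(diffs), float(np.mean(diffs))

rng = np.random.default_rng(0)
n, m = 4, 4000
data = rng.integers(0, 2, size=(500, n))            # hypothetical observed dataset
bg_uniform = rng.integers(0, 2, size=(m, n))        # uniform-binary background
bg_empirical = data[rng.integers(0, len(data), m)]  # empirical-resampled background
x = np.array([1, 0, 1, 0])

max_drift, mean_drift = background_drift(xor3, x, bg_uniform, bg_empirical)
print(f"max drift {max_drift:.3f}, mean drift {mean_drift:.3f}")
```

Reporting both the maximum and the mean drift keeps a single sensitive subset from being hidden by averaging; repeating the comparison over several seeds would mirror the multi-seed setup described above.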

[1] [1]

Chirag Agarwal, Satyapriya Krishna, Eshika Saxena, Martin Pawelczyk, Nari Johnson, Isha Puri, Marinka Zitnik, and Himabindu Lakkaraju. OpenXAI: Towards a transparent evaluation of model explanations. In Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems 35: Annual Conferenc... 2022.

[2] [2]

Nils Bertschinger, Johannes Rauh, Eckehard Olbrich, Jürgen Jost, and Nihat Ay. Quantifying unique information. Entropy, 16(4):2161–2183, 2014. doi: 10.3390/E16042161. URL https://doi.org/10.3390/e16042161

[3] [3]

Ngoc Bui, Hieu Trung Nguyen, Viet Anh Nguyen, and Rex Ying. Explaining graph neural networks via structure-aware interaction index. In Ruslan Salakhutdinov, Zico Kolter, Katherine A. Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors, Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria,... URL https://proceedings.mlr.press/v235/bui24b.html

[5] [5]

Javier Castro, Daniel Gómez, and Juan Tejada. Polynomial calculation of the Shapley value based on sampling. Comput. Oper. Res., 36(5):1726–1730, 2009. doi: 10.1016/J.COR.2008.04.004. URL https://doi.org/10.1016/j.cor.2008.04.004

[6] [6]

Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, and Adrià Garriga-Alonso. Towards automated circuit discovery for mechanistic interpretability. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine, editors, Advances in Neural Information Processing Systems 36: Annual Conference on Neural Info... 2023.

[7] [7]

Ian Covert, Scott M. Lundberg, and Su-In Lee. Understanding global feature contributions with additive importance measures. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS ...

[8] [8]

Shaurya Dewan, Rushikesh Zawar, Prakanshul Saxena, Yingshan Chang, Andrew Luo, and Yonatan Bisk. DiffusionPID: Interpreting diffusion via partial information decomposition. In Proceedings of the 38th International Conference on Neural Information Processing Systems, NIPS ’24, Red Hook, NY, USA, 2024. Curran Associates Inc. ISBN 9798331314385

[9] [9]

Mahyar Fazlyab, Alexander Robey, Hamed Hassani, Manfred Morari, and George J. Pappas. Efficient and accurate estimation of Lipschitz constants for deep neural networks. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Confere... 2019.

[10] [11]

Fabian Fumagalli, Maximilian Muschalik, Patrick Kolpaczki, Eyke Hüllermeier, and Barbara Hammer. KernelSHAP-IQ: Weighted least square optimization for Shapley interactions. In Ruslan Salakhutdinov, Zico Kolter, Katherine A. Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors, Forty-first International Conference on Machine... 2024.

[11] [12]

Michel Grabisch and Marc Roubens. An axiomatic approach to the concept of interaction among players in cooperative games. Int. J. Game Theory, 28(4):547–565, 1999. doi: 10.1007/S001820050125. URL https://doi.org/10.1007/s001820050125

[12] [13]

Anna Hedström, Leander Weber, Daniel Krakowczyk, Dilyara Bareeva, Franz Motzkus, Wojciech Samek, Sebastian Lapuschkin, and Marina M.-C. Höhne. Quantus: An explainable AI toolkit for responsible evaluation of neural network explanations and beyond. J. Mach. Learn. Res., 24:34:1–34:11, 2023. URL https://jmlr.org/papers/v24/22-0142.html

[13] [14]

Tom Heskes, Evi Sijben, Ioan Gabriel Bucur, and Tom Claassen. Causal Shapley values: Exploiting causal knowledge to explain individual predictions of complex models. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neura... 2020.

[14] [15]

Wassily Hoeffding. A class of statistics with asymptotically normal distribution. The Annals of Mathematical Statistics, 19(3):293–325, 1948. ISSN 00034851. URL http://www.jstor.org/stable/2235637

[15] [16]

Wassily Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963. ISSN 01621459, 1537274X

[16] [17]

Giles Hooker. Discovering additive structure in black box functions. In Won Kim, Ron Kohavi, Johannes Gehrke, and William DuMouchel, editors, Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, Washington, USA, August 22-25, 2004, pages 575–580. ACM, 2004. doi: 10.1145/1014052.1014122. URL https://d...

[17] [18]

Sara Hooker, Dumitru Erhan, Pieter-Jan Kindermans, and Been Kim. A benchmark for interpretability methods in deep neural networks. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing... 2019.

[18] [19]

Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 2261–2269. IEEE Computer Society, 2017. doi: 10.1109/CVPR.2017.243. URL https://doi.org/10.1109/CVPR.2017.243

[20] [21]

Ryan G. James, Jeffrey Emenheiser, and James P. Crutchfield. Unique information and secret key agreement. Entropy, 21(1):12, 2019. doi: 10.3390/E21010012. URL https://doi.org/10.3390/e21010012

[21] [22]

Joseph D. Janizek, Pascal Sturmfels, and Su-In Lee. Explaining explanations: Axiomatic feature interactions for deep networks. J. Mach. Learn. Res., 22:104:1–104:54, 2021. URL https://jmlr.org/papers/v22/20-1223.html

[22] [23]

Dominik Janzing, Lenon Minorics, and Patrick Blöbaum. Feature relevance quantification in explainable AI: A causal problem. In Silvia Chiappa and Roberto Calandra, editors, The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020, 26-28 August 2020, Online [Palermo, Sicily, Italy], Proceedings of Machine Learning Research, ... PMLR, 2020. URL http://proceedings.mlr.press/v108/janzing20a.html

[24] [25]

Chaeyun Ko. STRIDE: Subset-free functional decomposition for XAI in tabular settings. CoRR, abs/2509.09070, 2025. doi: 10.48550/ARXIV.2509.09070. URL https://doi.org/10.48550/arXiv.2509.09070

[25] [26]

Artemy Kolchinsky. A novel approach to the partial information decomposition. Entropy, 24(3):403, 2022. doi: 10.3390/E24030403. URL https://doi.org/10.3390/e24030403

[26] [27]

Jing Lei, Max G’Sell, Alessandro Rinaldo, Ryan J. Tibshirani, and Larry Wasserman. Distribution-free predictive inference for regression. Journal of the American Statistical Association, 113(523):1094–1111, 2018. doi: 10.1080/01621459.2017.1307116. URL https://doi.org/10.1080/01621459.2017.1307116

[27] [28]

Samuel Lerman, Charles Venuto, Henry A. Kautz, and Chenliang Xu. Explaining local, global, and higher-order interactions in deep learning. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 1204–. IEEE, 2021. doi: 10.1109/ICCV48922.2021.00126. URL https://doi.org/10.1109/ICCV48922.2021.00126

[29] [30]

Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett, editors, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017...

[30] [31]

Scott M. Lundberg, Gabriel G. Erion, Hugh Chen, Alex J. DeGrave, Jordan M. Prutkin, Bala Nair, Ronit Katz, Jonathan Himmelfarb, Nisha Bansal, and Su-In Lee. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell., 2(1):56–67, 2020. doi: 10.1038/S42256-019-0138-9. URL https://doi.org/10.1038/s42256-019-0138-9

[31] [32]

Reda Marzouk, Shahaf Bassan, and Guy Katz. SHAP meets tensor networks: Provably tractable explanations with parallelism. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. URL https://openreview.net/forum?id=FfccSikDfZ

[32] [33]

Ayushi Mehrotra, Dipkamal Bhusal, Michael Clifford, and Nidhi Rastogi. H-sets: Hessian-guided discovery of set-level feature interactions in image classifiers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026. URL https://arxiv.org/abs/2604.22045. Accepted

[33] [34]

Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and editing factual associations in GPT. In Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, Novem...

[34] [35]

Maximilian Muschalik, Hubert Baniecki, Fabian Fumagalli, Patrick Kolpaczki, Barbara Hammer, and Eyke Hüllermeier. shapiq: Shapley interactions for machine learning. In Amir Globersons, Lester Mackey, Danielle Belgrave, Angela Fan, Ulrich Paquet, Jakub M. Tomczak, and Cheng Zhang, editors, Advances in Neural Information Processing Systems 38: Annual Confere... 2024.

[35] [36]

Roger B. Myerson. Graphs and cooperation in games. Math. Oper. Res., 2(3):225–229, 1977. doi: 10.1287/MOOR.2.3.225. URL https://doi.org/10.1287/moor.2.3.225

[36] [37]

Marlis Ontivero-Ortega, Luca Faes, Jesus M. Cortes, Daniele Marinazzo, and Sebastiano Stramaglia. Assessing high-order effects in feature importance via predictability decomposition. Phys. Rev. E, 111:L033301, Mar 2025. doi: 10.1103/PhysRevE.111.L033301. URL https://link.aps.org/doi/10.1103/PhysRevE.111.L033301

[37] [38]

Ari Pakman, Amin Nejatbakhsh, Dar Gilboa, Abdullah Makkeh, Luca Mazzucato, Michael Wibral, and Elad Schneidman. Estimating the unique information of continuous variables. In Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan, editors, Advances in Neural Information Processing Systems 34: Annual Confer... 2021.

[38] [39]

Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. In British Machine Vision Conference 2018, BMVC 2018, Newcastle, UK, September 3-6, 2018, page 151. BMVA Press, 2018. URL http://bmvc2018.org/contents/papers/1064.pdf
