arxiv: 2606.19410 · v1 · pith:3IHHFFPPnew · submitted 2026-06-17 · 📊 stat.ML · cs.LG

The Representational Limit of Scalar Interactions: An Interventional Decomposition

Pith reviewed 2026-06-26 18:52 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords interaction decomposition · uniqueness redundancy synergy · Shapley interaction · structural causal model · masked inference · post-hoc interpretability · XOR model

The pith

Scalar pairwise interaction scores mix uniqueness, redundancy, and synergy that cannot be separated from pairs alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that common pairwise interaction measures in machine learning cannot separate three different ways features can interact: unique contributions from one feature, redundant information shared across features, and synergistic effects that only appear when features combine. It demonstrates the mixing on a simple three-variable XOR causal model where some standard indices miss the interaction entirely while others smear the third-order effect across incorrect pair values. To fix this, the authors introduce Stochastic Hi-Fi, a retraining-free method that applies random feature masking and intervention to recover separate per-feature profiles for each of the three mechanisms. The estimator supplies exact interventional meaning, finite-sample bounds, and variance reduction, and it recovers the true structure on synthetic causal models while also separating interaction types inside GPT-2 and improving deletion metrics on chest X-ray classification.
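To make the XOR contrast concrete, here is a minimal sketch on a four-feature toy game (three XOR inputs plus one spectator). It assumes a value function equal to 1/2 whenever any XOR input is masked and equal to the XOR output otherwise; the names `value`, `pair_derivative`, and `sii_pair` are illustrative and not taken from the paper. Under the usual definitions, the discrete derivative at the empty coalition is what Shapley-Taylor assigns to a pair when its maximum order is above two, and the weighted average over contexts is the Shapley Interaction Index.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

N = 4                                # features 0, 1, 2 feed the XOR; feature 3 is a spectator
XOR_INPUTS = frozenset({0, 1, 2})

def value(coalition, x):
    """Assumed value function: 1/2 unless all XOR inputs are revealed."""
    if XOR_INPUTS <= coalition:
        return Fraction(x[0] ^ x[1] ^ x[2])
    return Fraction(1, 2)

def pair_derivative(i, j, context, x):
    """Second-order discrete derivative of the value function at `context`."""
    return (value(context | {i, j}, x) - value(context | {i}, x)
            - value(context | {j}, x) + value(context, x))

def sii_pair(i, j, x):
    """Shapley Interaction Index for the pair (i, j)."""
    rest = [k for k in range(N) if k not in (i, j)]
    total = Fraction(0)
    for size in range(len(rest) + 1):
        weight = Fraction(factorial(N - size - 2) * factorial(size), factorial(N - 1))
        for context in combinations(rest, size):
            total += weight * pair_derivative(i, j, frozenset(context), x)
    return total

x = (1, 0, 0, 1)
for i, j in combinations(range(N), 2):
    at_empty = pair_derivative(i, j, frozenset(), x)
    print((i, j), "derivative at empty set:", at_empty, "SII:", sii_pair(i, j, x))
```

In this toy game every pair inside the XOR triple has a zero derivative at the empty set but a Shapley Interaction score of +1/4 or −1/4, with the sign set by the XOR output, while pairs involving the spectator score zero under both.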

Core claim

Signed pairwise interaction scores fundamentally conflate uniqueness (U), redundancy (R), and synergy (S). We prove this on a minimal 3-way XOR structural causal model: faithful indices such as Shapley-Taylor return zero per pair, whereas projective indices such as Shapley Interaction spread the third-order effect into pair scalars that conflate the three mechanisms. We introduce Stochastic Hi-Fi, a post-hoc, retraining-free predictability decomposition that estimates per-feature U/R/S profiles by interventional masked inference. The estimator provides exact interventional semantics, finite-sample Monte Carlo bounds, strict variance reduction from coupled diamond sampling, and uniform finite-vocabulary convergence.

What carries the argument

Stochastic Hi-Fi, a post-hoc predictability decomposition that uses interventional masked inference to produce separate per-feature uniqueness, redundancy, and synergy profiles.

If this is right

  • Recovers structure missed by scalar baselines with up to 411 times larger interaction-magnitude recovery ratios on tabular structural causal models.
  • Separates redundant and synergistic heads inside the GPT-2 indirect-object-identification circuit.
  • Matches GradCAM performance on the Pointing Game while improving Deletion AUC on the NIH ChestX-ray14 dataset.
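For readers unfamiliar with the deletion metric mentioned in the last bullet, the sketch below shows the common convention: features are removed in decreasing order of attribution and the model score is integrated over the fraction removed, so a lower area means the attribution found the features the model relies on. The paper's exact protocol (baseline value, step size, normalization) is not given on this page; `deletion_auc`, the zero baseline, and the linear toy model are assumptions for illustration.

```python
def deletion_auc(model, x, attribution, baseline=0.0):
    """Area under the model score as features are deleted, most important first."""
    order = sorted(range(len(x)), key=lambda k: attribution[k], reverse=True)
    current = list(x)
    scores = [model(current)]
    for k in order:
        current[k] = baseline          # "delete" by overwriting with the baseline
        scores.append(model(current))
    step = 1.0 / len(x)                # fraction of features removed per step
    # Trapezoidal rule over the fraction of deleted features.
    return sum((a + b) / 2.0 * step for a, b in zip(scores, scores[1:]))

# Hypothetical linear model whose true feature importance equals its weights.
weights = [4.0, 3.0, 2.0, 1.0]
model = lambda v: sum(w * vi for w, vi in zip(weights, v))
x = [1.0, 1.0, 1.0, 1.0]

good = deletion_auc(model, x, attribution=weights)               # correct ranking
bad = deletion_auc(model, x, attribution=[-w for w in weights])  # reversed ranking
print(good, bad)
```

With the correct ranking the score collapses quickly and the area is smaller than with the reversed ranking, which is the sense in which a method can improve on Deletion AUC under this convention.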

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same masking approach could be inserted into other post-hoc explanation pipelines to test whether their reported pairwise scores are actually conflating the three mechanisms.
  • In domains where synergy is expected, such as multi-modal fusion, the decomposition supplies a concrete way to quantify when joint effects exceed what any single feature supplies.
  • The finite-sample bounds open the possibility of statistical hypothesis tests that decide whether a detected interaction is unique, redundant, or synergistic at a chosen confidence level.

Load-bearing premise

Interventional masked inference can isolate uniqueness, redundancy, and synergy profiles in a manner faithful to the underlying data-generating process without requiring model-specific assumptions beyond the post-hoc predictability decomposition.

What would settle it

On the known 3-way XOR structural causal model, compute the true U/R/S contributions of each variable and check whether Stochastic Hi-Fi estimates deviate from those values by more than the stated Monte Carlo error bounds.
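A check of this kind could look like the sketch below. It assumes the finite-sample bound is of Hoeffding type for a bounded Monte Carlo average, which this page does not confirm; `hoeffding_radius` and `within_bound` are hypothetical helpers, and the estimate passed in is a made-up number rather than a reported result.

```python
import math

def hoeffding_radius(n_samples, delta, value_range=1.0):
    """Two-sided Hoeffding radius for the mean of n bounded i.i.d. samples."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n_samples))

def within_bound(estimate, truth, n_samples, delta=0.05, value_range=1.0):
    """True if |estimate - truth| falls inside the Hoeffding radius."""
    return abs(estimate - truth) <= hoeffding_radius(n_samples, delta, value_range)

# Reference synergy of 1/2 for an XOR input; 0.48 is an invented estimate.
print(hoeffding_radius(2000, 0.05))
print(within_bound(estimate=0.48, truth=0.5, n_samples=2000))
```

An estimate that lands outside the radius more often than the chosen failure probability `delta` allows would count as evidence against the stated bounds.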

Figures

Figures reproduced from arXiv: 2606.19410 by Potito Aghilar, Sabino Roccotelli, Sebastiano Stramaglia, Stanislao Fidanza, Tommaso Di Noia, Vito Walter Anelli.

Figure 1
Figure 1: Left: Pair-level scalar indices on the 3-way XOR SCM: faithful indices collapse higher-order signal to near-zero pair scores, while projective indices redistribute the triplet effect into small signed pair coefficients. Right: Per-feature U/R/S profile from Stochastic Hi-Fi (mean ± std over 5 seeds): the three active features are synergy-dominated, while the spectator is separated. Hi-Fi recovers both role…
Figure 2
Figure 2: Qualitative ChestX-ray comparison on a Cardiomegaly case: original image with ground-truth box (white), Stochastic Hi-Fi, GradCAM, Integrated Gradients, Vanilla Gradient, and Random. This panel is descriptive and complements the quantitative localization/deletion metrics. … rather than uniformly positive: Pointing Game is non-significant (paired Wilcoxon on n = 220 image-pairs after tie-removal; 660 ties …
Figure 3
Figure 3: IOI attention-head decomposition on the 10-head circuit.
Figure 4
Figure 4: Full 26-head IOI extension. Left: Pair interaction map $S_{ij} - R_{ij}$ over $\binom{26}{2} = 325$ pairs (lower triangle), showing mixed synergistic and redundant structure across role families. Right: Per-head synergy $S$ versus singleton LOCO $\pi$ (activation-patching identity), highlighting heads with low first-order effect but substantial cooperative contribution. Spearman($\pi$, patch) = 1.000 ± 0.000 holds by construction. H…
read the original abstract

Signed pairwise interaction scores fundamentally conflate uniqueness (U), redundancy (R), and synergy (S). We prove this on a minimal 3-way XOR structural causal model: faithful indices such as Shapley-Taylor return zero per pair, whereas projective indices such as Shapley Interaction spread the third-order effect into pair scalars that conflate the three mechanisms. We introduce Stochastic Hi-Fi, a post-hoc, retraining-free predictability decomposition that estimates per-feature U/R/S profiles by interventional masked inference. The estimator provides exact interventional semantics, finite-sample Monte Carlo bounds, strict variance reduction from coupled diamond sampling, and uniform finite-vocabulary convergence. Across tabular SCMs, Stochastic Hi-Fi recovers structure missed by scalar baselines (up to 411x larger interaction-magnitude recovery ratios). It also separates redundant and synergistic heads in the GPT-2 IOI circuit. On NIH ChestX-ray14, Stochastic Hi-Fi matches GradCAM on Pointing Game and improves substantially on Deletion AUC.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that signed pairwise interaction scores conflate uniqueness (U), redundancy (R), and synergy (S), as shown by a proof on a minimal 3-way XOR structural causal model where faithful indices (e.g., Shapley-Taylor) return zero per pair while projective indices spread third-order effects. It introduces Stochastic Hi-Fi, a post-hoc retraining-free method that estimates per-feature U/R/S profiles via interventional masked inference, asserting exact interventional semantics, finite-sample Monte Carlo bounds, variance reduction via coupled diamond sampling, and uniform convergence. Empirical results show up to 411x larger interaction-magnitude recovery on tabular SCMs, separation of redundant/synergistic heads in the GPT-2 IOI circuit, and competitive performance with GradCAM on NIH ChestX-ray14 (Pointing Game and Deletion AUC).

Significance. If the decomposition is faithful, the work provides a concrete advance over scalar interaction indices by separating mechanisms that are otherwise mixed, with direct relevance to feature attribution and circuit analysis in ML. Strengths include the minimal-model proof establishing the conflation phenomenon, the explicit Monte Carlo estimator with variance-reduction technique, and the reproducible empirical comparisons on controlled SCMs.

major comments (3)
  1. [§3] §3 (Stochastic Hi-Fi definition and estimator): The central claim that interventional masked inference isolates U/R/S profiles with 'exact interventional semantics' and no model-specific assumptions beyond post-hoc predictability decomposition is load-bearing. No explicit argument is given that the masking operator commutes with the SCM's causal structure or prevents higher-order leakage under finite masking, which directly affects whether the recovered profiles on the 3-way XOR are guaranteed to match the structural mechanisms.
  2. [§4.2] §4.2 (finite-sample bounds): The abstract and method assert finite-sample Monte Carlo bounds and uniform finite-vocabulary convergence, yet the derivation is not shown; this is required to support the variance-reduction and convergence claims that underwrite the empirical recovery ratios.
  3. [§6.3] §6.3 and §7 (GPT-2 IOI and ChestX-ray14 experiments): The separation of redundant/synergistic heads and the medical imaging metrics are presented without ground-truth validation details for the assigned U/R/S labels, weakening the claim that the method recovers structure missed by scalar baselines in real models.
minor comments (2)
  1. [§3.3] Notation for the diamond sampling procedure could be clarified with an explicit pseudocode block to make the variance-reduction step reproducible from the text alone.
  2. [Abstract] The abstract states 'up to 411x larger interaction-magnitude recovery ratios' without specifying the exact baseline and metric in the summary sentence; a parenthetical reference to the relevant table would improve readability.
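The pseudocode request in the first minor comment can be illustrated, loosely, with a generic common-random-numbers sketch. The paper's actual coupled diamond sampling procedure is not described on this page, so everything below is an assumption: the toy model `f`, the Gaussian background, and the helpers `masked`, `diamond`, and `sample` are hypothetical. The only idea shown is that the four corners of a subset-lattice diamond (S, S∪{i}, S∪{j}, S∪{i,j}) can share one background draw, so that nuisance terms cancel inside each sample.

```python
import random
import statistics

def f(x):
    # Toy model: one pairwise product plus a spectator term.
    return x[0] * x[1] + 3.0 * x[2]

def masked(x, keep, background):
    # Keep the features in `keep`; replace all others with the background draw.
    return [x[k] if k in keep else background[k] for k in range(len(x))]

def diamond(x, i, j, base, draws):
    # Second-order difference over the corners base, base+i, base+j, base+i+j.
    b00, b10, b01, b11 = draws
    return (f(masked(x, base | {i, j}, b11))
            - f(masked(x, base | {i}, b10))
            - f(masked(x, base | {j}, b01))
            + f(masked(x, base, b00)))

def sample(x, i, j, base, n, coupled, seed=0):
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        if coupled:
            b = [rng.gauss(0, 1) for _ in x]
            draws = (b, b, b, b)       # one background draw shared by all corners
        else:
            draws = tuple([rng.gauss(0, 1) for _ in x] for _ in range(4))
        out.append(diamond(x, i, j, base, draws))
    return out

x = [1.0, 1.0, 0.0]
coupled = sample(x, 0, 1, frozenset(), 2000, coupled=True)
independent = sample(x, 0, 1, frozenset(), 2000, coupled=False)
print(statistics.variance(coupled), statistics.variance(independent))
```

In this toy setting the spectator term cancels exactly under coupling, so the coupled estimator has a much smaller sample variance than the one that draws an independent background per corner. Whether the paper's strict variance-reduction result rests on the same mechanism is not something this page establishes.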

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough review and valuable suggestions. Below we address each of the major comments in detail, indicating the revisions we plan to make to strengthen the manuscript.

read point-by-point responses
  1. Referee: [§3] §3 (Stochastic Hi-Fi definition and estimator): The central claim that interventional masked inference isolates U/R/S profiles with 'exact interventional semantics' and no model-specific assumptions beyond post-hoc predictability decomposition is load-bearing. No explicit argument is given that the masking operator commutes with the SCM's causal structure or prevents higher-order leakage under finite masking, which directly affects whether the recovered profiles on the 3-way XOR are guaranteed to match the structural mechanisms.

    Authors: We agree that an explicit argument for the commutation of the masking operator with the SCM structure would strengthen the central claim. In the revised manuscript, we will add a dedicated subsection in §3 providing a formal argument that the interventional masking isolates U, R, and S without higher-order leakage, leveraging the definition of the predictability decomposition and the finite masking sets used in the estimator. This will directly address the 3-way XOR case. revision: yes

  2. Referee: [§4.2] §4.2 (finite-sample bounds): The abstract and method assert finite-sample Monte Carlo bounds and uniform finite-vocabulary convergence, yet the derivation is not shown; this is required to support the variance-reduction and convergence claims that underwrite the empirical recovery ratios.

    Authors: The derivations for the finite-sample Monte Carlo bounds and uniform convergence are provided in the supplementary material. To make this more accessible, we will include a high-level sketch of the proof in §4.2 of the main text, highlighting the role of coupled diamond sampling in variance reduction and the conditions for uniform convergence over finite vocabularies. revision: yes

  3. Referee: [§6.3] §6.3 and §7 (GPT-2 IOI and ChestX-ray14 experiments): The separation of redundant/synergistic heads and the medical imaging metrics are presented without ground-truth validation details for the assigned U/R/S labels, weakening the claim that the method recovers structure missed by scalar baselines in real models.

    Authors: For the GPT-2 experiments, the U/R/S assignments are validated by their consistency with the established IOI circuit analysis in the literature, where ablation studies have shown certain heads to behave redundantly or synergistically. For ChestX-ray14, we rely on the standard evaluation protocols, Pointing Game and Deletion AUC, and show competitive or improved performance. We acknowledge that direct ground-truth labels for U/R/S are inherently unavailable in these complex models without a full causal specification. We will add a discussion of this limitation, and of the reliance on comparative and literature-based validation, in the revised §6.3 and §7. revision: partial

Circularity Check

0 steps flagged

No circularity: derivation relies on interventional definitions and SCM example, not self-referential reductions

full rationale

The paper defines Stochastic Hi-Fi directly via interventional masked inference and Monte Carlo estimation on a 3-way XOR SCM to separate U/R/S, with properties (exact semantics, variance bounds) following from the sampling procedure itself. No equations reduce a claimed prediction to a fitted input by construction, no uniqueness theorems are imported via self-citation, and the central decomposition is not equivalent to its inputs. The method is presented as post-hoc and retraining-free without parameter fitting that would force the reported profiles.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the representativeness of the 3-way XOR SCM for general conflation and on the interventional semantics of masked inference being sufficient to separate the three mechanisms.

axioms (2)
  • domain assumption The 3-way XOR structural causal model is a faithful minimal example that exposes the conflation in scalar indices.
    Invoked as the setting for the proof in the abstract.
  • domain assumption Interventional masked inference yields exact semantics for U/R/S decomposition.
    Stated as a property of the estimator.

pith-pipeline@v0.9.1-grok · 5719 in / 1354 out tokens · 30302 ms · 2026-06-26T18:52:04.556720+00:00 · methodology

Reference graph

Works this paper leans on

63 extracted references · 20 canonical work pages · 1 internal anchor

  1. [1]

    Openxai: Towards a transparent evaluation of model explanations

    Chirag Agarwal, Satyapriya Krishna, Eshika Saxena, Martin Pawelczyk, Nari Johnson, Isha Puri, Marinka Zitnik, and Himabindu Lakkaraju. Openxai: Towards a transparent evaluation of model explanations. In Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, and A. Oh, editors,Advances in Neural Information Processing Systems 35: Annual Conferenc...

  2. [2]

    Quantifying unique information.Entropy, 16(4):2161–2183, 2014

    Nils Bertschinger, Johannes Rauh, Eckehard Olbrich, Jürgen Jost, and Nihat Ay. Quantifying unique information.Entropy, 16(4):2161–2183, 2014. doi: 10.3390/E16042161. URL https: //doi.org/10.3390/e16042161

  3. [3]

    Explaining graph neural net- works via structure-aware interaction index

    Ngoc Bui, Hieu Trung Nguyen, Viet Anh Nguyen, and Rex Ying. Explaining graph neural net- works via structure-aware interaction index. In Ruslan Salakhutdinov, Zico Kolter, Katherine A. Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors,Forty- first International Conference on Machine Learning, ICML 2024, Vienna, Austria,...

  4. [4]

    URLhttps://proceedings.mlr.press/v235/bui24b.html

  5. [5]

    Polynomial calculation of the Shapley value based on sampling.Computers & Operations Research, 36(5):1726–1730, 2009

    Javier Castro, Daniel Gómez, and Juan Tejada. Polynomial calculation of the shapley value based on sampling.Comput. Oper. Res., 36(5):1726–1730, 2009. doi: 10.1016/J.COR.2008.04.004. URLhttps://doi.org/10.1016/j.cor.2008.04.004

  6. [6]

    Mavor-Parker, Aengus Lynch, Stefan Heimersheim, and Adrià Garriga-Alonso

    Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, and Adrià Garriga-Alonso. Towards automated circuit discovery for mechanistic interpretability. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine, editors,Advances in Neural Information Processing Systems 36: Annual Conference on Neural Info...

  7. [7]

    Lundberg, and Su-In Lee

    Ian Covert, Scott M. Lundberg, and Su-In Lee. Understanding global feature contributions with additive importance measures. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors,Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS ...

  8. [8]

    Diffusionpid: interpreting diffusion via partial information decomposition

    Shaurya Dewan, Rushikesh Zawar, Prakanshul Saxena, Yingshan Chang, Andrew Luo, and Yonatan Bisk. Diffusionpid: interpreting diffusion via partial information decomposition. In Proceedings of the 38th International Conference on Neural Information Processing Systems, NIPS ’24, Red Hook, NY , USA, 2024. Curran Associates Inc. ISBN 9798331314385

  9. [9]

    Mahyar Fazlyab, Alexander Robey, Hamed Hassani, Manfred Morari, and George J. Pappas. Efficient and accurate estimation of lipschitz constants for deep neural networks. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman Garnett, editors,Advances in Neural Information Processing Systems 32: Annual Confere...

  10. [11]

    Kernelshap-iq: Weighted least square optimization for shapley interactions

    Fabian Fumagalli, Maximilian Muschalik, Patrick Kolpaczki, Eyke Hüllermeier, and Barbara Hammer. Kernelshap-iq: Weighted least square optimization for shapley interactions. In Ruslan Salakhutdinov, Zico Kolter, Katherine A. Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors,Forty-first International Conference on Machine...

  11. [12]

    Grabisch, M

    Michel Grabisch and Marc Roubens. An axiomatic approach to the concept of interaction among players in cooperative games.Int. J. Game Theory, 28(4):547–565, 1999. doi: 10.1007/ S001820050125. URLhttps://doi.org/10.1007/s001820050125

  12. [13]

    Anna Hedström, Leander Weber, Daniel Krakowczyk, Dilyara Bareeva, Franz Motzkus, Woj- ciech Samek, Sebastian Lapuschkin, and Marina M.-C. Höhne. Quantus: An explainable AI toolkit for responsible evaluation of neural network explanations and beyond.J. Mach. Learn. Res., 24:34:1–34:11, 2023. URLhttps://jmlr.org/papers/v24/22-0142.html

  13. [14]

    Causal shapley val- ues: Exploiting causal knowledge to explain individual predictions of complex models

    Tom Heskes, Evi Sijben, Ioan Gabriel Bucur, and Tom Claassen. Causal shapley val- ues: Exploiting causal knowledge to explain individual predictions of complex models. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan- Tien Lin, editors,Advances in Neural Information Processing Systems 33: Annual Con- ference on Neura...

  14. [15]

    A class of statistics with asymptotically normal distribution.The Annals of Mathematical Statistics, 19(3):293–325, 1948

    Wassily Hoeffding. A class of statistics with asymptotically normal distribution.The Annals of Mathematical Statistics, 19(3):293–325, 1948. ISSN 00034851. URL http://www.jstor. org/stable/2235637

  15. [16]

    Probability inequalities for sums of bounded random variables.Journal of the American Statistical Association, 58(301):13–30, 1963

    Wassily Hoeffding. Probability inequalities for sums of bounded random variables.Journal of the American Statistical Association, 58(301):13–30, 1963. ISSN 01621459, 1537274X

  16. [17]

    Discovering additive structure in black box functions

    Giles Hooker. Discovering additive structure in black box functions. In Won Kim, Ron Kohavi, Johannes Gehrke, and William DuMouchel, editors,Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, Washington, USA, August 22-25, 2004, pages 575–580. ACM, 2004. doi: 10.1145/1014052.1014122. URL https://d...

  17. [18]

    A benchmark for inter- pretability methods in deep neural networks

    Sara Hooker, Dumitru Erhan, Pieter-Jan Kindermans, and Been Kim. A benchmark for inter- pretability methods in deep neural networks. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman Garnett, editors,Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing...

  18. [19]

    Weinberger

    Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks. In2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 2261–2269. IEEE Computer Society,

  19. [20]

    , year = 2017, month = jul, pages =

    doi: 10.1109/CVPR.2017.243. URLhttps://doi.org/10.1109/CVPR.2017.243

  20. [21]

    James, Jeffrey Emenheiser, and James P

    Ryan G. James, Jeffrey Emenheiser, and James P. Crutchfield. Unique information and secret key agreement.Entropy, 21(1):12, 2019. doi: 10.3390/E21010012. URL https://doi.org/ 10.3390/e21010012

  21. [22]

    Janizek, Pascal Sturmfels, and Su-In Lee

    Joseph D. Janizek, Pascal Sturmfels, and Su-In Lee. Explaining explanations: Axiomatic feature interactions for deep networks.J. Mach. Learn. Res., 22:104:1–104:54, 2021. URL https://jmlr.org/papers/v22/20-1223.html

  22. [23]

    Feature relevance quantification in explainable AI: A causal problem

    Dominik Janzing, Lenon Minorics, and Patrick Blöbaum. Feature relevance quantification in explainable AI: A causal problem. In Silvia Chiappa and Roberto Calandra, editors,The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020, 26-28 August 2020, Online [Palermo, Sicily, Italy], Proceedings of Machine Learning Research, ...

  23. [24]

    URLhttp://proceedings.mlr.press/v108/janzing20a.html

    PMLR, 2020. URLhttp://proceedings.mlr.press/v108/janzing20a.html

  24. [25]

    Zaletel, and Joel E

    Chaeyun Ko. STRIDE: subset-free functional decomposition for XAI in tabular settings.CoRR, abs/2509.09070, 2025. doi: 10.48550/ARXIV .2509.09070. URL https://doi.org/10. 48550/arXiv.2509.09070. 12

  25. [26]

    A novel approach to the partial information decomposition.Entropy, 24 (3):403, 2022

    Artemy Kolchinsky. A novel approach to the partial information decomposition.Entropy, 24 (3):403, 2022. doi: 10.3390/E24030403. URLhttps://doi.org/10.3390/e24030403

  26. [27]

    M., Kucukelbir, A

    Jing Lei, Max G’Sell, Alessandro Rinaldo, Ryan J. Tibshirani, and Larry Wasserman. Distribution-free predictive inference for regression.Journal of the American Statistical Association, 113(523):1094–1111, 2018. doi: 10.1080/01621459.2017.1307116. URL https://doi.org/10.1080/01621459.2017.1307116

  27. [28]

    Kautz, and Chenliang Xu

    Samuel Lerman, Charles Venuto, Henry A. Kautz, and Chenliang Xu. Explaining local, global, and higher-order interactions in deep learning. In2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 1204–

  28. [29]
  29. [30]

    Lundberg and Su-In Lee

    Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V . N. Vishwanathan, and Roman Garnett, editors,Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017...

  30. [31]

    M., Erion, G., Chen, H., DeGrave, A., Prutkin, J

    Scott M. Lundberg, Gabriel G. Erion, Hugh Chen, Alex J. DeGrave, Jordan M. Prutkin, Bala Nair, Ronit Katz, Jonathan Himmelfarb, Nisha Bansal, and Su-In Lee. From local explanations to global understanding with explainable AI for trees.Nat. Mach. Intell., 2(1):56–67, 2020. doi: 10.1038/S42256-019-0138-9. URLhttps://doi.org/10.1038/s42256-019-0138-9

  31. [32]

    SHAP meets tensor networks: Provably tractable explanations with parallelism

    Reda Marzouk, Shahaf Bassan, and Guy Katz. SHAP meets tensor networks: Provably tractable explanations with parallelism. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. URLhttps://openreview.net/forum?id=FfccSikDfZ

  32. [33]

    H-sets: Hessian- guided discovery of set-level feature interactions in image classifiers

    Ayushi Mehrotra, Dipkamal Bhusal, Michael Clifford, and Nidhi Rastogi. H-sets: Hessian- guided discovery of set-level feature interactions in image classifiers. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026. URL https://arxiv.org/abs/2604.22045. Accepted

  33. [34]

    Locating and editing factual associations in GPT

    Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and editing factual associations in GPT. In Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, and A. Oh, editors,Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, Novem...

  34. [35]

    shapiq: Shapley interactions for machine learning

    Maximilian Muschalik, Hubert Baniecki, Fabian Fumagalli, Patrick Kolpaczki, Barbara Hammer, and Eyke Hüllermeier. shapiq: Shapley interactions for machine learning. In Amir Globersons, Lester Mackey, Danielle Belgrave, Angela Fan, Ulrich Paquet, Jakub M. Tomczak, and Cheng Zhang, editors,Advances in Neural Information Processing Systems 38: Annual Confere...

  35. [36]

    Roger B. Myerson. Graphs and cooperation in games.Math. Oper. Res., 2(3):225–229, 1977. doi: 10.1287/MOOR.2.3.225. URLhttps://doi.org/10.1287/moor.2.3.225

  36. [37]

    Cortes, Daniele Marinazzo, and Sebastiano Stramaglia

    Marlis Ontivero-Ortega, Luca Faes, Jesus M. Cortes, Daniele Marinazzo, and Sebastiano Stramaglia. Assessing high-order effects in feature importance via predictability decomposition. Phys. Rev. E, 111:L033301, Mar 2025. doi: 10.1103/PhysRevE.111.L033301. URL https: //link.aps.org/doi/10.1103/PhysRevE.111.L033301

  37. [38]

    Estimating the unique information of continuous variables

    Ari Pakman, Amin Nejatbakhsh, Dar Gilboa, Abdullah Makkeh, Luca Mazzucato, Michael Wibral, and Elad Schneidman. Estimating the unique information of continuous variables. In 13 Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wort- man Vaughan, editors,Advances in Neural Information Processing Systems 34: Annual Confer- ...

  38. [39]

    RISE: randomized input sampling for explanation of black-box models

    Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: randomized input sampling for explanation of black-box models. InBritish Machine Vision Conference 2018, BMVC 2018, Newcastle, UK, September 3-6, 2018, page 151. BMV A Press, 2018. URL http://bmvc2018.org/ contents/papers/1064.pdf

  39. [40]

    why should i trust you?

    Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "why should I trust you?": Explaining the predictions of any classifier. In Balaji Krishnapuram, Mohak Shah, Alexander J. Smola, Charu C. Aggarwal, Dou Shen, and Rajeev Rastogi, editors,Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, ...

  40. [41]

    Lawrence Zitnick, and Devi Parikh

    Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. InIEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pages 618–626. IEEE Computer Society, 2017. doi: 10.1109/ICCV .2017

  41. [42]

    URLhttps://doi.org/10.1109/ICCV.2017.74

  42. [43]

    Math and Comput in Simulation , year =

    I.M Sobol’. Global sensitivity indices for nonlinear mathematical models and their monte carlo estimates.Mathematics and Computers in Simulation, 55(1):271–280, 2001. ISSN 0378-4754. doi: https://doi.org/10.1016/S0378-4754(00)00270-6. The Second IMACS Seminar on Monte Carlo Methods

  43. [44]

    The many shapley values for model explanation

    Mukund Sundararajan and Amir Najmi. The many shapley values for model explanation. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, Proceedings of Machine Learning Research, pages 9269–9278. PMLR,

  44. [45]

    URLhttp://proceedings.mlr.press/v119/sundararajan20b.html

  45. [46]

    Axiomatic attribution for deep networks

    Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In Doina Precup and Yee Whye Teh, editors,Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, Proceedings of Machine Learning Research, pages 3319–3328. PMLR, 2017. URL http://proceedings. mlr.press...

  46. [47]

    The shapley taylor interaction index

    Mukund Sundararajan, Kedar Dhamdhere, and Ashish Agarwal. The shapley taylor interaction index. InProceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, Proceedings of Machine Learning Research, pages 9259–9268. PMLR, 2020. URLhttp://proceedings.mlr.press/v119/sundararajan20a.html

  47. [48]

    Faith-shap: The faithful shapley interaction index.J

    Che-Ping Tsai, Chih-Kuan Yeh, and Pradeep Ravikumar. Faith-shap: The faithful shapley interaction index.J. Mach. Learn. Res., 24:94:1–94:42, 2023. URL https://jmlr.org/ papers/v24/22-0202.html

  48. [49]

    How does this interaction affect me? inter- pretable attribution for feature interactions

    Michael Tsang, Sirisha Rambhatla, and Yan Liu. How does this interaction affect me? inter- pretable attribution for feature interactions. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors,Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems ...

  49. [50]

    Jesse Vig, Sebastian Gehrmann, Yonatan Belinkov, Sharon Qian, Daniel Nevo, Yaron Singer, and Stuart M. Shieber. Investigating gender bias in language models using causal mediation analysis. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors,Advances in Neural Information Processing Systems 33: Annual ...

  50. [51]

    Interpretability in the wild: a circuit for indirect object identification in GPT-2 small

    Kevin Ro Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, and Jacob Stein- hardt. Interpretability in the wild: a circuit for indirect object identification in GPT-2 small. InThe Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023. URL https://openreview.net/forum? id=NpsVSN6o4ul

  51. [52]

    Xiaosong Wang, Yifan Peng, Le Lu, Zhiyong Lu, Mohammadhadi Bagheri, and Ronald M. Summers. Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages ...

  52. [53]

    B. P. Welford. Note on a method for calculating corrected sums of squares and products. Technometrics, 4(3):419–420, 1962. doi: 10.1080/00401706.1962.10490022

  53. [54]

    Nonnegative decomposition of multivariate information

    Paul L. Williams and Randall D. Beer. Nonnegative decomposition of multivariate information. CoRR, abs/1004.2515, 2010. URL http://arxiv.org/abs/1004.2515

  54. [55]

    Visualizing and understanding convolutional networks

    Matthew D. Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In David J. Fleet, Tomás Pajdla, Bernt Schiele, and Tinne Tuytelaars, editors, Computer Vision - ECCV 2014 - 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I, Lecture Notes in Computer Science, pages 818–833. Springer, 2014. doi: ...

  55. [56]

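    Therefore $v_x(S) = \tfrac{1}{2}$ if $\{1,2,3\} \not\subseteq S$, and $v_x(S) = \oplus(x_1, x_2, x_3)$ if $\{1,2,3\} \subseteq S$

    Spectators contribute nothing, since $Y$ does not depend on them. Therefore

    $$v_x(S) = \begin{cases} \tfrac{1}{2} & \text{if } \{1,2,3\} \not\subseteq S, \\ \oplus(x_1, x_2, x_3) & \text{if } \{1,2,3\} \subseteq S. \end{cases} \tag{10}$$

    Step 2: Möbius coefficients. From (10):

    • Empty set. $v_x(\emptyset) = \tfrac{1}{2}$, so $m_x(\emptyset) = \tfrac{1}{2}$.
    • Singletons. For any $i \in [n]$, $v_x(\{i\}) = \tfrac{1}{2}$, so $m_x(\{i\}) = \tfrac{1}{2} - \tfrac{1}{2} = 0$.
    • Pairs. For any $\{i, j\} \subseteq [n]$, $v_x(\{i, j\}) = v_x(\{i\}) = v_x(\{j\}) = v_x(\emptyset) =$ 1 ...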
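
The Möbius step above can be checked numerically. The following is a minimal sketch, assuming the standard Möbius inversion $m_x(S) = \sum_{T \subseteq S} (-1)^{|S|-|T|} v_x(T)$ and the value function from (10). The helper names (`value`, `mobius`, `subsets`) and the choice of one spectator feature are illustrative assumptions, and features are 0-indexed here, so the triplet is `{0, 1, 2}`.

```python
from itertools import combinations

def value(x, S):
    """v_x(S) as in (10): 1/2 unless the whole triplet {0, 1, 2} is in S."""
    return float(x[0] ^ x[1] ^ x[2]) if {0, 1, 2} <= set(S) else 0.5

def subsets(items):
    """Yield every subset of `items` as a sorted tuple."""
    items = list(items)
    for r in range(len(items) + 1):
        yield from combinations(items, r)

def mobius(x, n):
    """Moebius coefficients m_x(S) for every S subset of {0, ..., n-1}."""
    return {
        S: sum((-1) ** (len(S) - len(T)) * value(x, T) for T in subsets(S))
        for S in subsets(range(n))
    }

# Hypothetical input: features 0-2 form the XOR triplet, feature 3 is a spectator.
x = (1, 0, 0, 1)
m = mobius(x, n=4)

print(m[()])         # empty set
print(m[(0,)])       # a singleton
print(m[(0, 1)])     # a pair inside the triplet
print(m[(0, 1, 2)])  # the triplet itself
```

Under these assumptions, every coefficient other than the empty set and the full triplet vanishes, including every set that contains the spectator.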

  56. [57]

    Step 6: scalar conflation of the U/R/S components. The per-feature triple $(U, R, S) = (0, 0, \tfrac{1}{2}) \in \mathbb{R}^3$ is identical for each $i \in \{1, 2, 3\}$ but lives in a 3-dimensional output space

    The standalone LOCO $\pi(X_i) = 0$ (any single triplet feature alone yields no information gain), so $R(X_i) = 0$ and $S(X_i) = \tfrac{1}{2}$. Step 6: scalar conflation of the U/R/S components. The per-feature triple $(U, R, S) = (0, 0, \tfrac{1}{2}) \in \mathbb{R}^3$ is identical for each $i \in \{1, 2, 3\}$ but lives in a 3-dimensional output space. Notice that scalar indices project this interaction into ± ...

  57. [58]

    meaningless

    because they inherently average over contexts rather than isolating the synergistic extremum. The faithful family produces the zero scalar in $\mathbb{R}$ per pair; the projective family produces $\pm\tfrac{1}{4}$ in $\mathbb{R}$ per pair. These scalar reports do not carry the named decomposition into uniqueness, redundancy, and synergy. In the faithful case, the pair-level report erases ...
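
The two pair-level values quoted above can be reproduced on the 3-way XOR value function. This is a hedged sketch: it assumes the projective value refers to the classical pairwise Shapley interaction index, and it uses the pair Möbius coefficient as a stand-in for what a faithful index reports at full order. The function names are illustrative and features are 0-indexed.

```python
from itertools import combinations
from math import factorial

def value(x, S):
    """XOR value function: 1/2 unless the full triplet {0, 1, 2} is known."""
    return float(x[0] ^ x[1] ^ x[2]) if {0, 1, 2} <= set(S) else 0.5

def pair_derivative(x, i, j, T):
    """Discrete second-order derivative of v_x at context T for the pair (i, j)."""
    T = set(T)
    return value(x, T | {i, j}) - value(x, T | {i}) - value(x, T | {j}) + value(x, T)

def shapley_interaction(x, i, j, n=3):
    """Pairwise Shapley interaction index (assumed projective index)."""
    rest = [k for k in range(n) if k not in (i, j)]
    total = 0.0
    for r in range(len(rest) + 1):
        # Weight |T|! (n - |T| - 2)! / (n - 1)! for contexts of size r.
        w = factorial(r) * factorial(n - r - 2) / factorial(n - 1)
        for T in combinations(rest, r):
            total += w * pair_derivative(x, i, j, T)
    return total

def mobius_pair(x, i, j):
    """Pair Moebius coefficient: the derivative evaluated at the empty context."""
    return pair_derivative(x, i, j, ())

print(shapley_interaction((1, 0, 0), 0, 1))  # 0.25  (XOR = 1)
print(shapley_interaction((1, 1, 0), 0, 1))  # -0.25 (XOR = 0)
print(mobius_pair((1, 0, 0), 0, 1))          # 0.0
```

The sign of the projective score tracks the XOR outcome only, so the same scalar magnitude appears for every pair and carries no information about which mechanism produced it.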

  58. [59]

    Runtime checks: Continuously evaluate adjacency-dominance conditions during deployment, flagging violations in real time

  59. [60]

    what is the best achievable loss using only the features in S?

    Fallback policy: In case of A3 violations, revert to a conservative estimator that does not rely on adjacency-dominance. 3. Logging: Record all flagged violations and fallback activations for offline analysis. Section F.3 verifies that the boundary case (XOR with uniform background) violates this and that the synthetic third-order dataset satisfies it with a...
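
The runtime-check, fallback, and logging steps can be sketched as a thin wrapper around the estimator. This is only an illustration of the control flow: `a3_holds`, `primary`, and `conservative` are hypothetical callables supplied by the user, since the concrete adjacency-dominance test and the two estimators are not specified in this excerpt.

```python
import logging
from typing import Callable, List, Sequence, Tuple

logger = logging.getLogger("a3_guard")

def guarded_estimate(
    x: Sequence[float],
    a3_holds: Callable[[Sequence[float]], bool],       # hypothetical A3 check
    primary: Callable[[Sequence[float]], float],       # estimator that relies on A3
    conservative: Callable[[Sequence[float]], float],  # fallback that does not
    violations: List[Tuple[float, ...]],
) -> float:
    """Run the primary estimator when A3 holds, else fall back and log."""
    if a3_holds(x):
        return primary(x)
    # Record the violation for offline analysis, then use the fallback.
    violations.append(tuple(x))
    logger.warning("A3 violated for input %s; using conservative estimator", tuple(x))
    return conservative(x)

# Toy usage with placeholder callables.
seen: List[Tuple[float, ...]] = []
ok = guarded_estimate([0.0, 1.0], lambda x: True, lambda x: 1.0, lambda x: 2.0, seen)
bad = guarded_estimate([1.0, 1.0], lambda x: False, lambda x: 1.0, lambda x: 2.0, seen)
```

Keeping the violation list separate from the logger lets the offline analysis work from structured records rather than parsed log text.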

  60. [61]

    Selection: Choose $p_{\mathrm{bg}}$ based on domain-specific priors, ensuring it reflects the expected data-generating process

  61. [62]

    Justification: Provide a rationale for the choice of $p_{\mathrm{bg}}$, supported by empirical or theoretical evidence

  62. [63]

    Diagnostics: Evaluate sensitivity to $p_{\mathrm{bg}}$ by comparing results across multiple plausible background distributions

  63. [64]

    This protocol aims to improve transparency and reproducibility for interventional estimands

    Reporting: Explicitly document the chosen $p_{\mathrm{bg}}$ and any observed sensitivity in the experimental results. This protocol aims to improve transparency and reproducibility for interventional estimands. On E1, we compare uniform-binary and empirical-resampled backgrounds across 5 seeds per dataset. Across XOR3, XOR+AND, and Synth3, pooled absolute drift remain...
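
The diagnostics step of this protocol can be sketched as a comparison of masked-inference values under two backgrounds. This is a minimal illustration, not the paper's experiment: the skewed Bernoulli sample standing in for an empirical-resampled background, the sample sizes, and the per-subset absolute difference used as "drift" are all assumptions.

```python
import numpy as np

def xor3(X):
    """3-way XOR on the first three binary columns of X."""
    return X[:, 0] ^ X[:, 1] ^ X[:, 2]

def masked_value(f, x, S, background, rng, n_samples=2000):
    """Monte Carlo estimate of E_{z ~ p_bg}[f(x_S, z_rest)]."""
    idx = rng.integers(0, len(background), size=n_samples)
    Z = background[idx].copy()
    cols = np.array(S, dtype=int)
    Z[:, cols] = x[cols]  # intervene: fix the features in S to their observed values
    return float(f(Z).mean())

rng = np.random.default_rng(0)
uniform_bg = rng.integers(0, 2, size=(5000, 3))           # uniform-binary background
empirical_bg = (rng.random((5000, 3)) < 0.6).astype(int)  # assumed stand-in for resampled data

x = np.array([1, 0, 0])
subsets = [(), (0,), (0, 1), (0, 1, 2)]
drift = {
    S: abs(masked_value(xor3, x, S, uniform_bg, rng)
           - masked_value(xor3, x, S, empirical_bg, rng))
    for S in subsets
}
for S, d in drift.items():
    print(S, round(d, 3))
```

When every feature is fixed, the background plays no role and the drift is exactly zero; the remaining subsets show how much the choice of $p_{\mathrm{bg}}$ moves the estimate.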