The Attribution Impossibility: No Feature Ranking Is Faithful, Stable, and Complete Under Collinearity

Bryan Arnold; David Rhoads; Drake Caraker

arxiv: 2605.21492 · v1 · pith:G2HQKYFGnew · submitted 2026-04-08 · 💻 cs.LG · cs.AI· cs.LO· stat.ML

The Attribution Impossibility: No Feature Ranking Is Faithful, Stable, and Complete Under Collinearity

Drake Caraker , Bryan Arnold , David Rhoads This is my paper

Pith reviewed 2026-05-22 01:34 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.LOstat.ML

keywords feature attributionexplainable AIcollinearityfeature rankingSHAPimpossibility theoremensemble methodsformal verification

0 comments

The pith

No feature ranking can be faithful, stable, and complete at once when features are collinear.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that when input features are collinear, meaning one is a perfect linear function of another, it is impossible for any feature attribution method to produce rankings that are simultaneously faithful to the underlying model, stable against small perturbations, and complete in explaining the full output. This matters because collinearity appears in most real datasets and feature rankings underpin explanations, fairness audits, and model diagnostics. The authors prove the result from sixteen axioms, show that rankings for collinear pairs reduce to a coin flip, and classify all methods into two families: faithful-complete approaches that flip rankings up to half the time and ensemble approaches that report ties for symmetric features.

Core claim

No feature ranking can be simultaneously faithful, stable, and complete when features are collinear. For collinear pairs, ranking reduces to a coin flip. The design space theorem shows exactly two families of methods exist: faithful-complete methods, which are unstable with rankings that flip up to 50 percent of the time, and ensemble methods like DASH, which are stable and report ties for symmetric features. The attribution ratio diverges as 1 over 1 minus rho squared for gradient boosting, is infinite for Lasso, and converges for random forests. DASH is provably Pareto-optimal among unbiased aggregations and achieves the Cramer-Rao variance bound.

What carries the argument

The design space theorem derived from sixteen axioms on faithful, stable, and complete attribution, which partitions every possible method into one of two families and forces coin-flip behavior on perfectly collinear pairs.

If this is right

SHAP-based proxy discrimination audits are provably unreliable when collinear features are present.
Switching to conditional SHAP does not escape the impossibility when features have equal causal effects.
68 percent of 77 public datasets exhibit attribution instability.
DASH achieves the Cramer-Rao lower bound on variance with a tight formula for ensemble size.
A Z-test workflow and single-model screening tool can diagnose instability in practice.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Practitioners should first test for collinearity before relying on any ranking for high-stakes decisions.
Methods that explicitly report ties for collinear groups may become standard for stable explanations.
The same impossibility may apply to other attribution tasks such as local explanations or counterfactual generation under linear dependence.

Load-bearing premise

The sixteen formal axioms correctly capture what faithful, stable, and complete mean for real-world attribution requirements.

What would settle it

Discovery of a single attribution method that produces faithful, stable, and complete rankings on a dataset containing at least one pair of perfectly collinear features would falsify the impossibility.

Figures

Figures reproduced from arXiv: 2605.21492 by Bryan Arnold, David Rhoads, Drake Caraker.

**Figure 2.** Figure 2: Left: Within-group information lost by Dash (log2 m! bits per group, independent of M and ρ). Right: Between-group information per pair as a function of M. Dash sharpens between-group rankings with increasing M. At high ρ (≥ 0.9), convergence requires larger M. 5.5 Progressive DASH: Adaptive Ensemble Sizing Training M=25 models for every explanation is wasteful when most feature pairs are stable. Progressi… view at source ↗

**Figure 3.** Figure 3: The Attribution Design Space. Family A (single-model methods): faithful and complete but unstable (U=1/2). Family B (Dash ensemble): stable with ties (U=0, S increasing with M). The ideal point (S=1, U=0, C=complete) is infeasible. There is no third option. The single tradeoff axis. The Design Space Theorem collapses the three-dimensional space to a single axis: ensemble size M. M Stability S Unfaithfulnes… view at source ↗

**Figure 4.** Figure 4: Flip rate as a function of (ρ, ∆β). The impossibility region (dark) expands with ρ. At ρ ≥ 0.95, instability persists for all ∆β ≤ 1. 26 [PITH_FULL_IMAGE:figures/full_fig_p026_4.png] view at source ↗

**Figure 5.** Figure 5: Left: Multi-model Z-test statistic vs. flip rate on Breast Cancer (r = −0.89). Pairs below z = 1.96 (dashed) have unreliable rankings. Right: Single-model screen split-frequency diagnostic (r = −0.78). Both diagnostics predict which pairs are unstable. Restricted-range robustness. The headline correlation r = −0.89 between Zjk and flip rate on Breast Cancer includes many “easy” pairs with Z ≫ 10 and flip r… view at source ↗

**Figure 6.** Figure 6: Universal SNR calibration across 6 datasets (1,325 feature pairs). The theoretical [PITH_FULL_IMAGE:figures/full_fig_p036_6.png] view at source ↗

**Figure 7.** Figure 7: Empirical validation on synthetic Gaussian data ( [PITH_FULL_IMAGE:figures/full_fig_p038_7.png] view at source ↗

**Figure 8.** Figure 8: SHAP ranking instability on Breast Cancer. Highly correlated pairs with similar importance (e.g., [PITH_FULL_IMAGE:figures/full_fig_p039_8.png] view at source ↗

**Figure 9.** Figure 9: Multi-model Z-test statistic vs. flip rate across four datasets (Breast Cancer, Ames, Digits, Adult). [PITH_FULL_IMAGE:figures/full_fig_p046_9.png] view at source ↗

read the original abstract

No feature ranking can be simultaneously faithful, stable, and complete when features are collinear. For collinear pairs, ranking reduces to a coin flip. We prove this impossibility, quantify it for four model classes, resolve it via ensemble averaging (DASH), and machine-verify it with 305 Lean 4 theorems. We characterize the complete attribution design space: exactly two families of methods exist -- faithful-complete methods (unstable, with rankings that flip up to 50% of the time) and ensemble methods like DASH (stable, reporting ties for symmetric features) -- and no method lies outside this dichotomy. The impossibility is quantitative: the attribution ratio diverges as 1/(1-rho^2) for gradient boosting, is infinite for Lasso, and converges for random forests. DASH (Diversified Aggregation of SHAP) is provably Pareto-optimal among unbiased aggregations, achieving the Cramer-Rao variance bound with a tight ensemble size formula. In a survey of 77 public datasets, 68% exhibit attribution instability. Switching to conditional SHAP does not escape the impossibility when features have equal causal effects. The framework includes practical diagnostics -- a Z-test workflow and single-model screening tool -- and has direct consequences for fairness auditing: SHAP-based proxy discrimination audits are provably unreliable under collinearity. The design space theorem, diagnostics, and impossibility are mechanically verified in Lean 4 (305 theorems from 16 axioms, 0 sorry) -- to our knowledge, the first formally verified impossibility in explainable AI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper proves no attribution method can be faithful, stable, and complete under collinearity and backs it with a machine-checked Lean proof plus a clean two-family split of the design space.

read the letter

The core result is that when features are collinear, no ranking can hit all three properties at once, so rankings for dependent pairs become coin flips. The authors split the entire space into faithful-complete methods that flip rankings up to half the time and stable ensemble methods like DASH that report ties instead. They quantify the effect for gradient boosting, Lasso, and random forests, and they show the divergence grows with correlation strength.

Referee Report

0 major / 3 minor

Summary. The paper proves that no feature attribution method can simultaneously satisfy faithfulness, stability, and completeness when input features are collinear; for any collinear pair the induced ranking is equivalent to a fair coin flip. It formally characterizes the entire attribution design space as consisting of exactly two families (faithful-complete methods that are unstable, versus ensemble methods such as DASH that are stable but report ties), quantifies the divergence of the attribution ratio for gradient boosting, Lasso, and random forests, shows that DASH is Pareto-optimal among unbiased aggregations and attains the Cramér-Rao bound, reports that 68 % of 77 public datasets exhibit attribution instability, and demonstrates that conditional SHAP does not evade the impossibility when causal effects are equal. All core claims are machine-checked in Lean 4 (305 theorems, 16 axioms, zero sorries).

Significance. If the result holds, the work is significant for explainable AI. The machine-checked proof of an impossibility result is a rare and valuable contribution; the explicit two-family design-space theorem and the Cramér-Rao optimality of DASH supply both theoretical clarity and a concrete, provably stable alternative. The 77-dataset survey and the fairness-auditing implications give the claims immediate practical relevance. The formal verification strengthens confidence that the impossibility is not an artifact of informal reasoning.

minor comments (3)

§4.2: the statement that 'DASH achieves the Cramér-Rao bound with a tight ensemble size formula' would benefit from an explicit cross-reference to the corresponding Lean theorem number so readers can locate the verified statement directly.
Figure 3 caption: the legend for the 'coin-flip' baseline is not visually distinct from the DASH curve at small sample sizes; a different line style or marker would improve readability.
The survey section reports 68 % instability but does not state the precise operational definition of 'instability' used to label a dataset (e.g., fraction of collinear pairs with ranking flip probability > 0.4). Adding this definition would aid reproducibility.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment, detailed summary of our contributions, and recommendation to accept. The recognition of the impossibility result, the two-family design-space characterization, the Cramér-Rao optimality of DASH, the empirical survey, and the Lean 4 formalization is appreciated and aligns with the manuscript's intent.

Circularity Check

0 steps flagged

No significant circularity; derivation is axiomatically self-contained

full rationale

The impossibility theorem and design-space dichotomy are derived directly from 16 stated axioms and mechanically verified via 305 Lean 4 theorems (0 sorry). The Cramer-Rao bound for DASH follows from the paper's own unbiased-aggregation assumptions as an independent consequence. No load-bearing step reduces to a fitted parameter, self-citation chain, or definitional renaming; the formal verification establishes independence from the target claims.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The formalization rests on 16 axioms whose specific content is not enumerated in the abstract; the central claim depends on domain definitions of attribution properties and collinearity rather than data-fitted constants or new postulated entities.

axioms (2)

domain assumption Formal definitions of faithful, stable, and complete feature attributions
These definitions are invoked to prove that no method can satisfy all three simultaneously under collinearity.
domain assumption Collinearity modeled as linear dependence that forces ranking to a coin flip
This modeling choice underpins the quantitative divergence results for gradient boosting, Lasso, and random forests.

pith-pipeline@v0.9.0 · 5821 in / 1469 out tokens · 67307 ms · 2026-05-22T01:34:34.294377+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/BranchSelection.lean branch_selection echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Theorem 5 (Attribution Impossibility) ... no feature ranking can be simultaneously faithful, stable, and complete ... Rashomon property
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean absolute_floor_iff_bare_distinguishability echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Theorem 37 (Symmetric Bayes Dichotomy) ... exactly two families

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

80 extracted references · 80 canonical work pages

[1]

Proceedings of the National Academy of Sciences , volume=

Impossibility Theorems for Feature Attribution , author=. Proceedings of the National Academy of Sciences , volume=

work page
[2]

Big Data , volume=

Fair prediction with disparate impact: A study of bias in recidivism prediction instruments , author=. Big Data , volume=

work page
[3]

Innovations in Theoretical Computer Science (ITCS) , year=

Inherent trade-offs in the fair determination of risk scores , author=. Innovations in Theoretical Computer Science (ITCS) , year=

work page
[4]

On the failings of

Huang, Xuanxiang and Marques-Silva, Joao , journal=. On the failings of

work page
[5]

Journal of Machine Learning Research , volume=

Partial Order in Chaos: Consensus on Feature Attributions in the Rashomon Set , author=. Journal of Machine Learning Research , volume=

work page
[6]

The Limits of

Rao, Shrisha , journal=. The Limits of

work page
[7]

International Conference on Machine Learning (ICML) , year=

Position: Amazing Things Come From Having Many Good Models , author=. International Conference on Machine Learning (ICML) , year=

work page
[8]

Advances in Neural Information Processing Systems , volume=

A unified approach to interpreting model predictions , author=. Advances in Neural Information Processing Systems , volume=

work page
[9]

1951 , publisher=

Social Choice and Individual Values , author=. 1951 , publisher=

work page 1951
[10]

Social Choice Theory in

Nipkow, Tobias , journal=. Social Choice Theory in

work page
[11]

Statistical Learning Theory in

Zhang, Yuanhe and Lee, Jason D and Liu, Fanghui , journal=. Statistical Learning Theory in. 2026 , url=

work page 2026
[12]

2024 , howpublished=

Regulation (. 2024 , howpublished=

work page 2024
[13]

Hwang, Aida and Bell, Andrew and Fonseca, Jose and others , booktitle=

work page
[14]

Advances in Neural Information Processing Systems , volume=

Full-Gradient Representation for Neural Network Visualization , author=. Advances in Neural Information Processing Systems , volume=

work page
[15]

Journal of Machine Learning Research , volume=

All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously , author=. Journal of Machine Learning Research , volume=

work page
[16]

Journal of Machine Learning Research , volume=

Underspecification Presents Challenges for Credibility in Modern Machine Learning , author=. Journal of Machine Learning Research , volume=

work page
[17]

International Conference on Machine Learning , year=

Provably Better Explanations with Optimized Aggregation of Feature Attributions , author=. International Conference on Machine Learning , year=

work page
[18]

Advances in Neural Information Processing Systems , year=

Probabilistic Stability Guarantees for Feature Attributions , author=. Advances in Neural Information Processing Systems , year=

work page
[19]

SSRN Working Paper , year=

Mathematical Foundations of Explainability , author=. SSRN Working Paper , year=

work page
[20]

Conference on Uncertainty in Artificial Intelligence , pages=

Equivalence and Synthesis of Causal Models , author=. Conference on Uncertainty in Artificial Intelligence , pages=

work page
[21]

2005 , publisher=

Testing Statistical Hypotheses , author=. 2005 , publisher=

work page 2005
[22]

2009 , publisher=

Introduction to Nonparametric Estimation , author=. 2009 , publisher=

work page 2009
[23]

Feature relevance quantification in explainable

Janzing, Dominik and Minorics, Lenon and Bl. Feature relevance quantification in explainable. International Conference on Artificial Intelligence and Statistics , pages=

work page
[24]

International Conference on Artificial Intelligence and Statistics , year=

From Shapley Values to Generalized Additive Models and back , author=. International Conference on Artificial Intelligence and Statistics , year=

work page
[25]

Advances in Neural Information Processing Systems , year=

But Are You Sure? An Uncertainty-Aware Perspective on Explainability , author=. Advances in Neural Information Processing Systems , year=

work page
[26]

Journal of Machine Learning Research , volume=

Explaining by Removing: A Unified Framework for Model Explanation , author=. Journal of Machine Learning Research , volume=

work page
[27]

Slack, Dylan and Hilgard, Sophie and Jia, Emily and Singh, Sameer and Lakkaraju, Himanshu , booktitle=. Fooling

work page
[28]

ACM/IMS Journal of Data Science , year=

On the Existence of Simpler Machine Learning Models , author=. ACM/IMS Journal of Data Science , year=

work page
[29]

Algorithms to estimate

Chen, Hugh and Covert, Ian C and Lundberg, Scott M and Lee, Su-In , journal=. Algorithms to estimate

work page
[30]

2011 , howpublished=

Supervisory Guidance on Model Risk Management (. 2011 , howpublished=

work page 2011
[31]

1946 , note=

A Structure Theory for the Class of Exponential Families , author=. 1946 , note=

work page 1946
[32]

International Journal of Data Science and Analytics , year=

General Pitfalls of Model-Agnostic Interpretation Methods for Machine Learning Models , author=. International Journal of Data Science and Analytics , year=

work page
[33]

Machine Learning , year=

Testing Conditional Independence in Supervised Learning Algorithms , author=. Machine Learning , year=

work page
[34]

Explaining black box decisions by

Mase, Masayoshi and Owen, Art B and Seiler, Benjamin , journal=. Explaining black box decisions by

work page
[35]

The many

Sundararajan, Mukund and Najmi, Amir , booktitle=. The many

work page
[36]

Problems with

Kumar, I Elizabeth and Venkatasubramanian, Suresh and Scheidegger, Carlos and Friedler, Sorelle , booktitle=. Problems with

work page
[37]

Machine Learning , volume=

Random Forests , author=. Machine Learning , volume=

work page
[38]

Annals of Statistics , volume=

Greedy Function Approximation: A Gradient Boosting Machine , author=. Annals of Statistics , volume=

work page
[39]

ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , year=

``Why Should I Trust You?'': Explaining the Predictions of Any Classifier , author=. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , year=

work page
[40]

Advances in Neural Information Processing Systems , year=

Unrestricted Permutation Forces Extrapolation , author=. Advances in Neural Information Processing Systems , year=

work page
[41]

Journal of Computational and Graphical Statistics , year=

Statistical Inference for Feature Rankings Under Model Multiplicity , author=. Journal of Computational and Graphical Statistics , year=

work page
[42]

Explaining individual predictions when features are dependent: More accurate approximations to

Aas, Kjersti and Jullum, Martin and L. Explaining individual predictions when features are dependent: More accurate approximations to. Artificial Intelligence , volume=

work page
[43]

Asymmetric

Frye, Christopher and Rowat, Colin and Feige, Ilya , booktitle=. Asymmetric

work page
[44]

Wright, Marvin N and Watson, David S and K. Do. Statistics and Computing , year=

work page
[45]

Contributions to the Theory of Games II , series=

A Value for N-Person Games , author=. Contributions to the Theory of Games II , series=. 1953 , publisher=

work page 1953
[46]

de Moura, Leonardo and Ullrich, Sebastian , booktitle=. The

work page
[47]

Certified Programs and Proofs (CPP) , year=

The. Certified Programs and Proofs (CPP) , year=

work page
[48]

Ghorbani, Amirata and Zou, James , booktitle=. Data

work page
[50]

K. J. Arrow. Social Choice and Individual Values. Wiley, 1951

work page 1951
[51]

Bilodeau, N

B. Bilodeau, N. Jaques, P. W. Koh, and B. Kim. Impossibility theorems for feature attribution. Proceedings of the National Academy of Sciences, 121 0 (2): 0 e2304406120, 2024

work page 2024
[52]

L. Breiman. Random forests. Machine Learning, 45: 0 5--32, 2001

work page 2001
[53]

Chouldechova

A. Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5 0 (2): 0 153--163, 2017

work page 2017
[54]

D'Amour, K

A. D'Amour, K. Heller, D. Moldovan, et al. Underspecification presents challenges for credibility in modern machine learning. Journal of Machine Learning Research, 23 0 (226): 0 1--61, 2022

work page 2022
[55]

de Moura and S

L. de Moura and S. Ullrich. The L ean 4 theorem prover and programming language. In International Conference on Automated Deduction (CADE), 2021

work page 2021
[56]

Decker, L.-M

T. Decker, L.-M. Herm, R. Feldhans, M. Kamp, and B. Hammer. Provably better explanations with optimized aggregation of feature attributions. In International Conference on Machine Learning, 2024

work page 2024
[57]

Regulation ( EU ) 2024/1689 (artificial intelligence act)

EU Parliament . Regulation ( EU ) 2024/1689 (artificial intelligence act). Official Journal of the European Union, OJ L, 12.7.2024, 2024

work page 2024
[58]

Fisher, C

A. Fisher, C. Rudin, and F. Dominici. All models are wrong, but many are useful: Learning a variable's importance by studying an entire class of prediction models simultaneously. Journal of Machine Learning Research, 20 0 (177): 0 1--81, 2019

work page 2019
[59]

J. H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29 0 (5): 0 1189--1232, 2001

work page 2001
[60]

Ghorbani and J

A. Ghorbani and J. Zou. Data S hapley: Equitable valuation of data for machine learning. In International Conference on Machine Learning, 2019

work page 2019
[61]

Herren and P

A. Herren and P. R. Hahn. Statistical inference for feature rankings under model multiplicity. Journal of Computational and Graphical Statistics, 2023

work page 2023
[62]

Hooker, L

G. Hooker, L. Mentch, and S. Zhou. Unrestricted permutation forces extrapolation. In Advances in Neural Information Processing Systems, 2021

work page 2021
[63]

Huang and J

X. Huang and J. Marques-Silva. On the failings of S hapley values for explainability. International Journal of Approximate Reasoning, 171: 0 109112, 2024

work page 2024
[64]

Hunt and C

G. Hunt and C. Stein. A structure theory for the class of exponential families. 1946. Unpublished manuscript; results presented in Lehmann & Romano (2005), Ch. 6

work page 1946
[65]

Y. Jin, R. Gao, and B. Kim. Probabilistic stability guarantees for feature attributions. In Advances in Neural Information Processing Systems, 2025

work page 2025
[66]

Kleinberg, S

J. Kleinberg, S. Mullainathan, and M. Raghavan. Inherent trade-offs in the fair determination of risk scores. In Innovations in Theoretical Computer Science (ITCS), 2017

work page 2017
[67]

Krishna, T

S. Krishna, T. Han, A. Ber, J. Brinton, A. Lakshminarayanan, C. Robinson, and H. Lakkaraju. The disagreement problem in explainable machine learning: A practitioner's perspective. arXiv preprint arXiv:2202.01602, 2022

work page arXiv 2022
[68]

Laberge, Y

G. Laberge, Y. Pequignot, A. Mathieu, F. Khomh, and M. Marchand. Partial order in chaos: Consensus on feature attributions in the rashomon set. Journal of Machine Learning Research, 24 0 (364): 0 1--50, 2023

work page 2023
[69]

E. L. Lehmann and J. P. Romano. Testing Statistical Hypotheses. Springer, 3rd edition, 2005

work page 2005
[70]

S. M. Lundberg and S.-I. Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, volume 30, 2017

work page 2017
[71]

T. Nipkow. Social choice theory in HOL : Arrow and G ibbard- S atterthwaite. Journal of Automated Reasoning, 43: 0 289--304, 2009

work page 2009
[72]

Noguer i Alonso

M. Noguer i Alonso. Mathematical foundations of explainability. SSRN Working Paper, 2025

work page 2025
[73]

Supervisory guidance on model risk management ( SR 11-7/ OCC 2011-12)

Office of the Comptroller of the Currency . Supervisory guidance on model risk management ( SR 11-7/ OCC 2011-12). OCC Bulletin 2011-12, 2011

work page 2011
[74]

S. Rao. The limits of AI explainability: An algorithmic information theory approach. arXiv preprint arXiv:2504.20676, 2025

work page arXiv 2025
[75]

M. T. Ribeiro, S. Singh, and C. Guestrin. ``why should i trust you?'': Explaining the predictions of any classifier. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016

work page 2016
[76]

Rudin et al

C. Rudin et al. Position: Amazing things come from having many good models. In International Conference on Machine Learning (ICML), 2024

work page 2024
[77]

L. S. Shapley. A value for n-person games. In Contributions to the Theory of Games II, number 28 in Annals of Mathematics Studies, pages 307--317. Princeton University Press, 1953

work page 1953
[78]

Srinivas and F

S. Srinivas and F. Fleuret. Full-gradient representation for neural network visualization. In Advances in Neural Information Processing Systems, volume 32, 2019

work page 2019
[79]

The L ean mathematical library

The mathlib Community . The L ean mathematical library. In Certified Programs and Proofs (CPP), 2020

work page 2020
[80]

A. B. Tsybakov. Introduction to Nonparametric Estimation. Springer, 2009

work page 2009
[81]

Zhang, J

Y. Zhang, J. D. Lee, and F. Liu. Statistical learning theory in L ean 4: Empirical processes from scratch. arXiv preprint arXiv:2602.02285, 2026. URL https://arxiv.org/abs/2602.02285

work page arXiv 2026

[1] [1]

Proceedings of the National Academy of Sciences , volume=

Impossibility Theorems for Feature Attribution , author=. Proceedings of the National Academy of Sciences , volume=

work page

[2] [2]

Big Data , volume=

Fair prediction with disparate impact: A study of bias in recidivism prediction instruments , author=. Big Data , volume=

work page

[3] [3]

Innovations in Theoretical Computer Science (ITCS) , year=

Inherent trade-offs in the fair determination of risk scores , author=. Innovations in Theoretical Computer Science (ITCS) , year=

work page

[4] [4]

On the failings of

Huang, Xuanxiang and Marques-Silva, Joao , journal=. On the failings of

work page

[5] [5]

Journal of Machine Learning Research , volume=

Partial Order in Chaos: Consensus on Feature Attributions in the Rashomon Set , author=. Journal of Machine Learning Research , volume=

work page

[6] [6]

The Limits of

Rao, Shrisha , journal=. The Limits of

work page

[7] [7]

International Conference on Machine Learning (ICML) , year=

Position: Amazing Things Come From Having Many Good Models , author=. International Conference on Machine Learning (ICML) , year=

work page

[8] [8]

Advances in Neural Information Processing Systems , volume=

A unified approach to interpreting model predictions , author=. Advances in Neural Information Processing Systems , volume=

work page

[9] [9]

1951 , publisher=

Social Choice and Individual Values , author=. 1951 , publisher=

work page 1951

[10] [10]

Social Choice Theory in

Nipkow, Tobias , journal=. Social Choice Theory in

work page

[11] [11]

Statistical Learning Theory in

Zhang, Yuanhe and Lee, Jason D and Liu, Fanghui , journal=. Statistical Learning Theory in. 2026 , url=

work page 2026

[12] [12]

2024 , howpublished=

Regulation (. 2024 , howpublished=

work page 2024

[13] [13]

Hwang, Aida and Bell, Andrew and Fonseca, Jose and others , booktitle=

work page

[14] [14]

Advances in Neural Information Processing Systems , volume=

Full-Gradient Representation for Neural Network Visualization , author=. Advances in Neural Information Processing Systems , volume=

work page

[15] [15]

Journal of Machine Learning Research , volume=

All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously , author=. Journal of Machine Learning Research , volume=

work page

[16] [16]

Journal of Machine Learning Research , volume=

Underspecification Presents Challenges for Credibility in Modern Machine Learning , author=. Journal of Machine Learning Research , volume=

work page

[17] [17]

International Conference on Machine Learning , year=

Provably Better Explanations with Optimized Aggregation of Feature Attributions , author=. International Conference on Machine Learning , year=

work page

[18] [18]

Advances in Neural Information Processing Systems , year=

Probabilistic Stability Guarantees for Feature Attributions , author=. Advances in Neural Information Processing Systems , year=

work page

[19] [19]

SSRN Working Paper , year=

Mathematical Foundations of Explainability , author=. SSRN Working Paper , year=

work page

[20] [20]

Conference on Uncertainty in Artificial Intelligence , pages=

Equivalence and Synthesis of Causal Models , author=. Conference on Uncertainty in Artificial Intelligence , pages=

work page

[21] [21]

2005 , publisher=

Testing Statistical Hypotheses , author=. 2005 , publisher=

work page 2005

[22] [22]

2009 , publisher=

Introduction to Nonparametric Estimation , author=. 2009 , publisher=

work page 2009

[23] [23]

Feature relevance quantification in explainable

Janzing, Dominik and Minorics, Lenon and Bl. Feature relevance quantification in explainable. International Conference on Artificial Intelligence and Statistics , pages=

work page

[24] [24]

International Conference on Artificial Intelligence and Statistics , year=

From Shapley Values to Generalized Additive Models and back , author=. International Conference on Artificial Intelligence and Statistics , year=

work page

[25] [25]

Advances in Neural Information Processing Systems , year=

But Are You Sure? An Uncertainty-Aware Perspective on Explainability , author=. Advances in Neural Information Processing Systems , year=

work page

[26] [26]

Journal of Machine Learning Research , volume=

Explaining by Removing: A Unified Framework for Model Explanation , author=. Journal of Machine Learning Research , volume=

work page

[27] [27]

Slack, Dylan and Hilgard, Sophie and Jia, Emily and Singh, Sameer and Lakkaraju, Himanshu , booktitle=. Fooling

work page

[28] [28]

ACM/IMS Journal of Data Science , year=

On the Existence of Simpler Machine Learning Models , author=. ACM/IMS Journal of Data Science , year=

work page

[29] [29]

Algorithms to estimate

Chen, Hugh and Covert, Ian C and Lundberg, Scott M and Lee, Su-In , journal=. Algorithms to estimate

work page

[30] [30]

2011 , howpublished=

Supervisory Guidance on Model Risk Management (. 2011 , howpublished=

work page 2011

[31] [31]

1946 , note=

A Structure Theory for the Class of Exponential Families , author=. 1946 , note=

work page 1946

[32] [32]

International Journal of Data Science and Analytics , year=

General Pitfalls of Model-Agnostic Interpretation Methods for Machine Learning Models , author=. International Journal of Data Science and Analytics , year=

work page

[33] [33]

Machine Learning , year=

Testing Conditional Independence in Supervised Learning Algorithms , author=. Machine Learning , year=

work page

[34] [34]

Explaining black box decisions by

Mase, Masayoshi and Owen, Art B and Seiler, Benjamin , journal=. Explaining black box decisions by

work page

[35] [35]

The many

Sundararajan, Mukund and Najmi, Amir , booktitle=. The many

work page

[36] [36]

Problems with

Kumar, I Elizabeth and Venkatasubramanian, Suresh and Scheidegger, Carlos and Friedler, Sorelle , booktitle=. Problems with

work page

[37] [37]

Machine Learning , volume=

Random Forests , author=. Machine Learning , volume=

work page

[38] [38]

Annals of Statistics , volume=

Greedy Function Approximation: A Gradient Boosting Machine , author=. Annals of Statistics , volume=

work page

[39] [39]

ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , year=

``Why Should I Trust You?'': Explaining the Predictions of Any Classifier , author=. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , year=

work page

[40] [40]

Advances in Neural Information Processing Systems , year=

Unrestricted Permutation Forces Extrapolation , author=. Advances in Neural Information Processing Systems , year=

work page

[41] [41]

Journal of Computational and Graphical Statistics , year=

Statistical Inference for Feature Rankings Under Model Multiplicity , author=. Journal of Computational and Graphical Statistics , year=

work page

[42] [42]

Explaining individual predictions when features are dependent: More accurate approximations to

Aas, Kjersti and Jullum, Martin and L. Explaining individual predictions when features are dependent: More accurate approximations to. Artificial Intelligence , volume=

work page

[43] [43]

Asymmetric

Frye, Christopher and Rowat, Colin and Feige, Ilya , booktitle=. Asymmetric

work page

[44] [44]

Wright, Marvin N and Watson, David S and K. Do. Statistics and Computing , year=

work page

[45] [45]

Contributions to the Theory of Games II , series=

A Value for N-Person Games , author=. Contributions to the Theory of Games II , series=. 1953 , publisher=

work page 1953

[46] [46]

de Moura, Leonardo and Ullrich, Sebastian , booktitle=. The

work page

[47] [47]

Certified Programs and Proofs (CPP) , year=

The. Certified Programs and Proofs (CPP) , year=

work page

[48] [48]

Ghorbani, Amirata and Zou, James , booktitle=. Data

work page

[49] [50]

K. J. Arrow. Social Choice and Individual Values. Wiley, 1951

work page 1951

[50] [51]

Bilodeau, N

B. Bilodeau, N. Jaques, P. W. Koh, and B. Kim. Impossibility theorems for feature attribution. Proceedings of the National Academy of Sciences, 121 0 (2): 0 e2304406120, 2024

work page 2024

[51] [52]

L. Breiman. Random forests. Machine Learning, 45: 0 5--32, 2001

work page 2001

[52] [53]

Chouldechova

A. Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5 0 (2): 0 153--163, 2017

work page 2017

[53] [54]

D'Amour, K

A. D'Amour, K. Heller, D. Moldovan, et al. Underspecification presents challenges for credibility in modern machine learning. Journal of Machine Learning Research, 23 0 (226): 0 1--61, 2022

work page 2022

[54] [55]

de Moura and S

L. de Moura and S. Ullrich. The L ean 4 theorem prover and programming language. In International Conference on Automated Deduction (CADE), 2021

work page 2021

[55] [56]

Decker, L.-M

T. Decker, L.-M. Herm, R. Feldhans, M. Kamp, and B. Hammer. Provably better explanations with optimized aggregation of feature attributions. In International Conference on Machine Learning, 2024

work page 2024

[56] [57]

Regulation ( EU ) 2024/1689 (artificial intelligence act)

EU Parliament . Regulation ( EU ) 2024/1689 (artificial intelligence act). Official Journal of the European Union, OJ L, 12.7.2024, 2024

work page 2024

[57] [58]

Fisher, C

A. Fisher, C. Rudin, and F. Dominici. All models are wrong, but many are useful: Learning a variable's importance by studying an entire class of prediction models simultaneously. Journal of Machine Learning Research, 20 0 (177): 0 1--81, 2019

work page 2019

[58] [59]

J. H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29 0 (5): 0 1189--1232, 2001

work page 2001

[59] [60]

Ghorbani and J

A. Ghorbani and J. Zou. Data S hapley: Equitable valuation of data for machine learning. In International Conference on Machine Learning, 2019

work page 2019

[60] [61]

Herren and P

A. Herren and P. R. Hahn. Statistical inference for feature rankings under model multiplicity. Journal of Computational and Graphical Statistics, 2023

work page 2023

[61] [62]

Hooker, L

G. Hooker, L. Mentch, and S. Zhou. Unrestricted permutation forces extrapolation. In Advances in Neural Information Processing Systems, 2021

work page 2021

[62] [63]

Huang and J

X. Huang and J. Marques-Silva. On the failings of S hapley values for explainability. International Journal of Approximate Reasoning, 171: 0 109112, 2024

work page 2024

[63] [64]

Hunt and C

G. Hunt and C. Stein. A structure theory for the class of exponential families. 1946. Unpublished manuscript; results presented in Lehmann & Romano (2005), Ch. 6

work page 1946

[64] [65]

Y. Jin, R. Gao, and B. Kim. Probabilistic stability guarantees for feature attributions. In Advances in Neural Information Processing Systems, 2025

work page 2025

[65] [66]

Kleinberg, S

J. Kleinberg, S. Mullainathan, and M. Raghavan. Inherent trade-offs in the fair determination of risk scores. In Innovations in Theoretical Computer Science (ITCS), 2017

work page 2017

[66] [67]

Krishna, T

S. Krishna, T. Han, A. Ber, J. Brinton, A. Lakshminarayanan, C. Robinson, and H. Lakkaraju. The disagreement problem in explainable machine learning: A practitioner's perspective. arXiv preprint arXiv:2202.01602, 2022

work page arXiv 2022

[67] [68]

Laberge, Y

G. Laberge, Y. Pequignot, A. Mathieu, F. Khomh, and M. Marchand. Partial order in chaos: Consensus on feature attributions in the rashomon set. Journal of Machine Learning Research, 24 0 (364): 0 1--50, 2023

work page 2023

[68] [69]

E. L. Lehmann and J. P. Romano. Testing Statistical Hypotheses. Springer, 3rd edition, 2005

work page 2005

[69] [70]

S. M. Lundberg and S.-I. Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, volume 30, 2017

work page 2017

[70] [71]

T. Nipkow. Social choice theory in HOL : Arrow and G ibbard- S atterthwaite. Journal of Automated Reasoning, 43: 0 289--304, 2009

work page 2009

[71] [72]

Noguer i Alonso

M. Noguer i Alonso. Mathematical foundations of explainability. SSRN Working Paper, 2025

work page 2025

[72] [73]

Supervisory guidance on model risk management ( SR 11-7/ OCC 2011-12)

Office of the Comptroller of the Currency . Supervisory guidance on model risk management ( SR 11-7/ OCC 2011-12). OCC Bulletin 2011-12, 2011

work page 2011

[73] [74]

S. Rao. The limits of AI explainability: An algorithmic information theory approach. arXiv preprint arXiv:2504.20676, 2025

work page arXiv 2025

[74] [75]

M. T. Ribeiro, S. Singh, and C. Guestrin. ``why should i trust you?'': Explaining the predictions of any classifier. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016

work page 2016

[75] [76]

Rudin et al

C. Rudin et al. Position: Amazing things come from having many good models. In International Conference on Machine Learning (ICML), 2024

work page 2024

[76] [77]

L. S. Shapley. A value for n-person games. In Contributions to the Theory of Games II, number 28 in Annals of Mathematics Studies, pages 307--317. Princeton University Press, 1953

work page 1953

[77] [78]

Srinivas and F

S. Srinivas and F. Fleuret. Full-gradient representation for neural network visualization. In Advances in Neural Information Processing Systems, volume 32, 2019

work page 2019

[78] [79]

The L ean mathematical library

The mathlib Community . The L ean mathematical library. In Certified Programs and Proofs (CPP), 2020

work page 2020

[79] [80]

A. B. Tsybakov. Introduction to Nonparametric Estimation. Springer, 2009

work page 2009

[80] [81]

Zhang, J

Y. Zhang, J. D. Lee, and F. Liu. Statistical learning theory in L ean 4: Empirical processes from scratch. arXiv preprint arXiv:2602.02285, 2026. URL https://arxiv.org/abs/2602.02285

work page arXiv 2026