The Attribution Impossibility: No Feature Ranking Is Faithful, Stable, and Complete Under Collinearity
Pith reviewed 2026-05-22 01:34 UTC · model grok-4.3
The pith
No feature ranking can be faithful, stable, and complete at once when features are collinear.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
No feature ranking can be simultaneously faithful, stable, and complete when features are collinear. For collinear pairs, ranking reduces to a coin flip. The design space theorem shows exactly two families of methods exist: faithful-complete methods, which are unstable with rankings that flip up to 50 percent of the time, and ensemble methods like DASH, which are stable and report ties for symmetric features. The attribution ratio diverges as 1 over 1 minus rho squared for gradient boosting, is infinite for Lasso, and converges for random forests. DASH is provably Pareto-optimal among unbiased aggregations and achieves the Cramer-Rao variance bound.
What carries the argument
The design space theorem derived from sixteen axioms on faithful, stable, and complete attribution, which partitions every possible method into one of two families and forces coin-flip behavior on perfectly collinear pairs.
If this is right
- SHAP-based proxy discrimination audits are provably unreliable when collinear features are present.
- Switching to conditional SHAP does not escape the impossibility when features have equal causal effects.
- 68 percent of 77 public datasets exhibit attribution instability.
- DASH achieves the Cramer-Rao lower bound on variance with a tight formula for ensemble size.
- A Z-test workflow and single-model screening tool can diagnose instability in practice.
Where Pith is reading between the lines
- Practitioners should first test for collinearity before relying on any ranking for high-stakes decisions.
- Methods that explicitly report ties for collinear groups may become standard for stable explanations.
- The same impossibility may apply to other attribution tasks such as local explanations or counterfactual generation under linear dependence.
Load-bearing premise
The sixteen formal axioms correctly capture what faithful, stable, and complete mean for real-world attribution requirements.
What would settle it
Discovery of a single attribution method that produces faithful, stable, and complete rankings on a dataset containing at least one pair of perfectly collinear features would falsify the impossibility.
Figures
read the original abstract
No feature ranking can be simultaneously faithful, stable, and complete when features are collinear. For collinear pairs, ranking reduces to a coin flip. We prove this impossibility, quantify it for four model classes, resolve it via ensemble averaging (DASH), and machine-verify it with 305 Lean 4 theorems. We characterize the complete attribution design space: exactly two families of methods exist -- faithful-complete methods (unstable, with rankings that flip up to 50% of the time) and ensemble methods like DASH (stable, reporting ties for symmetric features) -- and no method lies outside this dichotomy. The impossibility is quantitative: the attribution ratio diverges as 1/(1-rho^2) for gradient boosting, is infinite for Lasso, and converges for random forests. DASH (Diversified Aggregation of SHAP) is provably Pareto-optimal among unbiased aggregations, achieving the Cramer-Rao variance bound with a tight ensemble size formula. In a survey of 77 public datasets, 68% exhibit attribution instability. Switching to conditional SHAP does not escape the impossibility when features have equal causal effects. The framework includes practical diagnostics -- a Z-test workflow and single-model screening tool -- and has direct consequences for fairness auditing: SHAP-based proxy discrimination audits are provably unreliable under collinearity. The design space theorem, diagnostics, and impossibility are mechanically verified in Lean 4 (305 theorems from 16 axioms, 0 sorry) -- to our knowledge, the first formally verified impossibility in explainable AI.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proves that no feature attribution method can simultaneously satisfy faithfulness, stability, and completeness when input features are collinear; for any collinear pair the induced ranking is equivalent to a fair coin flip. It formally characterizes the entire attribution design space as consisting of exactly two families (faithful-complete methods that are unstable, versus ensemble methods such as DASH that are stable but report ties), quantifies the divergence of the attribution ratio for gradient boosting, Lasso, and random forests, shows that DASH is Pareto-optimal among unbiased aggregations and attains the Cramér-Rao bound, reports that 68 % of 77 public datasets exhibit attribution instability, and demonstrates that conditional SHAP does not evade the impossibility when causal effects are equal. All core claims are machine-checked in Lean 4 (305 theorems, 16 axioms, zero sorries).
Significance. If the result holds, the work is significant for explainable AI. The machine-checked proof of an impossibility result is a rare and valuable contribution; the explicit two-family design-space theorem and the Cramér-Rao optimality of DASH supply both theoretical clarity and a concrete, provably stable alternative. The 77-dataset survey and the fairness-auditing implications give the claims immediate practical relevance. The formal verification strengthens confidence that the impossibility is not an artifact of informal reasoning.
minor comments (3)
- §4.2: the statement that 'DASH achieves the Cramér-Rao bound with a tight ensemble size formula' would benefit from an explicit cross-reference to the corresponding Lean theorem number so readers can locate the verified statement directly.
- Figure 3 caption: the legend for the 'coin-flip' baseline is not visually distinct from the DASH curve at small sample sizes; a different line style or marker would improve readability.
- The survey section reports 68 % instability but does not state the precise operational definition of 'instability' used to label a dataset (e.g., fraction of collinear pairs with ranking flip probability > 0.4). Adding this definition would aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for the positive assessment, detailed summary of our contributions, and recommendation to accept. The recognition of the impossibility result, the two-family design-space characterization, the Cramér-Rao optimality of DASH, the empirical survey, and the Lean 4 formalization is appreciated and aligns with the manuscript's intent.
Circularity Check
No significant circularity; derivation is axiomatically self-contained
full rationale
The impossibility theorem and design-space dichotomy are derived directly from 16 stated axioms and mechanically verified via 305 Lean 4 theorems (0 sorry). The Cramer-Rao bound for DASH follows from the paper's own unbiased-aggregation assumptions as an independent consequence. No load-bearing step reduces to a fitted parameter, self-citation chain, or definitional renaming; the formal verification establishes independence from the target claims.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Formal definitions of faithful, stable, and complete feature attributions
- domain assumption Collinearity modeled as linear dependence that forces ranking to a coin flip
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/BranchSelection.leanbranch_selection echoes?
echoesECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
Theorem 5 (Attribution Impossibility) ... no feature ranking can be simultaneously faithful, stable, and complete ... Rashomon property
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanabsolute_floor_iff_bare_distinguishability echoes?
echoesECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
Theorem 37 (Symmetric Bayes Dichotomy) ... exactly two families
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Proceedings of the National Academy of Sciences , volume=
Impossibility Theorems for Feature Attribution , author=. Proceedings of the National Academy of Sciences , volume=
-
[2]
Fair prediction with disparate impact: A study of bias in recidivism prediction instruments , author=. Big Data , volume=
-
[3]
Innovations in Theoretical Computer Science (ITCS) , year=
Inherent trade-offs in the fair determination of risk scores , author=. Innovations in Theoretical Computer Science (ITCS) , year=
-
[4]
Huang, Xuanxiang and Marques-Silva, Joao , journal=. On the failings of
-
[5]
Journal of Machine Learning Research , volume=
Partial Order in Chaos: Consensus on Feature Attributions in the Rashomon Set , author=. Journal of Machine Learning Research , volume=
- [6]
-
[7]
International Conference on Machine Learning (ICML) , year=
Position: Amazing Things Come From Having Many Good Models , author=. International Conference on Machine Learning (ICML) , year=
-
[8]
Advances in Neural Information Processing Systems , volume=
A unified approach to interpreting model predictions , author=. Advances in Neural Information Processing Systems , volume=
- [9]
- [10]
-
[11]
Statistical Learning Theory in
Zhang, Yuanhe and Lee, Jason D and Liu, Fanghui , journal=. Statistical Learning Theory in. 2026 , url=
work page 2026
- [12]
-
[13]
Hwang, Aida and Bell, Andrew and Fonseca, Jose and others , booktitle=
-
[14]
Advances in Neural Information Processing Systems , volume=
Full-Gradient Representation for Neural Network Visualization , author=. Advances in Neural Information Processing Systems , volume=
-
[15]
Journal of Machine Learning Research , volume=
All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously , author=. Journal of Machine Learning Research , volume=
-
[16]
Journal of Machine Learning Research , volume=
Underspecification Presents Challenges for Credibility in Modern Machine Learning , author=. Journal of Machine Learning Research , volume=
-
[17]
International Conference on Machine Learning , year=
Provably Better Explanations with Optimized Aggregation of Feature Attributions , author=. International Conference on Machine Learning , year=
-
[18]
Advances in Neural Information Processing Systems , year=
Probabilistic Stability Guarantees for Feature Attributions , author=. Advances in Neural Information Processing Systems , year=
-
[19]
Mathematical Foundations of Explainability , author=. SSRN Working Paper , year=
-
[20]
Conference on Uncertainty in Artificial Intelligence , pages=
Equivalence and Synthesis of Causal Models , author=. Conference on Uncertainty in Artificial Intelligence , pages=
- [21]
-
[22]
Introduction to Nonparametric Estimation , author=. 2009 , publisher=
work page 2009
-
[23]
Feature relevance quantification in explainable
Janzing, Dominik and Minorics, Lenon and Bl. Feature relevance quantification in explainable. International Conference on Artificial Intelligence and Statistics , pages=
-
[24]
International Conference on Artificial Intelligence and Statistics , year=
From Shapley Values to Generalized Additive Models and back , author=. International Conference on Artificial Intelligence and Statistics , year=
-
[25]
Advances in Neural Information Processing Systems , year=
But Are You Sure? An Uncertainty-Aware Perspective on Explainability , author=. Advances in Neural Information Processing Systems , year=
-
[26]
Journal of Machine Learning Research , volume=
Explaining by Removing: A Unified Framework for Model Explanation , author=. Journal of Machine Learning Research , volume=
-
[27]
Slack, Dylan and Hilgard, Sophie and Jia, Emily and Singh, Sameer and Lakkaraju, Himanshu , booktitle=. Fooling
-
[28]
ACM/IMS Journal of Data Science , year=
On the Existence of Simpler Machine Learning Models , author=. ACM/IMS Journal of Data Science , year=
-
[29]
Chen, Hugh and Covert, Ian C and Lundberg, Scott M and Lee, Su-In , journal=. Algorithms to estimate
-
[30]
Supervisory Guidance on Model Risk Management (. 2011 , howpublished=
work page 2011
-
[31]
A Structure Theory for the Class of Exponential Families , author=. 1946 , note=
work page 1946
-
[32]
International Journal of Data Science and Analytics , year=
General Pitfalls of Model-Agnostic Interpretation Methods for Machine Learning Models , author=. International Journal of Data Science and Analytics , year=
-
[33]
Testing Conditional Independence in Supervised Learning Algorithms , author=. Machine Learning , year=
-
[34]
Explaining black box decisions by
Mase, Masayoshi and Owen, Art B and Seiler, Benjamin , journal=. Explaining black box decisions by
- [35]
-
[36]
Kumar, I Elizabeth and Venkatasubramanian, Suresh and Scheidegger, Carlos and Friedler, Sorelle , booktitle=. Problems with
- [37]
-
[38]
Annals of Statistics , volume=
Greedy Function Approximation: A Gradient Boosting Machine , author=. Annals of Statistics , volume=
-
[39]
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , year=
``Why Should I Trust You?'': Explaining the Predictions of Any Classifier , author=. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , year=
-
[40]
Advances in Neural Information Processing Systems , year=
Unrestricted Permutation Forces Extrapolation , author=. Advances in Neural Information Processing Systems , year=
-
[41]
Journal of Computational and Graphical Statistics , year=
Statistical Inference for Feature Rankings Under Model Multiplicity , author=. Journal of Computational and Graphical Statistics , year=
-
[42]
Explaining individual predictions when features are dependent: More accurate approximations to
Aas, Kjersti and Jullum, Martin and L. Explaining individual predictions when features are dependent: More accurate approximations to. Artificial Intelligence , volume=
- [43]
-
[44]
Wright, Marvin N and Watson, David S and K. Do. Statistics and Computing , year=
-
[45]
Contributions to the Theory of Games II , series=
A Value for N-Person Games , author=. Contributions to the Theory of Games II , series=. 1953 , publisher=
work page 1953
-
[46]
de Moura, Leonardo and Ullrich, Sebastian , booktitle=. The
-
[47]
Certified Programs and Proofs (CPP) , year=
The. Certified Programs and Proofs (CPP) , year=
-
[48]
Ghorbani, Amirata and Zou, James , booktitle=. Data
-
[50]
K. J. Arrow. Social Choice and Individual Values. Wiley, 1951
work page 1951
-
[51]
B. Bilodeau, N. Jaques, P. W. Koh, and B. Kim. Impossibility theorems for feature attribution. Proceedings of the National Academy of Sciences, 121 0 (2): 0 e2304406120, 2024
work page 2024
-
[52]
L. Breiman. Random forests. Machine Learning, 45: 0 5--32, 2001
work page 2001
-
[53]
A. Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5 0 (2): 0 153--163, 2017
work page 2017
-
[54]
A. D'Amour, K. Heller, D. Moldovan, et al. Underspecification presents challenges for credibility in modern machine learning. Journal of Machine Learning Research, 23 0 (226): 0 1--61, 2022
work page 2022
-
[55]
L. de Moura and S. Ullrich. The L ean 4 theorem prover and programming language. In International Conference on Automated Deduction (CADE), 2021
work page 2021
-
[56]
T. Decker, L.-M. Herm, R. Feldhans, M. Kamp, and B. Hammer. Provably better explanations with optimized aggregation of feature attributions. In International Conference on Machine Learning, 2024
work page 2024
-
[57]
Regulation ( EU ) 2024/1689 (artificial intelligence act)
EU Parliament . Regulation ( EU ) 2024/1689 (artificial intelligence act). Official Journal of the European Union, OJ L, 12.7.2024, 2024
work page 2024
- [58]
-
[59]
J. H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29 0 (5): 0 1189--1232, 2001
work page 2001
-
[60]
A. Ghorbani and J. Zou. Data S hapley: Equitable valuation of data for machine learning. In International Conference on Machine Learning, 2019
work page 2019
-
[61]
A. Herren and P. R. Hahn. Statistical inference for feature rankings under model multiplicity. Journal of Computational and Graphical Statistics, 2023
work page 2023
- [62]
-
[63]
X. Huang and J. Marques-Silva. On the failings of S hapley values for explainability. International Journal of Approximate Reasoning, 171: 0 109112, 2024
work page 2024
-
[64]
G. Hunt and C. Stein. A structure theory for the class of exponential families. 1946. Unpublished manuscript; results presented in Lehmann & Romano (2005), Ch. 6
work page 1946
-
[65]
Y. Jin, R. Gao, and B. Kim. Probabilistic stability guarantees for feature attributions. In Advances in Neural Information Processing Systems, 2025
work page 2025
-
[66]
J. Kleinberg, S. Mullainathan, and M. Raghavan. Inherent trade-offs in the fair determination of risk scores. In Innovations in Theoretical Computer Science (ITCS), 2017
work page 2017
-
[67]
S. Krishna, T. Han, A. Ber, J. Brinton, A. Lakshminarayanan, C. Robinson, and H. Lakkaraju. The disagreement problem in explainable machine learning: A practitioner's perspective. arXiv preprint arXiv:2202.01602, 2022
-
[68]
G. Laberge, Y. Pequignot, A. Mathieu, F. Khomh, and M. Marchand. Partial order in chaos: Consensus on feature attributions in the rashomon set. Journal of Machine Learning Research, 24 0 (364): 0 1--50, 2023
work page 2023
-
[69]
E. L. Lehmann and J. P. Romano. Testing Statistical Hypotheses. Springer, 3rd edition, 2005
work page 2005
-
[70]
S. M. Lundberg and S.-I. Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, volume 30, 2017
work page 2017
-
[71]
T. Nipkow. Social choice theory in HOL : Arrow and G ibbard- S atterthwaite. Journal of Automated Reasoning, 43: 0 289--304, 2009
work page 2009
-
[72]
M. Noguer i Alonso. Mathematical foundations of explainability. SSRN Working Paper, 2025
work page 2025
-
[73]
Supervisory guidance on model risk management ( SR 11-7/ OCC 2011-12)
Office of the Comptroller of the Currency . Supervisory guidance on model risk management ( SR 11-7/ OCC 2011-12). OCC Bulletin 2011-12, 2011
work page 2011
- [74]
-
[75]
M. T. Ribeiro, S. Singh, and C. Guestrin. ``why should i trust you?'': Explaining the predictions of any classifier. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016
work page 2016
-
[76]
C. Rudin et al. Position: Amazing things come from having many good models. In International Conference on Machine Learning (ICML), 2024
work page 2024
-
[77]
L. S. Shapley. A value for n-person games. In Contributions to the Theory of Games II, number 28 in Annals of Mathematics Studies, pages 307--317. Princeton University Press, 1953
work page 1953
-
[78]
S. Srinivas and F. Fleuret. Full-gradient representation for neural network visualization. In Advances in Neural Information Processing Systems, volume 32, 2019
work page 2019
-
[79]
The L ean mathematical library
The mathlib Community . The L ean mathematical library. In Certified Programs and Proofs (CPP), 2020
work page 2020
-
[80]
A. B. Tsybakov. Introduction to Nonparametric Estimation. Springer, 2009
work page 2009
- [81]
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.