Explainable AI needs formalization
Pith reviewed 2026-05-23 20:20 UTC · model grok-4.3
The pith
Popular XAI methods systematically attribute importance to input features independent of the prediction target.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Current XAI methods do not address well-defined problems and are not evaluated against targeted criteria of explanation correctness. As a result they systematically attribute importance to input features that are independent of the prediction target. This limits their utility for diagnosing and correcting data and models, for scientific discovery, and for identifying intervention targets. Researchers should formally define the problems they intend to solve and design methods accordingly, which will lead to diverse use-case-dependent notions of explanation correctness and objective metrics of explanation performance.
What carries the argument
The systematic attribution of importance to input features that are statistically independent of the prediction target, caused by the absence of well-defined problems and targeted correctness criteria.
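One way such an attribution can arise is a suppressor variable: a feature that carries no information about the target but helps a model cancel noise in another feature. The following is a minimal sketch of that effect, not the paper's own experiment. The two-feature construction and the function names (`make_suppressor_data`, `fit_least_squares`, `correlation`) are assumptions chosen for illustration.

```python
import math
import random


def make_suppressor_data(n, seed=0):
    """Return rows (x1, x2, y): x1 mixes the target with a distractor, x2 is the distractor alone."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = rng.choice([-1.0, 1.0])  # prediction target
        d = rng.gauss(0.0, 1.0)      # distractor, drawn independently of y
        rows.append((y + d, d, y))   # x1 = y + d, x2 = d
    return rows


def fit_least_squares(rows):
    """Solve the 2x2 normal equations for y ~ w1*x1 + w2*x2 (no intercept)."""
    s11 = sum(x1 * x1 for x1, _, _ in rows)
    s12 = sum(x1 * x2 for x1, x2, _ in rows)
    s22 = sum(x2 * x2 for _, x2, _ in rows)
    b1 = sum(x1 * y for x1, _, y in rows)
    b2 = sum(x2 * y for _, x2, y in rows)
    det = s11 * s22 - s12 * s12
    w1 = (s22 * b1 - s12 * b2) / det
    w2 = (s11 * b2 - s12 * b1) / det
    return w1, w2


def correlation(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


rows = make_suppressor_data(5000)
w1, w2 = fit_least_squares(rows)
corr_x2_y = correlation([r[1] for r in rows], [r[2] for r in rows])
print(f"w1={w1:.3f}, w2={w2:.3f}, corr(x2, y)={corr_x2_y:.3f}")
```

Because y = x1 - x2 holds by construction, the fitted weights should land near 1 and -1, while the sample correlation between x2 and y stays near zero. Any attribution that reads importance off the model's weights would therefore give x2 substantial importance even though x2 is independent of the target.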
If this is right
- XAI cannot reliably diagnose and correct data and models.
- XAI cannot support scientific discovery from machine learning models.
- XAI cannot identify intervention targets.
- Diverse use-case-dependent notions of explanation correctness will be needed.
- Objective metrics of explanation performance can be developed and used to validate algorithms.
Where Pith is reading between the lines
- Formal problem definitions could allow XAI methods to be compared directly against causal or statistical ground truth in controlled settings.
- Different application domains may converge on distinct formal correctness criteria rather than a single universal standard.
- Once formal criteria exist, existing attribution methods could be shown to be unsuitable for some questions and retained only for others.
- Objective performance metrics would enable systematic improvement of XAI algorithms similar to how loss functions drive model training.
Load-bearing premise
The observed attribution to independent features stems primarily from the absence of well-defined problems and targeted correctness criteria rather than from other implementation or data issues.
What would settle it
A demonstration that at least one popular XAI method avoids attributing importance to independent features once a specific problem is formally stated and a targeted correctness criterion is applied.
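Such a demonstration needs a measurable criterion. One possible form, sketched here as an assumption rather than a metric defined in the paper, is the share of absolute attribution mass that falls on features known by construction to be independent of the target. The names `independent_mass` and `passes_criterion` and the tolerance value are hypothetical.

```python
def independent_mass(attributions, independent_idx):
    """Fraction of total absolute attribution assigned to target-independent features."""
    total = sum(abs(a) for a in attributions)
    if total == 0:
        return 0.0  # no importance assigned anywhere, so none is misplaced
    return sum(abs(attributions[i]) for i in independent_idx) / total


def passes_criterion(attributions, independent_idx, tol=0.05):
    """Hypothetical pass/fail rule: misplaced mass must not exceed the tolerance."""
    return independent_mass(attributions, independent_idx) <= tol


# Feature 1 is assumed to be independent of the target in both cases.
print(independent_mass([1.0, -1.0], {1}))
print(passes_criterion([2.0, 0.0], {1}))
```

A check of this kind only makes sense on data where the set of independent features is known, for example synthetic data with a controlled generating process.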
Original abstract
The field of "explainable artificial intelligence" (XAI) seemingly addresses the desire that decisions of machine learning systems should be human-understandable. However, in its current state, XAI itself needs scrutiny. Popular methods cannot reliably answer relevant questions about ML models, their training data, or test inputs, because they systematically attribute importance to input features that are independent of the prediction target. This limits the utility of XAI for diagnosing and correcting data and models, for scientific discovery, and for identifying intervention targets. The fundamental reason for this is that current XAI methods do not address well-defined problems and are not evaluated against targeted criteria of explanation correctness. Researchers should formally define the problems they intend to solve and design methods accordingly. This will lead to diverse use-case-dependent notions of explanation correctness and objective metrics of explanation performance that can be used to validate XAI algorithms.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that current XAI methods cannot reliably answer questions about ML models, training data, or test inputs because they systematically attribute importance to input features independent of the prediction target. The root cause is identified as the absence of well-defined problems and targeted correctness criteria; the authors advocate that researchers should formally define intended problems, yielding use-case-dependent notions of explanation correctness and objective performance metrics.
Significance. If the central position holds, the paper could encourage the XAI community to shift from ad-hoc methods toward problem-specific formalizations, potentially increasing utility for model diagnosis, scientific discovery, and intervention targeting. As a conceptual position piece without new derivations, empirical results, or machine-checked proofs, its value is primarily in framing a methodological critique rather than delivering a technical advance.
major comments (3)
- [Abstract] Abstract: the assertion that popular methods 'systematically attribute importance to input features that are independent of the prediction target' is advanced as the central motivation yet is unsupported by any concrete example, counter-factual, reference to a specific method (e.g., SHAP, LIME), or empirical demonstration within the manuscript.
- [Full text] Position argument (throughout): the claim that the 'fundamental reason' for the observed failures is the lack of well-defined problems is not accompanied by discussion or exclusion of alternative explanations such as implementation artifacts or data properties, leaving the causal diagnosis untested.
- [Recommendations] Recommendations section: the prescription to 'formally define the problems they intend to solve' is not illustrated by even one worked example of a well-defined XAI problem together with a corresponding correctness criterion, reducing the actionability of the proposed remedy.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. The comments highlight opportunities to make the central claims more concrete and the recommendations more actionable. We respond to each major comment below and indicate where revisions will be made.
- Referee: [Abstract] Abstract: the assertion that popular methods 'systematically attribute importance to input features that are independent of the prediction target' is advanced as the central motivation yet is unsupported by any concrete example, counter-factual, reference to a specific method (e.g., SHAP, LIME), or empirical demonstration within the manuscript.
  Authors: We agree that the abstract would benefit from a brief concrete illustration to ground the central claim. In revision we will add one short example (e.g., a LIME or SHAP attribution on a simple classifier where a feature receives non-zero importance despite having no causal effect on the predicted class) while preserving the position-paper character of the work.
  Revision: yes
- Referee: [Full text] Position argument (throughout): the claim that the 'fundamental reason' for the observed failures is the lack of well-defined problems is not accompanied by discussion or exclusion of alternative explanations such as implementation artifacts or data properties, leaving the causal diagnosis untested.
  Authors: The manuscript treats the absence of well-defined problems as fundamental because, without explicit problem statements and correctness criteria, no method can be guaranteed to produce reliable answers regardless of implementation quality or data characteristics. Alternative factors such as bugs or data artifacts are acknowledged as possible contributors but are viewed as downstream of the missing formalization. To address the referee’s concern we will insert a short paragraph explicitly discussing why these alternatives are secondary rather than fundamental.
  Revision: partial
- Referee: [Recommendations] Recommendations section: the prescription to 'formally define the problems they intend to solve' is not illustrated by even one worked example of a well-defined XAI problem together with a corresponding correctness criterion, reducing the actionability of the proposed remedy.
  Authors: We accept this criticism. The Recommendations section will be expanded with one concise worked example (e.g., formalizing the problem of “identifying features whose removal changes a model’s decision on a given input” together with a correctness criterion based on agreement with an oracle intervention). This addition will directly illustrate the advocated approach.
  Revision: yes
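The worked example proposed in the last response can be made concrete. The sketch below is one possible formalization, not the authors' planned text. It assumes that "removal" means replacing a feature with a baseline value, that the oracle is the model itself queried under that intervention, and that agreement is measured as top-k overlap. The names `decide`, `oracle_flip_set` and `agreement` are hypothetical.

```python
def decide(weights, bias, x):
    """Toy linear classifier: returns 1 if the score is positive, else 0."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def oracle_flip_set(model, x, baseline):
    """Ground truth: indices whose replacement by the baseline value flips the decision."""
    original = model(x)
    flips = set()
    for i in range(len(x)):
        intervened = list(x)
        intervened[i] = baseline[i]  # "remove" feature i (assumed meaning of removal)
        if model(intervened) != original:
            flips.add(i)
    return flips


def agreement(attributions, oracle_set):
    """Share of the oracle set recovered by the k largest absolute attributions, k = |oracle set|."""
    k = len(oracle_set)
    if k == 0:
        raise ValueError("agreement is undefined when no single removal flips the decision")
    ranked = sorted(range(len(attributions)),
                    key=lambda i: abs(attributions[i]), reverse=True)
    return len(set(ranked[:k]) & oracle_set) / k


weights, bias = [2.0, -1.0, 0.5], -0.5
x, baseline = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
model = lambda v: decide(weights, bias, v)

oracle = oracle_flip_set(model, x, baseline)
weight_times_input = [w * v for w, v in zip(weights, x)]
print(oracle, agreement(weight_times_input, oracle))
```

Under this definition an explanation is correct for a given input exactly when its top-ranked features coincide with the features the oracle identifies. Other use cases would call for a different problem statement and a different criterion.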
Circularity Check
No significant circularity; position paper without derivations or self-referential claims
The paper is a conceptual position argument advocating for formalization of XAI problems and correctness criteria. It advances no equations, derivations, fitted parameters, predictions, or formal statements that could reduce to inputs by construction. The observation that methods attribute importance to independent features is presented as motivation rather than a derived result. No self-citations function as load-bearing uniqueness theorems, and no ansatzes or renamings are smuggled in. The central claim remains independent of any internal reduction, making this a self-contained non-finding consistent with the default expectation for position pieces.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Popular XAI methods systematically attribute importance to input features independent of the prediction target.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · tag: echoes
  ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Passage: "Researchers should formally define the problems they intend to solve first and then design methods accordingly. This will lead to notions of explanation correctness that can be theoretically verified and objective metrics of explanation performance that can be assessed using ground-truth data."
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · Recovery theorem (LogicNat ≃ Nat) · tag: echoes
  ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Passage: "The current XAI terminology uses the term 'explanation' indiscriminately... reflective of a deeper absence of well-defined problems for XAI to solve."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.