Fundamental Limitation in Explaining AI
Pith reviewed 2026-06-30 13:08 UTC · model grok-4.3
The pith
AI and its explanations cannot simultaneously satisfy complex environments, strong performance, interpretability, and complete faithfulness.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper proves a quadrilemma: an AI system and its explanation cannot simultaneously satisfy the complexity of the operation environment, the goodness of the AI's performance, the interpretability of the AI's explanation, and the complete faithfulness of the AI's explanation. This incompatibility holds under standard mathematical reasoning once the four conditions are formalized.
What carries the argument
The quadrilemma, which demonstrates the logical incompatibility of the four conditions through formalization and proof.
If this is right
- In applications where environment complexity and performance cannot be reduced, complete faithfulness must be abandoned.
- Explanations should focus only on the parts important for the specific application rather than attempting full coverage.
- AI governance frameworks must be designed around the fact that explanations will always be incomplete in faithfulness.
- Sacrificing one of the four conditions is required whenever the other three are prioritized.
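The bullets above can be read as a statement in propositional logic. What follows is only a sketch: the symbols C, P, I, and F are shorthand introduced here for the four conditions, not the paper's own notation, and the paper's actual formalization is not reproduced on this page.

```latex
% Shorthand (assumed here, not taken from the paper):
%   C: the operation environment is complex
%   P: the AI performs well
%   I: the explanation is interpretable
%   F: the explanation is completely faithful
\[
  \neg\,(C \land P \land I \land F)
\]
% Equivalent forms: holding any three conditions forces the fourth to fail.
\begin{align*}
  C \land P \land I &\implies \neg F, \\
  C \land P \land F &\implies \neg I, \\
  C \land I \land F &\implies \neg P, \\
  P \land I \land F &\implies \neg C.
\end{align*}
```

The first implication is the one the abstract emphasizes: when the environment, the performance, and interpretability are fixed, complete faithfulness is the condition that gives way.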
Where Pith is reading between the lines
- Partial explanation techniques become the practical ceiling rather than a temporary workaround.
- Regulatory standards for AI transparency may need to shift away from requiring complete fidelity.
- Similar trade-offs could appear in explanations of other complex decision systems such as economic models or biological networks.
- Attempts to build counterexample systems could test whether relaxing the formalization of any one condition allows all four to hold.
Load-bearing premise
The four conditions can be defined mathematically so their joint satisfaction produces a contradiction without extra assumptions about how to measure faithfulness or interpretability.
What would settle it
An explicit construction of an AI operating in a complex environment, achieving good performance, paired with an interpretable explanation shown to be completely faithful to every aspect of its decisions.
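The settling condition can be written as an existence claim. This is a sketch under assumed notation: A stands for an AI system, E for its explanation, and Env for the environment, while the four predicate names are hypothetical placeholders for the paper's formal conditions.

```latex
% All names below are illustrative placeholders, not the paper's definitions.
\[
  \exists\, (A, E, \mathit{Env}) :\;
  \mathrm{Complex}(\mathit{Env})
  \land \mathrm{GoodPerf}(A, \mathit{Env})
  \land \mathrm{Interp}(E)
  \land \mathrm{Faithful}(E, A)
\]
```

A single witness satisfying all four predicates under the paper's own definitions would refute the quadrilemma. A witness that only works under weaker definitions would instead show which formalization the result depends on.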
Original abstract
While large-scale models such as LLMs and diffusion models have achieved practical success, public institutions have emphasized the importance of explainability in AI. Existing methods for explaining AI, however, are not designed to provide completely faithful explanations of the behavior of large-scale AI systems. Although a completely faithful and interpretable explanation of the behavior of an AI system might be useful for AI governance, it has not been known whether providing such an explanation is theoretically possible. In this paper, we mathematically prove a fundamental quadrilemma in explaining AI, stating that AI and its explanation cannot satisfy the following four conditions simultaneously: 1) the complexity of the operation environment, 2) the goodness of the AI's performance, 3) the interpretability of the AI's explanation, and 4) the complete faithfulness of the AI's explanation. This quadrilemma suggests that, in most applications where we cannot change the environment or sacrifice good AI performance and an interpretable explanation, we should give up complete faithfulness of explanations and should instead aim to explain only the parts that are important for applications. As a consequence, the quadrilemma implies that AI governance should be designed on the premise that the faithfulness of AI explanations is always incomplete.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims to prove a 'fundamental quadrilemma' asserting that no AI system can simultaneously satisfy (1) complex operating environments, (2) high performance, (3) interpretable explanations, and (4) completely faithful explanations. It concludes that explanations must therefore be incomplete and that AI governance should be designed around this premise.
Significance. A rigorous demonstration of such an incompatibility would be significant for explainable AI, supplying a theoretical basis for why post-hoc explanations of large models are necessarily approximate and directly informing regulatory expectations around transparency.
major comments (1)
- [Abstract] Abstract: the claim of a 'mathematical proof' of the quadrilemma is unsupported by any definitions of the four conditions, lemmas, or derivation steps. Without these, it is impossible to determine whether the asserted mutual incompatibility follows from standard reasoning or is an artifact of how faithfulness and interpretability are formalized.
Simulated Author's Rebuttal
We thank the referee for their review of our manuscript. We address the major comment below.
- Referee: [Abstract] Abstract: the claim of a 'mathematical proof' of the quadrilemma is unsupported by any definitions of the four conditions, lemmas, or derivation steps. Without these, it is impossible to determine whether the asserted mutual incompatibility follows from standard reasoning or is an artifact of how faithfulness and interpretability are formalized.
Authors: The abstract is a concise summary of the main result. The formal definitions of the four conditions (complexity of the operating environment, performance of the AI, interpretability of the explanation, and complete faithfulness of the explanation), the supporting lemmas, and the derivation establishing their incompatibility are provided in Sections 3 and 4 of the manuscript. Readers can therefore verify whether the quadrilemma follows from the stated formalization.
Revision: no
Circularity Check
No circularity: proof claim rests on external formalization of conditions, not self-definition or fitted inputs.
Full rationale
The visible abstract and description frame the quadrilemma as a mathematical proof that four independently formalized conditions (environment complexity, performance quality, explanation interpretability, complete faithfulness) are mutually incompatible. The provided text supplies no equations, definitions, or derivation steps that could be inspected for self-definitional reduction, fitted-input renaming, or load-bearing self-citation. The reader's assessment correctly notes the absence of visible circular structure. The skeptic's concern about possibly tautological definitions cannot be verified without the actual formalization, so no circular step can be quoted or exhibited. The derivation is therefore treated as self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math: Standard mathematical logic suffices to formalize and prove incompatibility among the four stated conditions.