arXiv: 2605.24727 · v2 · pith:TMMS7PDFnew · submitted 2026-05-23 · 💻 cs.AI · cs.CL · cs.CY · cs.IT · math.IT

Fundamental Limitation in Explaining AI

Pith reviewed 2026-06-30 13:08 UTC · model grok-4.3

classification: cs.AI · cs.CL · cs.CY · cs.IT · math.IT
keywords: AI explainability · quadrilemma · explanation faithfulness · interpretability · AI governance · large language models · explanation limits

The pith

An AI system and its explanation cannot have all four of the following at once: a complex operating environment, strong performance, an interpretable explanation, and a completely faithful explanation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a mathematical proof that four conditions cannot all hold at once: the operating environment is complex, the AI performs well, its explanation is interpretable, and that explanation is completely faithful to the model's behavior. A sympathetic reader would care because this shows why demands for fully transparent and accurate explanations of large models such as LLMs cannot be met in realistic settings without dropping one of the other requirements. The work concludes that explanations should therefore target only the application-relevant parts of a model's behavior rather than the full behavior, and that AI governance should proceed from the premise that any explanation will leave some faithfulness gaps.

Core claim

The paper proves a quadrilemma: an AI system and its explanation cannot simultaneously satisfy the complexity of the operation environment, the goodness of the AI's performance, the interpretability of the AI's explanation, and the complete faithfulness of the AI's explanation. This incompatibility holds under standard mathematical reasoning once the four conditions are formalized.
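
Stated schematically, the claim has the shape below. This is only a sketch: the symbols C, P, I, F (for the four conditions) and E, A, X (environment, AI system, explanation), as well as the choice of which arguments each predicate takes, are illustrative assumptions introduced here and are not the paper's notation or formalization.

```latex
% Illustrative notation only (not taken from the paper):
%   E = operating environment, A = AI system, X = explanation of A
%   C = complexity, P = good performance,
%   I = interpretability, F = complete faithfulness
\[
  \neg\,\exists\,(E, A, X):\;
  C(E) \,\wedge\, P(A, E) \,\wedge\, I(X) \,\wedge\, F(X, A)
\]
% Equivalent reading: whenever the first three hold, the fourth must fail.
\[
  \forall\,(E, A, X):\;
  C(E) \,\wedge\, P(A, E) \,\wedge\, I(X)
  \;\Longrightarrow\; \neg F(X, A)
\]
```

Whether the result has real content depends entirely on how the four predicates are defined, which is exactly what the sections below probe.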

What carries the argument

The quadrilemma, which demonstrates the logical incompatibility of the four conditions through formalization and proof.

If this is right

  • In applications where environment complexity and performance cannot be reduced, complete faithfulness must be abandoned.
  • Explanations should focus only on the parts important for the specific application rather than attempting full coverage.
  • AI governance frameworks must be designed around the fact that explanations will always be incomplete in faithfulness.
  • Sacrificing one of the four conditions is required whenever the other three are prioritized.
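
The last bullet is a purely logical restatement of the quadrilemma, and this can be checked by brute force. The sketch below treats the four conditions as plain Boolean flags (a simplification assumed here; the paper's actual definitions are not reproduced on this page) and confirms that "not all four hold" is equivalent to each of the four "if the other three hold, this one fails" readings. All names in the code are hypothetical.

```python
from itertools import product

# Hypothetical labels for the four conditions; Boolean flags are a
# simplifying assumption, not the paper's formalization.
CONDITIONS = ("complex_env", "good_performance", "interpretable", "fully_faithful")


def quadrilemma(values):
    """True when the four conditions do not all hold together."""
    return not all(values)


def must_drop(values, i):
    """True when: if every condition except i holds, then condition i fails."""
    others = [v for j, v in enumerate(values) if j != i]
    return (not all(others)) or (not values[i])


def equivalent_forms():
    """Check that the quadrilemma matches each 'drop one' implication
    on all 16 truth assignments."""
    for values in product([False, True], repeat=len(CONDITIONS)):
        for i in range(len(CONDITIONS)):
            if quadrilemma(values) != must_drop(values, i):
                return False
    return True


def allowed_assignments():
    """All truth assignments the quadrilemma permits."""
    return [v for v in product([False, True], repeat=len(CONDITIONS))
            if quadrilemma(v)]


if __name__ == "__main__":
    print("forms equivalent:", equivalent_forms())
    print("permitted assignments:", len(allowed_assignments()))
```

The check only shows that the four "sacrifice one" readings carry the same information; it says nothing about whether the paper's proof of the underlying incompatibility is sound.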

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Partial explanation techniques become the practical ceiling rather than a temporary workaround.
  • Regulatory standards for AI transparency may need to shift away from requiring complete fidelity.
  • Similar trade-offs could appear in explanations of other complex decision systems such as economic models or biological networks.
  • Attempts to build counterexample systems could test whether relaxing the formalization of any one condition allows all four to hold.

Load-bearing premise

The four conditions can be defined mathematically so their joint satisfaction produces a contradiction without extra assumptions about how to measure faithfulness or interpretability.

What would settle it

An explicit construction of an AI that operates in a complex environment and achieves good performance, paired with an interpretable explanation shown to be completely faithful to every aspect of its decisions.

Figures

Figures reproduced from arXiv: 2605.24727 by Atsushi Suzuki, Jing Wang.

Figure 1: Conceptual diagram illustrating the main claim of this study. In this paper, we prove that the following four conditions cannot all be satisfied simultaneously: the complexity of the operation environment, the goodness of AI performance, the interpretability of AI explanations, and the complete faithfulness of AI explanations. Among these, it is difficult to give a mathematical formulation of the interpre…
Original abstract

While large-scale models such as LLMs and diffusion models have achieved practical success, public institutions have emphasized the importance of explainability in AI. Existing methods for explaining AI, however, are not designed to provide completely faithful explanations of the behavior of large-scale AI systems. Although a completely faithful and interpretable explanation of the behavior of an AI system might be useful for AI governance, it has not been known whether providing such an explanation is theoretically possible. In this paper, we mathematically prove a fundamental quadrilemma in explaining AI, stating that AI and its explanation cannot satisfy the following four conditions simultaneously: 1) the complexity of the operation environment, 2) the goodness of the AI's performance, 3) the interpretability of the AI's explanation, and 4) the complete faithfulness of the AI's explanation. This quadrilemma suggests that, in most applications where we cannot change the environment or sacrifice good AI performance and an interpretable explanation, we should give up complete faithfulness of explanations and should instead aim to explain only the parts that are important for applications. As a consequence, the quadrilemma implies that AI governance should be designed on the premise that the faithfulness of AI explanations is always incomplete.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript claims to prove a 'fundamental quadrilemma' asserting that no AI system can simultaneously satisfy (1) complex operating environments, (2) high performance, (3) interpretable explanations, and (4) completely faithful explanations. It concludes that explanations must therefore be incomplete and that AI governance should be designed around this premise.

Significance. A rigorous demonstration of such an incompatibility would be significant for explainable AI, supplying a theoretical basis for why post-hoc explanations of large models are necessarily approximate and directly informing regulatory expectations around transparency.

major comments (1)
  1. [Abstract] The claim of a 'mathematical proof' of the quadrilemma is unsupported by any definitions of the four conditions, lemmas, or derivation steps. Without these, it is impossible to determine whether the asserted mutual incompatibility follows from standard reasoning or is an artifact of how faithfulness and interpretability are formalized.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review of our manuscript. We address the major comment below.

Point-by-point responses
  1. Referee: [Abstract] The claim of a 'mathematical proof' of the quadrilemma is unsupported by any definitions of the four conditions, lemmas, or derivation steps. Without these, it is impossible to determine whether the asserted mutual incompatibility follows from standard reasoning or is an artifact of how faithfulness and interpretability are formalized.

    Authors: The abstract is a concise summary of the main result. The formal definitions of the four conditions (complexity of the operating environment, performance of the AI, interpretability of the explanation, and complete faithfulness of the explanation), the supporting lemmas, and the derivation establishing their incompatibility are provided in Sections 3 and 4 of the manuscript. Readers can therefore verify whether the quadrilemma follows from the stated formalization.

    Revision: no

Circularity Check

0 steps flagged

No circularity: proof claim rests on external formalization of conditions, not self-definition or fitted inputs.

The visible abstract and description frame the quadrilemma as a mathematical proof that four independently formalized conditions (environment complexity, performance quality, explanation interpretability, complete faithfulness) are mutually incompatible. No equations, definitions, or derivation steps are supplied in the provided text that would allow inspection for self-definitional reduction, fitted-input renaming, or load-bearing self-citation. The reader's assessment correctly notes the absence of visible circular structure; the skeptic's concern about possibly tautological definitions cannot be verified without the actual formalization, so no circular step can be quoted or exhibited. The derivation is treated as self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Only the abstract is available, so the ledger reflects the high-level claim of a mathematical proof without access to specific formalizations or assumptions used in the derivation.

axioms (1)
  • [standard math] Standard mathematical logic suffices to formalize and prove incompatibility among the four stated conditions.
    The proof is asserted to rest on definitions of complexity, performance, interpretability, and faithfulness.

pith-pipeline@v0.9.1-grok · 5740 in / 1149 out tokens · 26131 ms · 2026-06-30T13:08:28.941747+00:00 · methodology

