Fundamental Limitation in Explaining AI

Atsushi Suzuki; Jing Wang

arXiv: 2605.24727 · v2 · pith:TMMS7PDFnew · submitted 2026-05-23 · cs.AI · cs.CL · cs.CY · cs.IT · math.IT

Pith reviewed 2026-06-30 13:08 UTC · model grok-4.3

classification: cs.AI, cs.CL, cs.CY, cs.IT, math.IT

keywords: AI explainability, quadrilemma, explanation faithfulness, interpretability, AI governance, large language models, explanation limits

The pith

An AI and its explanation cannot simultaneously have a complex operating environment, strong performance, interpretability, and complete faithfulness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a mathematical proof that four conditions cannot all hold at once: the operating environment is complex, the AI performs well, its explanation is interpretable, and that explanation is completely faithful to the model's behavior. A sympathetic reader would care because this shows why demands for fully transparent and accurate explanations of large models such as LLMs cannot be met in realistic settings without dropping one of the other requirements. The paper concludes that explanations should therefore target only the application-relevant parts of a model's behavior rather than all of it, and that AI governance should proceed from the premise that any explanation will leave some faithfulness gaps.

Core claim

The paper proves a quadrilemma: an AI system and its explanation cannot simultaneously satisfy the complexity of the operation environment, the goodness of the AI's performance, the interpretability of the AI's explanation, and the complete faithfulness of the AI's explanation. This incompatibility holds under standard mathematical reasoning once the four conditions are formalized.

What carries the argument

The quadrilemma, which demonstrates the logical incompatibility of the four conditions through formalization and proof.

If this is right

In applications where environment complexity and performance cannot be reduced, complete faithfulness must be abandoned.
Explanations should focus only on the parts important for the specific application rather than attempting full coverage.
AI governance frameworks must be designed around the fact that explanations will always be incomplete in faithfulness.
Sacrificing one of the four conditions is required whenever the other three are prioritized.
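The logical shape of these consequences can be written compactly. As shorthand introduced here purely for illustration (the symbols are not taken from the paper), let C, P, I, and F stand for "the environment is complex", "the AI performs well", "the explanation is interpretable", and "the explanation is completely faithful". The quadrilemma says the four cannot hold together, which is equivalent to each of the four "any three rule out the fourth" readings:

```latex
\[
\neg\,(C \wedge P \wedge I \wedge F)
\;\Longleftrightarrow\;
\bigl((C \wedge P \wedge I) \Rightarrow \neg F\bigr)
\;\Longleftrightarrow\;
\bigl((C \wedge P \wedge F) \Rightarrow \neg I\bigr)
\;\Longleftrightarrow\;
\bigl((C \wedge I \wedge F) \Rightarrow \neg P\bigr)
\;\Longleftrightarrow\;
\bigl((P \wedge I \wedge F) \Rightarrow \neg C\bigr)
\]
```

The first implication is the one the abstract recommends acting on: when complexity, performance, and interpretability are fixed by the application, complete faithfulness is the condition that gives way.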

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Partial explanation techniques become the practical ceiling rather than a temporary workaround.
Regulatory standards for AI transparency may need to shift away from requiring complete fidelity.
Similar trade-offs could appear in explanations of other complex decision systems such as economic models or biological networks.
Attempts to build counterexample systems could test whether relaxing the formalization of any one condition allows all four to hold.

Load-bearing premise

The four conditions can be defined mathematically so their joint satisfaction produces a contradiction without extra assumptions about how to measure faithfulness or interpretability.
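To make concrete how a choice of formalization can force such a contradiction, here is a toy counting sketch. It is not the paper's formalization, which is not visible in the text reviewed here. Every definition below is an assumption made for illustration: an environment is a number of distinct yes/no situations, any behavior over them may be the one good performance demands, an explanation is interpretable if it fits in a fixed bit budget, and complete faithfulness means different behaviors need different explanations.

```python
"""Toy pigeonhole illustration; NOT the paper's formalization.

All definitions here are assumptions made for this sketch.
"""


def count_behaviors(num_situations: int) -> int:
    """Distinct yes/no behaviors over `num_situations` situations (one bit each)."""
    return 2 ** num_situations


def count_explanations(max_bits: int) -> int:
    """Distinct bit strings of length 0..max_bits (explanations within the budget)."""
    return 2 ** (max_bits + 1) - 1


def faithful_explanations_exist(num_situations: int, max_bits: int) -> bool:
    """True iff every behavior can be given its own explanation within the budget.

    Complete faithfulness is modelled as: different behaviors need different
    explanations. By the pigeonhole principle this fails once behaviors
    outnumber the available explanations.
    """
    return count_behaviors(num_situations) <= count_explanations(max_bits)


if __name__ == "__main__":
    budget = 8  # hypothetical interpretability budget, in bits
    for situations in (4, 8, 9, 16):
        print(situations, faithful_explanations_exist(situations, budget))
```

Under these assumptions the contradiction appears as soon as the number of situations exceeds the bit budget, which is exactly the kind of outcome a referee would want to confirm is derived rather than built in. Calling the function with only the application-relevant situations counted corresponds to the partial-faithfulness reading: fewer situations to cover, so a short explanation can exist again.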

What would settle it

An explicit construction of an AI operating in a complex environment, achieving good performance, paired with an interpretable explanation shown to be completely faithful to every aspect of its decisions.

Figures

Figures reproduced from arXiv: 2605.24727 by Atsushi Suzuki, Jing Wang.

**Figure 1.** Conceptual diagram illustrating the main claim of this study. In this paper, we prove that the following four conditions cannot all be satisfied simultaneously: the complexity of the operation environment, the goodness of AI performance, the interpretability of AI explanations, and the complete faithfulness of AI explanations. Among these, it is difficult to give a mathematical formulation of the interpre…

Original abstract

While large-scale models such as LLMs and diffusion models have achieved practical success, public institutions have emphasized the importance of explainability in AI. Existing methods for explaining AI, however, are not designed to provide completely faithful explanations of the behavior of large-scale AI systems. Although a completely faithful and interpretable explanation of the behavior of an AI system might be useful for AI governance, it has not been known whether providing such an explanation is theoretically possible. In this paper, we mathematically prove a fundamental quadrilemma in explaining AI, stating that AI and its explanation cannot satisfy the following four conditions simultaneously: 1) the complexity of the operation environment, 2) the goodness of the AI's performance, 3) the interpretability of the AI's explanation, and 4) the complete faithfulness of the AI's explanation. This quadrilemma suggests that, in most applications where we cannot change the environment or sacrifice good AI performance and an interpretable explanation, we should give up complete faithfulness of explanations and should instead aim to explain only the parts that are important for applications. As a consequence, the quadrilemma implies that AI governance should be designed on the premise that the faithfulness of AI explanations is always incomplete.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The quadrilemma claim needs the actual definitions and proof steps to show it's not just built into the formalization.

Colleague,

The paper's main point is that an AI cannot meet all four conditions at once: complex operating environment, strong performance, interpretable explanation, and completely faithful explanation. The authors frame this as a quadrilemma and draw the practical conclusion that explanations will always be incomplete in real applications, so governance should plan around partial faithfulness.

The new element is pulling these four conditions together into one result. The paper does a reasonable job spelling out the governance implication that we should focus explanations on the parts that matter for a given use case rather than chasing full fidelity.

The soft spot is the complete absence of definitions or proof outline in the abstract. Without seeing how they formalize faithfulness (exact reproduction of behavior) or interpretability (simple human-readable form), it is hard to tell whether the incompatibility is derived or follows by construction once environment complexity is high. The stress-test concern about unstated assumptions looks like it could apply.

This is aimed at XAI theorists and people working on AI regulation. A reader already tracking impossibility results in explainability might find the synthesis useful once the math is visible.

I would send it for peer review. The topic is relevant enough that referees should check whether the formalization holds or collapses into a tautology.

Referee Report

1 major / 0 minor

Summary. The manuscript claims to prove a 'fundamental quadrilemma' asserting that no AI system can simultaneously satisfy (1) complex operating environments, (2) high performance, (3) interpretable explanations, and (4) completely faithful explanations. It concludes that explanations must therefore be incomplete and that AI governance should be designed around this premise.

Significance. A rigorous demonstration of such an incompatibility would be significant for explainable AI, supplying a theoretical basis for why post-hoc explanations of large models are necessarily approximate and directly informing regulatory expectations around transparency.

major comments (1)

[Abstract] The claim of a 'mathematical proof' of the quadrilemma is unsupported by any definitions of the four conditions, lemmas, or derivation steps. Without these, it is impossible to determine whether the asserted mutual incompatibility follows from standard reasoning or is an artifact of how faithfulness and interpretability are formalized.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review of our manuscript. We address the major comment below.

Referee: [Abstract] The claim of a 'mathematical proof' of the quadrilemma is unsupported by any definitions of the four conditions, lemmas, or derivation steps. Without these, it is impossible to determine whether the asserted mutual incompatibility follows from standard reasoning or is an artifact of how faithfulness and interpretability are formalized.

Authors: The abstract is a concise summary of the main result. The formal definitions of the four conditions (complexity of the operating environment, performance of the AI, interpretability of the explanation, and complete faithfulness of the explanation), the supporting lemmas, and the derivation establishing their incompatibility are provided in Sections 3 and 4 of the manuscript. Readers can therefore verify whether the quadrilemma follows from the stated formalization.

Revision: no

Circularity Check

0 steps flagged

No circularity: proof claim rests on external formalization of conditions, not self-definition or fitted inputs.

The visible abstract and description frame the quadrilemma as a mathematical proof that four independently formalized conditions (environment complexity, performance quality, explanation interpretability, complete faithfulness) are mutually incompatible. No equations, definitions, or derivation steps are supplied in the provided text that would allow inspection for self-definitional reduction, fitted-input renaming, or load-bearing self-citation. The reader's assessment correctly notes the absence of visible circular structure; the skeptic concern about possible tautological definitions cannot be verified without the actual formalization, so no circular step can be quoted or exhibited per the hard rules. The derivation is treated as self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Only the abstract is available, so the ledger reflects the high-level claim of a mathematical proof without access to specific formalizations or assumptions used in the derivation.

axioms (1)

standard math: Standard mathematical logic suffices to formalize and prove incompatibility among the four stated conditions.
The proof is asserted to rest on definitions of complexity, performance, interpretability, and faithfulness.

pith-pipeline@v0.9.1-grok · 5740 in / 1149 out tokens · 26131 ms · 2026-06-30T13:08:28.941747+00:00 · methodology

[1] [1]

Language models are unsupervised multitask learners,

A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, “Language models are unsupervised multitask learners,” OpenAI Blog, 2019

2019

[2] [2]

High-resolution image syn- thesis with latent diffusion models,

R. Rombach, A. Blattmann, D. Lorenz, P . Esser, and B. Ommer, “High-resolution image syn- thesis with latent diffusion models,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pp. 10684–10695, 2022

2022

[3] [3]

Artiﬁcial intelligence risk management frame- work: Generative artiﬁcial intelligence proﬁle,

National Institute of Standards and Technology, “Artiﬁcial intelligence risk management frame- work: Generative artiﬁcial intelligence proﬁle,” Tech. Rep. NIST AI 600-1, National Institute of Standards and Technology, 2024

2024

[4] [4]

Guidance for risk management of artiﬁcial intelligence systems,

European Data Protection Supervisor, “Guidance for risk management of artiﬁcial intelligence systems,” tech. rep., European Data Protection Supervisor, 2025

2025

[5] [5]

A uniﬁed approach to interpreting model predictions,

S. M. Lundberg and S.-I. Lee, “A uniﬁed approach to interpreting model predictions,” Ad- vances in neural information processing systems , vol. 30, 2017

2017

[6] [6]

From local explanations to global understanding with explainable ai for trees,

S. M. Lundberg, G. Erion, H. Chen, A. DeGrave, J. M. Prutkin, B. Nair, R. Katz, J. Himmelfarb, N. Bansal, and S.-I. Lee, “From local explanations to global understanding with explainable ai for trees,” Nature machine intelligence, vol. 2, no. 1, pp. 56–67, 2020

2020

[7] [7]

Axiomatic attribution for deep networks,

M. Sundararajan, A. Taly, and Q. Y an, “Axiomatic attribution for deep networks,” in Interna- tional conference on machine learning , pp. 3319–3328, PMLR, 2017

2017

[8] [8]

Explaining explanations: Axiomatic feature interac- tions for deep networks,

J. D. Janizek, P . Sturmfels, and S.-I. Lee, “Explaining explanations: Axiomatic feature interac- tions for deep networks,” Journal of Machine Learning Research , vol. 22, no. 104, pp. 1–54, 2021

2021

[9] [9]

Normlime: A new fea- ture importance metric for explaining deep neural networks,

I. Ahern, A. Noack, L. Guzman-Nateras, D. Dou, B. Li, and J. Huan, “Normlime: A new fea- ture importance metric for explaining deep neural networks,”arXiv preprint arXiv:1909.04200, 2019

work page arXiv 1909

[10] [10]

Grad-cam: Visual explanations from deep networks via gradient-based localization,

R. R. Selvaraju, M. Cogswell, A. Das, R. V edantam, D. Parikh, and D. Batra, “Grad-cam: Visual explanations from deep networks via gradient-based localization,” in Proceedings of the IEEE international conference on computer vision , pp. 618–626, 2017

2017

[11] [11]

Grad-cam++: General- ized gradient-based visual explanations for deep convolutional networks,

A. Chattopadhay, A. Sarkar, P . Howlader, and V . N. Balasubramanian, “Grad-cam++: General- ized gradient-based visual explanations for deep convolutional networks,” in2018 IEEE winter conference on applications of computer vision (WACV) , pp. 839–847, IEEE, 2018

2018

[12] [12]

Score- cam: Score-weighted visual explanations for convolutional neural networks,

H. Wang, Z. Wang, M. Du, F. Y ang, Z. Zhang, S. Ding, P . Mardziel, and X. Hu, “Score- cam: Score-weighted visual explanations for convolutional neural networks,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pp. 24–25, 2020

2020

[13] [13]

arXiv:2008.02312 [cs.CV] https://arxiv.org/abs/2008.02312

R. Fu, Q. Hu, X. Dong, Y . Guo, Y . Gao, and B. Li, “Axiom-based grad-cam: Towards accurate visualization and explanation of cnns,” arXiv preprint arXiv:2008.02312, 2020

work page arXiv 2008

[14] [14]

Counterfactual explanations without opening the black box: Automated decisions and the gdpr,

S. Wachter, B. Mittelstadt, and C. Russell, “Counterfactual explanations without opening the black box: Automated decisions and the gdpr,” Harv. JL & Tech., vol. 31, p. 841, 2017

2017

[15] [15]

Algorithmic recourse: from counterfactual explana- tions to interventions,

A.-H. Karimi, B. Schölkopf, and I. V alera, “Algorithmic recourse: from counterfactual explana- tions to interventions,” in Proceedings of the 2021 ACM conference on fairness, accountability, and transparency, pp. 353–362, 2021

2021

[16] [16]

Shiyuan, M

S. Huang, S. Mamidanna, S. Jangam, Y . Zhou, and L. H. Gilpin, “Can large language models explain themselves? a study of llm-generated self-explanations,” arXiv preprint arXiv:2310.11207, 2023

work page arXiv 2023

[17] [17]

arXiv preprint arXiv:2310.05797 (2023)

N. Kroeger, D. Ley, S. Krishna, C. Agarwal, and H. Lakkaraju, “In-context explainers: Har- nessing llms for explaining black box models,” arXiv preprint arXiv:2310.05797, 2023

work page arXiv 2023

[18] [18]

How interpretable are reasoning explanations from prompting large language models?,

Y . W. Jie, R. Satapathy, R. Goh, and E. Cambria, “How interpretable are reasoning explanations from prompting large language models?,” in Findings of the Association for Computational Linguistics: NAACL 2024 , pp. 2148–2164, 2024

2024

[19] [19]

" why should i trust you?

M. T. Ribeiro, S. Singh, and C. Guestrin, “" why should i trust you?" explaining the predic- tions of any classiﬁer,” in Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining , pp. 1135–1144, 2016. 11

2016

[20] [20]

Anchors: High-precision model-agnostic explana- tions,

M. T. Ribeiro, S. Singh, and C. Guestrin, “Anchors: High-precision model-agnostic explana- tions,” in Proceedings of the AAAI conference on artiﬁcial intelligence , vol. 32, 2018

2018

[21] [21]

Glocalx-from local to global explanations of black box ai models,

M. Setzu, R. Guidotti, A. Monreale, F. Turini, D. Pedreschi, and F. Giannotti, “Glocalx-from local to global explanations of black box ai models,” Artiﬁcial Intelligence, vol. 294, p. 103457, 2021

2021

[22] G. Visani, V. Stanzione, and D. Garreau, “Gleams: Bridging the gap between local and global explanations,” arXiv preprint arXiv:2408.05060, 2024.

[23] M. Craven and J. Shavlik, “Extracting tree-structured representations of trained networks,” Advances in neural information processing systems, vol. 8, 1995.

[24] R. Setiono and H. Liu, “Understanding neural networks via rule extraction,” in IJCAI, vol. 1, pp. 480–485, 1995.

[25] A. S. Jacobs, R. Beltiukov, W. Willinger, R. A. Ferreira, A. Gupta, and L. Z. Granville, “Ai/ml for network security: The emperor has no clothes,” in Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, pp. 1537–1551, 2022.

[26] V. Buono, P. S. Mashhadi, M. Rahat, P. Tiwari, and S. Byttner, “Expected grad-cam: Towards gradient faithfulness,” arXiv preprint arXiv:2406.01274, 2024.

[27] M. Turpin, J. Michael, E. Perez, and S. Bowman, “Language models don’t always say what they think: Unfaithful explanations in chain-of-thought prompting,” Advances in Neural Information Processing Systems, vol. 36, pp. 74952–74965, 2023.

[28] H. Zhao, H. Chen, F. Yang, N. Liu, H. Deng, H. Cai, S. Wang, D. Yin, and M. Du, “Explainability for large language models: A survey,” ACM Transactions on Intelligent Systems and Technology, vol. 15, no. 2, pp. 1–38, 2024.

[29] A. B. Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. García, S. Gil-López, D. Molina, R. Benjamins, et al., “Explainable artificial intelligence (xai): Concepts, taxonomies, opportunities and challenges toward responsible ai,” Information fusion, vol. 58, pp. 82–115, 2020.

[30] P. M. Kumarage and M. Saarela, “Explainable generative ai: A two-stage review of existing techniques and future research directions,” AI, vol. 7, no. 1, p. 31, 2026.

[31] M. Brysbaert, “How many words do we read per minute? a review and meta-analysis of reading rate,” Journal of memory and language, vol. 109, p. 104047, 2019.

[32] Guinness World Records, “Most decimal places of pi memorized,” 2015. Record achieved by Rajveer Meena at VIT University, Vellore, India, on 21 March 2015.

[33] Y. Zhang, H. He, Z. Tan, and Y. Yuan, “Trade-off between efficiency and consistency for removal-based explanations,” Advances in Neural Information Processing Systems, vol. 36, pp. 25627–25661, 2023.

[34] B. Bilodeau, N. Jaques, P. W. Koh, and B. Kim, “Impossibility theorems for feature attribution,” Proceedings of the National Academy of Sciences, vol. 121, no. 2, p. e2304406120, 2024.

[35] M. Bressan, N. Cesa-Bianchi, E. Esposito, Y. Mansour, S. Moran, and M. Thiessen, “A theory of interpretable approximations,” in The Thirty Seventh Annual Conference on Learning Theory, pp. 648–668, PMLR, 2024.

[36] N. Frost, Z. Lipton, Y. Mansour, and M. Moshkovitz, “Partially interpretable models with guarantees on coverage and accuracy,” in International conference on algorithmic learning theory, pp. 590–613, PMLR, 2024.

[37] A. Shportko, “Kolmogorov complexity bounds for llm steganography and a perplexity-based detection proxy,” arXiv preprint arXiv:2603.21567, 2026.

[38] J. Burden, M. Cebrian, and J. Hernandez-Orallo, “Conversational complexity for assessing risk in large language models,” EPJ Data Science, vol. 14, no. 78, 2025.

[39] Z. Pan, S. Wang, P. Liao, and J. Li, “Understanding llm behaviors via compression: Data generation, knowledge acquisition and scaling laws,” in Advances in Neural Information Processing Systems, vol. 38, 2025. Spotlight.

[40] G. Delétang, A. Ruoss, P.-A. Duquenne, E. Catt, T. Genewein, C. Mattern, J. Grau-Moya, L. K. Wenliang, M. Aitchison, L. Orseau, M. Hutter, and J. Veness, “Language modeling is compression,” in The Twelfth International Conference on Learning Representations, 2024.

[41] E. Elmoznino, T. Marty, T. Kasetty, L. Gagnon, S. Mittal, M. Fathi, D. Sridhar, and G. Lajoie, “In-context learning and occam’s razor,” in Proceedings of the 42nd International Conference on Machine Learning (A. Singh, M. Fazel, D. Hsu, S. Lacoste-Julien, F. Berkenkamp, T. Maharaj, K. Wagstaff, and J. Zhu, eds.), vol. 267 of Proceedings of Machine Learn..., 2025.

[42] Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, “A neural probabilistic language model,” Journal of machine learning research, vol. 3, no. Feb, pp. 1137–1155, 2003.

[43] T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur, “Recurrent neural network based language model,” in Interspeech, vol. 2, pp. 1045–1048, Makuhari, 2010.

[44] R. J. Solomonoff, “A formal theory of inductive inference. part i,” Information and Control, vol. 7, pp. 1–22, Mar. 1964.

[45] R. J. Solomonoff, “A formal theory of inductive inference. part ii,” Information and Control, vol. 7, pp. 224–254, June 1964.

[46] A. N. Kolmogorov, “Three approaches to the quantitative definition of information,” Problems of Information Transmission, vol. 1, no. 1, pp. 1–7, 1965.

[47] G. J. Chaitin, “On the simplicity and speed of programs for computing infinite sets of natural numbers,” Journal of the ACM, vol. 16, pp. 407–422, July 1969.

[48] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.

[49] N. Cutland, Computability: An introduction to recursive function theory. Cambridge university press, 1980.

[50] M. Li, P. Vitányi, et al., An introduction to Kolmogorov complexity and its applications, vol. 3. Springer, 2008.

[51] P. D. Grünwald, P. M. Vitányi, et al., “Algorithmic information theory,” Handbook of the Philosophy of Information, pp. 281–320, 2008.

A The number of parameters in generative AI exceeds human processing capacity

This section discusses the human capacity to recognize a series of letters and why it matters. For example...

By the injectivity of •, it follows that $x_0 = x'_0$. Together with $x_0 \cdot x_1 = x'_0 \cdot x'_1$, this also implies $x_1 = x'_1$. Thus, in the two-variable case, the pairing function is injective. For a general $n$-variable pairing, since $\langle x_0, x_1, \ldots, x_{n-2}, x_{n-1} \rangle = \langle x_0, \langle x_1, \ldots, \langle x_{n-2}, x_{n-1} \rangle \cdots \rangle \rangle$, injectivity is easily shown by mathematical induction. The projection functions are obtained by the following algorithm.

Projection function from a pairing
• Input: $z \in \Sigma^*$.
• Step 1:...
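
As an illustration of how a right-nested pairing and its projections can work, here is a minimal Python sketch. It is not the paper's construction: the definition of the pairing and of the operator written • is not visible in this excerpt, so the sketch assumes $\Sigma = \{0, 1\}$ and a hypothetical self-delimiting code `bar` that doubles every bit and appends the terminator `01`. All function names are illustrative.

```python
from typing import List, Tuple


def bar(x: str) -> str:
    """Assumed self-delimiting code: double every bit, then append '01'."""
    return "".join(b + b for b in x) + "01"


def pair(*xs: str) -> str:
    """Right-nested pairing <x0, <x1, ... <x_{n-2}, x_{n-1}>>> for n >= 2."""
    if len(xs) < 2:
        raise ValueError("pairing needs at least two components")
    if len(xs) == 2:
        return bar(xs[0]) + xs[1]
    return bar(xs[0]) + pair(*xs[1:])


def split(z: str) -> Tuple[str, str]:
    """Invert the two-variable pairing: return (x0, x1) with pair(x0, x1) == z."""
    head = []
    i = 0
    while True:
        chunk = z[i:i + 2]
        if chunk == "01":  # terminator reached: the rest is x1
            return "".join(head), z[i + 2:]
        if chunk not in ("00", "11"):
            raise ValueError("input is not in the image of the pairing")
        head.append(chunk[0])
        i += 2


def unpair(z: str, n: int) -> List[str]:
    """Recover all n components by peeling off one head at a time."""
    parts = []
    for _ in range(n - 1):
        head, z = split(z)
        parts.append(head)
    parts.append(z)
    return parts


def project(z: str, n: int, k: int) -> str:
    """k-th projection (0-based) of an n-variable pairing."""
    return unpair(z, n)[k]
```

Because `split` recovers both components from the paired string alone, two different pairs cannot map to the same string under this assumed encoding, which mirrors the injectivity argument; the $n$-variable case reuses `split` repeatedly, following the same induction on the nesting.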