Unexplainability and Incomprehensibility of Artificial Intelligence

Roman V. Yampolskiy

arxiv: 1907.03869 · v1 · pith:N3PXFL5Qnew · submitted 2019-06-20 · 💻 cs.CY


Pith reviewed 2026-05-25 18:59 UTC · model grok-4.3

classification 💻 cs.CY

keywords: explainability, incomprehensibility, artificial intelligence, impossibility results, decision making, AI safety, transparency


The pith

Advanced AIs cannot accurately explain some of their decisions, and humans will not understand some of the explanations they can provide.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes two complementary impossibility results for advanced artificial intelligence. One result shows that an AI cannot always produce accurate explanations for its own decisions. The other shows that even when explanations are possible, human understanding of them will be incomplete. These limits follow from the complexity of AI decision processes outstripping both the system's ability to describe them and human capacity to grasp the descriptions. If the results hold, then full explainability cannot be achieved for systems in real-world use, affecting safety checks, regulatory compliance, and user trust in decisions that impact people.
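One way to make the complexity gap concrete is a simple counting argument. The sketch below is an illustration added here, not a derivation taken from the paper. It assumes that a "decision process" is an arbitrary Boolean function on n input bits and that an "explanation" is any bit string of at most k bits; both are simplifying assumptions, and all function names are hypothetical.

```python
def count_decision_functions(n_bits: int) -> int:
    """Number of distinct Boolean functions on n_bits inputs: 2^(2^n)."""
    return 2 ** (2 ** n_bits)


def count_explanations(max_len: int) -> int:
    """Number of distinct bit strings of length 0..max_len: 2^(max_len+1) - 1."""
    return 2 ** (max_len + 1) - 1


def max_exactly_explainable_fraction(n_bits: int, max_len: int) -> float:
    """Upper bound on the fraction of functions with an exact short explanation.

    Assumption: each explanation string can pin down at most one function,
    so the number of explanations caps the number of explainable functions.
    """
    total = count_decision_functions(n_bits)
    return min(1.0, count_explanations(max_len) / total)


if __name__ == "__main__":
    # Toy sizes only; real systems are far larger than 4 input bits.
    for max_len in (8, 15, 16):
        print(max_len, max_exactly_explainable_fraction(4, max_len))
```

Under these assumptions, 4 input bits already give 65,536 possible functions, while explanations of at most 8 bits can single out no more than 511 of them. Any explanation scheme that stays shorter than the full truth table must therefore be inexact for some functions, which is the flavor of limit the summary describes.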

Core claim

The paper claims that advanced AIs would not be able to accurately explain some of their decisions and that, for the decisions they could explain, people would not understand some of those explanations. These two results, labeled Unexplainability and Incomprehensibility, are presented as impossibility results that together rule out complete transparency between advanced AI and human users.

What carries the argument

A pair of complementary impossibility results, Unexplainability and Incomprehensibility, which set limits on an AI's capacity to explain its decisions and on human capacity to comprehend those explanations.

If this is right

Requirements for explainable AI in safety-critical domains cannot be satisfied in full.
Security and safety analysis of advanced AI systems will contain unavoidable gaps from unexplained decisions.
User requests to understand decisions that affect them cannot always be met.
Regulatory standards demanding complete explainability for advanced AI will encounter fundamental barriers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Design priorities for future AI may shift away from explanation toward other verification techniques such as empirical testing.
Similar limits on explanation could apply to other complex decision systems, including expert human judgment.
Focus on post-hoc auditing methods rather than built-in explanations may become necessary.

Load-bearing premise

Advanced AI possesses decision processes whose full explanation exceeds both the AI's explanatory capacity and human comprehension limits.

What would settle it

Construction of an advanced AI that supplies an accurate explanation for every decision it makes, with every one of those explanations fully comprehended by humans.

Original abstract

Explainability and comprehensibility of AI are important requirements for intelligent systems deployed in real-world domains. Users want and frequently need to understand how decisions impacting them are made. Similarly it is important to understand how an intelligent system functions for safety and security reasons. In this paper, we describe two complementary impossibility results (Unexplainability and Incomprehensibility), essentially showing that advanced AIs would not be able to accurately explain some of their decisions and for the decisions they could explain people would not understand some of those explanations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper restates known limits on AI interpretability as paired impossibility claims but supplies no formal definitions, models, or derivations to support them.


The core takeaway is that Yampolskiy presents unexplainability and incomprehensibility as complementary impossibility results for advanced AI, yet the argument stays at the level of informal assertion. The abstract states that sufficiently complex systems will produce decisions the AI cannot accurately explain and explanations humans cannot understand, but nothing in the provided content shows how this follows from any specific premises about computation or explanation.

The paper does flag a real practical issue for AI safety: when systems grow complex enough, full human-interpretable accounts become unrealistic, and this connects to existing concerns about trust and verification in deployed systems. That observation is worth noting even if it is not original.

The main weakness is the absence of any formal model. There are no definitions of what counts as an accurate explanation, no information-theoretic or logical account of comprehension, and no threshold or proof that moves from high complexity to outright impossibility rather than mere difficulty. This leaves the claims dependent on the chosen meanings of the key terms, which is the exact circularity the stress-test note identifies. Earlier work on interpretable machine learning already covers the same ground without framing it as impossibility theorems, so the contribution is mostly rephrasing.

The paper is aimed at readers interested in high-level philosophical arguments about AI limits rather than technical derivations or empirical tests. A reader looking for rigorous analysis or falsifiable claims will find little to engage with. I would not bring this to a reading group, would not cite it, and would not send it for peer review because the lack of formal content makes it unsuitable for that process.

Referee Report

2 major / 1 minor

Summary. The manuscript presents two complementary impossibility results for advanced AI: unexplainability, asserting that such systems cannot accurately explain some of their decisions because decision-process complexity exceeds the AI's explanatory capacity, and incomprehensibility, asserting that humans cannot understand some explanations even when the AI can provide them. The claims rest on informal arguments linking complexity to these limits without formal models.

Significance. If the results were established via rigorous formalization, they would highlight conceptual barriers to explainable AI in high-stakes domains and inform discussions on transparency requirements. The paper usefully flags that complexity can outstrip both self-explanation and human comprehension, but the absence of derivations or models means the contribution remains at the level of known interpretability challenges rather than new impossibility theorems.

major comments (2)

[Abstract] The unexplainability claim that 'advanced AIs would not be able to accurately explain some of their decisions' is asserted without a formal model of explanation (e.g., via logical entailment, information-theoretic fidelity, or counterfactuals) or a complexity threshold, so the inference from 'high complexity' to 'impossible to explain accurately' is not derived and is load-bearing for the central result.
[Abstract] The incomprehensibility claim similarly lacks a precise definition of 'understand' or a model showing why AI-provided explanations must exceed human limits for some decisions; without this, the result reduces to the observation that some systems are hard to interpret, which does not establish impossibility and is load-bearing for the complementary claim.
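To show what a formal handle on "accurate explanation" might look like, the sketch below scores a simplified explanation by how often it agrees with the model it describes. This is an illustration added here under stated assumptions, not a definition proposed by the paper or the referee: models and explanations are both treated as functions on short bit vectors, agreement rate is used as one simple stand-in for fidelity, and the names `fidelity`, `majority`, `parity`, and `first_bit` are hypothetical.

```python
from itertools import product
from typing import Callable, Sequence

Bits = Sequence[int]
Rule = Callable[[Bits], int]


def fidelity(model: Rule, explanation: Rule, n_bits: int) -> float:
    """Fraction of all n_bits inputs on which the explanation matches the model."""
    inputs = list(product((0, 1), repeat=n_bits))
    agreements = sum(model(x) == explanation(x) for x in inputs)
    return agreements / len(inputs)


def majority(x: Bits) -> int:
    """Toy 'model': outputs 1 when more than half of the bits are 1."""
    return int(sum(x) > len(x) / 2)


def parity(x: Bits) -> int:
    """Toy 'model': outputs 1 when an odd number of bits are 1."""
    return sum(x) % 2


def first_bit(x: Bits) -> int:
    """Toy 'explanation': claims the decision depends only on the first bit."""
    return x[0]


if __name__ == "__main__":
    print("majority vs first_bit:", fidelity(majority, first_bit, 3))
    print("parity vs first_bit:  ", fidelity(parity, first_bit, 3))
```

On three input bits the one-feature explanation agrees with the majority model on 6 of 8 inputs and with the parity model on only 4 of 8, which is no better than chance. A measure of this kind, paired with an explicit bound on explanation size, is the sort of ingredient that would be needed to turn "hard to explain" into a stated threshold.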

minor comments (1)

The abstract could more explicitly separate the two results and their distinct premises to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the need for greater precision regarding the formal status of our arguments. We respond to each major comment below and indicate planned revisions.


Referee: [Abstract] The unexplainability claim that 'advanced AIs would not be able to accurately explain some of their decisions' is asserted without a formal model of explanation (e.g., via logical entailment, information-theoretic fidelity, or counterfactuals) or a complexity threshold, so the inference from 'high complexity' to 'impossible to explain accurately' is not derived and is load-bearing for the central result.

Authors: The manuscript frames unexplainability as a conceptual impossibility result arising from the mismatch between the complexity of advanced AI decision processes and the capacity of any self-generated explanation. We acknowledge that the link is informal rather than derived from a specific formal model of explanation or an explicit complexity threshold. The contribution is intended as a high-level argument connecting complexity considerations to XAI requirements rather than a mathematical theorem. We will revise the abstract and introduction to explicitly characterize the argument as conceptual and informal. revision: partial

Referee: [Abstract] The incomprehensibility claim similarly lacks a precise definition of 'understand' or a model showing why AI-provided explanations must exceed human limits for some decisions; without this, the result reduces to the observation that some systems are hard to interpret, which does not establish impossibility and is load-bearing for the complementary claim.

Authors: We agree that the incomprehensibility argument similarly rests on an informal connection between explanation complexity and human cognitive limits without a formal model of understanding. The paper presents this as a complementary conceptual limit rather than a formally derived impossibility. We will revise the abstract to clarify the informal and conceptual character of both results so that readers do not interpret them as formal theorems. revision: partial

Circularity Check

0 steps flagged

No significant circularity detected


The paper presents two complementary impossibility results as philosophical assertions about advanced AI decision processes exceeding explanatory capacity and human comprehension. No equations, parameter fitting, load-bearing premises that rest on self-citation, uniqueness theorems, or ansatzes are described in the provided abstract or structure. The claims rest on informal reasoning about complexity thresholds rather than any derivation chain that reduces outputs to inputs by construction. This matches the default expectation for non-circular papers; because there are no formal models or derivations, no load-bearing step can be shown to be self-referential under the enumerated patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the claim rests on an implicit assumption about the nature of advanced AI decision processes.

pith-pipeline@v0.9.0 · 5602 in / 985 out tokens · 20490 ms · 2026-05-25T18:59:04.717008+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Interpretable and Explainable Surrogate Modeling for Simulations: A State-of-the-Art Survey and Perspectives on Explainable AI for Decision-Making
cs.AI · 2026-04 · unverdicted · novelty 5.0

This survey synthesizes XAI methods with surrogate modeling workflows for simulations and outlines a research agenda to embed explainability into simulation-driven design and decision-making.


[1] [1]

Journal of Consciousness Studies JCS, 2012

Yampolskiy, R.V., Leakproofing Singularity -Artificial Intelligence Confinement Problem. Journal of Consciousness Studies JCS, 2012

work page 2012

[2] [2]

Armstrong, S. and R.V. Yampolskiy, Security solutions for intelligent and complex systems, in Security Solutions for Hyperconnectivity and the Internet of Things . 2017, IGI Global. p. 37-88

work page 2017

[3] [3]

Nature, 2017

Silver, D., et al., Mastering the game of go without human knowledge. Nature, 2017. 550(7676): p. 354

work page 2017

[4] [4]

2014: Oxford University Press

Bostrom, N., Superintelligence: Paths, dangers, strategies. 2014: Oxford University Press

work page 2014

[5] [5]

Strohmeier, S. and F. Piazza, Artificial Intelligence Techniques in Human Resource Management—A Conceptual Exploration , in Intelligent Techniques in Engineering Management. 2015, Springer. p. 149-172

work page 2015

[6] [6]

Walczak, S. and T. Sincich, A comparative analysis of regression and neural networks for university admissions. Information Sciences, 1999. 119(1-2): p. 1-20

work page 1999

[7] [7]

Trippi, R.R. and E. Turban, Neural networks in finance and investing: Using artifi cial intelligence to improve real world performance. 1992: McGraw-Hill, Inc

work page 1992

[8] [8]

Eastwick, and E.J

Joel, S., P.W. Eastwick, and E.J. Finkel, Is romantic desire predictable? Machine learning applied to initial romantic attraction. Psychological science, 2017. 28(10): p. 1478-1489

work page 2017

[9] [9]

Evaluating race and sex diversity in the world's largest companies using deep neural networks

Chekanov, K., et al., Evaluating race and sex diversity in the world's largest companies using deep neural networks. arXiv preprint arXiv:1707.02353, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[10] [10]

Yampolskiy, and L

Novikov, D., R.V. Yampolskiy, and L. Reznik. Artificial intelligence approaches for intrusion detection . in 2006 IEEE Long Island Systems, Applications and Technology Conference. 2006. IEEE

work page 2006

[11] [11]

Yampolskiy, and L

Novikov, D., R.V. Yampolskiy, and L. Reznik. Anomaly detection based intrusion detection. in Third International Conference on Information Technology: New Generations (ITNG'06)

work page

[12] [12]

Wang, and D

Wang, H., N. Wang, and D. -Y. Yeung. Collaborative deep learning for recommender systems. in Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining. 2015. ACM

work page 2015

[13] [13]

Galindo, J. and P. Tamayo, Credit risk assessment using statistical and machine learning: basic methodology and risk modeling applications. Computational Economics, 2000. 15(1 - 2): p. 107-143

work page 2000

[14] [14]

right to explanation

Goodman, B. and S. Flaxman, European Union regulations on algorithmic decision-making and a “right to explanation”. AI Magazine, 2017. 38(3): p. 50-57

work page 2017

[15] [15]

arXiv preprint arXiv:1711.01134, 2017

Doshi-Velez, F., et al., Accountability of AI under the law: The role of explanation. arXiv preprint arXiv:1711.01134, 2017

work page arXiv 2017

[16] [16]

Osoba, O.A. and W. Welser IV, An intelligence in our image: The risks of bias and errors in artificial intelligence. 2017: Rand Corporation

work page 2017

[17] [17]

2018: Chapman and Hall/CRC

Yampolskiy, R.V., Artificial Intelligence Safety and Security. 2018: Chapman and Hall/CRC

work page 2018

[18] Yampolskiy, R.V., Artificial superintelligence: a futuristic approach. 2015: Chapman and Hall/CRC

[19] Yampolskiy, R.V., What to Do with the Singularity Paradox?, in Philosophy and Theory of Artificial Intelligence. 2013, Springer. p. 397-413

[20] Pistono, F. and R.V. Yampolskiy, Unethical research: how to create a malevolent artificial intelligence. arXiv preprint arXiv:1605.02817, 2016

[21] Umbrello, S. and R. Yampolskiy, Designing AI for Explainability and Verifiability: A Value Sensitive Design Approach to Avoid Artificial Stupidity in Autonomous Vehicles

[22] Trazzi, M. and R.V. Yampolskiy, Building Safer AGI by introducing Artificial Stupidity. arXiv preprint arXiv:1808.03644, 2018

[23] Yampolskiy, R.V., Personal Universes: A Solution to the Multi-Agent Value Alignment Problem. arXiv preprint arXiv:1901.01851, 2019

[24] Behzadan, V., R.V. Yampolskiy, and A. Munir, Emergence of Addictive Behaviors in Reinforcement Learning Agents. arXiv preprint arXiv:1811.05590, 2018

[25] Behzadan, V., A. Munir, and R.V. Yampolskiy. A psychopathological approach to safety engineering in AI and AGI. in International Conference on Computer Safety, Reliability, and Security. 2018. Springer

[26] Yampolskiy, R.V., Predicting future AI failures from historic examples. Foresight, 2019. 21(1): p. 138-152

[27] Gunning, D., Explainable artificial intelligence (xai). Defense Advanced Research Projects Agency (DARPA), nd Web, 2017

[28] Ehsan, U., et al., Automated rationale generation: a technique for explainable AI and its effects on human perceptions. arXiv preprint arXiv:1901.03729, 2019

[29] Mittelstadt, B., C. Russell, and S. Wachter. Explaining explanations in AI. in Proceedings of the conference on fairness, accountability, and transparency. 2019. ACM

[30] Milli, S., P. Abbeel, and I. Mordatch, Interpretable and pedagogical examples. arXiv preprint arXiv:1711.00694, 2017

[31] Kantardzić, M.M. and A.S. Elmaghraby, Logic-oriented model of artificial neural networks. Information sciences, 1997. 101(1-2): p. 85-107

[32] Poursabzi-Sangdeh, F., et al., Manipulating and measuring model interpretability. arXiv preprint arXiv:1802.07810, 2018

[33] Ehsan, U., et al. Rationalization: A neural machine translation approach to generating natural language explanations. in Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society. 2018. ACM

[34] Preece, A., et al., Stakeholders in Explainable AI. arXiv preprint arXiv:1810.00184, 2018

[35] Du, M., N. Liu, and X. Hu, Techniques for interpretable machine learning. arXiv preprint arXiv:1808.00033, 2018

[36] Lipton, Z.C., The Doctor Just Won't Accept That! arXiv preprint arXiv:1711.08037, 2017

[37] Lipton, Z.C., The mythos of model interpretability. arXiv preprint arXiv:1606.03490, 2016

[38] Doshi-Velez, F. and B. Kim, Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608, 2017

[39] Oh, S.J., et al., Towards reverse-engineering black-box neural networks. arXiv preprint arXiv:1711.01768, 2017

[40] Lapuschkin, S., et al., Unmasking Clever Hans predictors and assessing what machines really learn. Nature communications, 2019. 10(1): p. 1096

[41] Ribeiro, M.T., S. Singh, and C. Guestrin. Why should I trust you?: Explaining the predictions of any classifier. in Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. 2016. ACM

[42] Adadi, A. and M. Berrada, Peeking inside the black-box: A survey on Explainable Artificial Intelligence (XAI). IEEE Access, 2018. 6: p. 52138-52160

[43] Abdul, A., et al. Trends and trajectories for explainable, accountable and intelligible systems: An HCI research agenda. in Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. 2018. ACM

[44] Guidotti, R., et al., A survey of methods for explaining black box models. ACM computing surveys (CSUR), 2018. 51(5): p. 93

[45] Došilović, F.K., M. Brčić, and N. Hlupić. Explainable artificial intelligence: A survey. in 2018 41st International convention on information and communication technology, electronics and microelectronics (MIPRO). 2018. IEEE

[46] Miller, T., Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 2018

[47] Yampolskiy, R.V., B. Klare, and A.K. Jain. Face recognition in the virtual world: recognizing avatar faces. in 2012 11th International Conference on Machine Learning and Applications

[48] Mohamed, A.A. and R.V. Yampolskiy. An improved LBP algorithm for avatar face recognition. in 2011 XXIII International Symposium on Information, Communication and Automation Technologies. 2011. IEEE

[49] Carter, S., et al., Activation Atlas. Distill, 2019. 4(3): p. e15

[50] Kim, B., et al., Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). arXiv preprint arXiv:1711.11279, 2017

[51] Olah, C., et al., The building blocks of interpretability. Distill, 2018. 3(3): p. e10

[52] Sarkar, S., et al. Accuracy and interpretability trade-offs in machine learning applied to safer gambling. in CEUR Workshop Proceedings. 2016. CEUR Workshop Proceedings

[53] Sutcliffe, G., Y. Gao, and S. Colton. A grand challenge of theorem discovery. in Proceedings of the Workshop on Challenges and Novel Applications for Automated Reasoning, 19th International Conference on Automated Reasoning. 2003

[54] Muggleton, S.H., et al., Ultra-Strong Machine Learning: comprehensibility of programs learned with ILP. Machine Learning, 2018. 107(7): p. 1119-1140

[55] Breiman, L., Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical science, 2001. 16(3): p. 199-231

[56] Charlesworth, A., Comprehending software correctness implies comprehending an intelligence-related limitation. ACM Transactions on Computational Logic (TOCL), 2006. 7(3): p. 590-612

[57] Charlesworth, A., The comprehensibility theorem and the foundations of artificial intelligence. Minds and Machines, 2014. 24(4): p. 439-476

[58] Hernández-Orallo, J. and N. Minaya-Collado. A formal definition of intelligence based on an intensional variant of algorithmic complexity. in Proceedings of International Symposium of Engineering of Intelligent Systems (EIS98). 1998

[59] Li, M. and P. Vitányi, An introduction to Kolmogorov complexity and its applications. Vol

[60] Yampolskiy, R.V. The space of possible mind designs. in International Conference on Artificial General Intelligence. 2015. Springer

[61] Weinberger, D., Our machines now have knowledge we’ll never understand, in Wired. 2017: Available at: https://www.wired.com/story/our-machines-now-have-knowledge-well-never-understand

[62] Gödel, K., On formally undecidable propositions of Principia Mathematica and related systems. 1992: Courier Corporation

[63] Heisenberg, W., Über den anschaulichen Inhalt der quantentheoretischen Kinematik und Mechanik, in Original Scientific Papers Wissenschaftliche Originalarbeiten. 1985, Springer. p. 478-504

[64] Fischer, M., N. Lynch, and M. Paterson, Impossibility of Distributed Consensus with One Faulty Process. Journal of the ACM, 1985. 32(2): p. 374-382

[65] Grossman, S.J. and J.E. Stiglitz, On the impossibility of informationally efficient markets. The American economic review, 1980. 70(3): p. 393-408

[66] Kleinberg, J.M. An impossibility theorem for clustering. in Advances in neural information processing systems. 2003

[67] Strawson, G., The impossibility of moral responsibility. Philosophical studies, 1994. 75(1): p. 5-24

[68] Bazerman, M.H., K.P. Morgan, and G.F. Loewenstein, The impossibility of auditor independence. Sloan Management Review, 1997. 38: p. 89-94

[69] List, C. and P. Pettit, Aggregating sets of judgments: An impossibility result. Economics & Philosophy, 2002. 18(1): p. 89-110

[70] Dufour, J.-M., Some impossibility theorems in econometrics with applications to structural and dynamic models. Econometrica: Journal of the Econometric Society, 1997: p. 1365-1387

[71] Yampolskiy, R.V., What are the ultimate limits to computational techniques: verifier theory and unverifiability. Physica Scripta, 2017. 92(9): p. 093001

[72] Yampolskiy, R.V., Unpredictability of AI. arXiv preprint arXiv:1905.13053, 2019

[73] Armstrong, S. and S. Mindermann, Impossibility of deducing preferences and rationality from human policy. arXiv preprint arXiv:1712.05812, 2017

[74] Eckersley, P., Impossibility and Uncertainty Theorems in AI Value Alignment

[75] Brinton, C. A framework for explanation of machine learning decisions. in IJCAI-17 Workshop on Explainable AI (XAI). 2017

[76] Hutter, M., The Human knowledge compression prize. URL http://prize.hutter1.net, 2006

[77] Compression of random data (WEB, Gilbert and others), in Faqs. Retrieved June 16, 2019: Available at: http://www.faqs.org/faqs/compression-faq/part1/section-8.html

[78] Gazzaniga, M.S., Tales from both sides of the brain: A life in neuroscience. 2015: Ecco/HarperCollins Publishers

[79] Shanks, D.R., Complex choices better made unconsciously? Science, 2006. 313(5788): p. 760-761

[80] Bassler, O.B., The surveyability of mathematical proof: A historical perspective. Synthese,