Classification Schemas for Artificial Intelligence Failures
Pith reviewed 2026-05-24 21:08 UTC · model grok-4.3
The pith
Classifying historical AI failures can simplify responses to future failures and support risk assessments in development.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors examine historical failures of artificial intelligence and propose a classification scheme for categorizing future failures. This scheme is intended to simplify the choice of response to future failures and to allow development lifecycles to be augmented with targeted risk assessments, ultimately reducing the number of future failures.
What carries the argument
A classification scheme for AI failures based on historical examples that organizes them to guide responses and risk assessments.
If this is right
- Future AI failures can be responded to more efficiently by matching them to known categories.
- AI development processes can incorporate specific risk assessments derived from the failure categories.
- Overall incidence of AI failures may decrease due to preventive measures informed by the classification.
Where Pith is reading between the lines
- The scheme might require periodic updates to cover failure modes from new AI capabilities absent in the historical record.
- It could serve as a template for creating similar taxonomies in related domains such as autonomous systems or machine learning security.
- Adoption in industry standards might lead to more uniform safety practices across different organizations.
Load-bearing premise
That a classification derived from historical AI failures will generalize usefully to future failures whose causes and contexts may differ substantially from the examined cases.
What would settle it
A series of new AI failures that do not fit the proposed categories, and for which the classification does not simplify the choice of response.
Original abstract
In this paper we examine historical failures of artificial intelligence (AI) and propose a classification scheme for categorizing future failures. By doing so we hope that (a) the responses to future failures can be improved through applying a systematic classification that can be used to simplify the choice of response and (b) future failures can be reduced through augmenting development lifecycles with targeted risk assessments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper examines historical failures of artificial intelligence (AI) and proposes a classification scheme for categorizing future failures. By doing so the authors hope that (a) the responses to future failures can be improved through applying a systematic classification that can be used to simplify the choice of response and (b) future failures can be reduced through augmenting development lifecycles with targeted risk assessments.
Significance. If the proposed classification schema can be shown to generalize, it would offer a structured framework for AI risk analysis that builds systematically on historical cases, potentially aiding standardization in safety practices. The manuscript's strength is its literature-based construction of categories from documented incidents, providing a clear starting point for further work even without validation data.
major comments (2)
- [Abstract] The central claims that the classification 'can be used to simplify the choice of response' and will 'reduce future failures' through targeted risk assessments are asserted without any validation data, controlled comparison, case study application, or quantitative assessment of utility.
- [Classification schema presentation] The construction of the schema (detailed in the section presenting the classification) relies exclusively on pre-2019 historical cases and supplies no mechanism or test for handling distribution shift, emergent behaviors in large-scale models, or new deployment contexts, which directly undermines the generalization required for claim (b).
minor comments (2)
- [Throughout] Notation for category definitions could be made more consistent across the text to aid readability.
- [Introduction] Additional references to related taxonomies in AI safety literature would strengthen the positioning of the proposal.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and indicate planned revisions to the manuscript.
Point-by-point responses
-
Referee: [Abstract] The central claims that the classification 'can be used to simplify the choice of response' and will 'reduce future failures' through targeted risk assessments are asserted without any validation data, controlled comparison, case study application, or quantitative assessment of utility.
Authors: We agree that the abstract presents these benefits as direct outcomes without supporting validation. The manuscript is exploratory and derives the schema from historical cases to suggest logical applications rather than demonstrate them empirically. We will revise the abstract to replace assertive phrasing with tentative language (e.g., 'we propose that the classification may help' and 'could support targeted risk assessments') and add a clause noting that empirical evaluation of utility remains future work. revision: yes
-
Referee: [Classification schema presentation] The construction of the schema (detailed in the section presenting the classification) relies exclusively on pre-2019 historical cases and supplies no mechanism or test for handling distribution shift, emergent behaviors in large-scale models, or new deployment contexts, which directly undermines the generalization required for claim (b).
Authors: The schema is constructed from pre-2019 cases because the paper's scope is a literature-based analysis of documented incidents to derive categories. No explicit mechanism for distribution shift is supplied, as the work focuses on establishing an initial taxonomy rather than a dynamic updating procedure. The categories are defined at a level of abstraction (root cause and impact types) intended to remain applicable across contexts, but we acknowledge this does not constitute a test for emergent behaviors. We will add a dedicated limitations paragraph in the discussion section addressing generalization and suggesting extension protocols for future cases. revision: partial
Circularity Check
No circularity: classification derived from external historical cases via literature review
full rationale
The paper constructs its classification schema by examining historical AI failures drawn from external sources and applying logical categorization. No equations, fitted parameters, self-citations that bear the central claim, or derivations appear. The proposal does not reduce any result to inputs defined by the authors' prior work; it is a descriptive taxonomy whose utility for future cases is presented as an open empirical question rather than a self-contained derivation.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Historical AI failures can be grouped into categories that generalize to future failures.
- domain assumption A systematic classification simplifies choice of response and enables targeted risk assessments.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
We address each of these steps in proposing the following dimensions as useful classification criteria for AI failures: Consequences (phenomenology), Agency (etiology), Preventability (ontology), Stage of introduction in the product lifecycle
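The paper applies these four dimensions as short letter codes attached to each example, such as (CIP, CCF, AA, PS, LD) in the reference excerpts. A minimal Python sketch of reading such a tag string follows. It assumes, without confirmation from the source, that the leading letter of each code names its dimension (C for Consequences, A for Agency, P for Preventability, L for Lifecycle stage); `group_tags` and `DIMENSION_BY_PREFIX` are hypothetical names, and the sketch makes no attempt to decode the sub-codes, whose meanings are not given in this excerpt.

```python
import re
from collections import defaultdict

# Assumed mapping from a tag's first letter to the dimension it encodes.
# The prefixes are inferred from the dimension names; the source does not
# spell this mapping out.
DIMENSION_BY_PREFIX = {
    "C": "Consequences",
    "A": "Agency",
    "P": "Preventability",
    "L": "Lifecycle stage",
}


def group_tags(tag_string: str) -> dict[str, list[str]]:
    """Group codes such as '(CIP, CCF, AA, PS, LD)' by their assumed dimension."""
    groups: dict[str, list[str]] = defaultdict(list)
    # Each code is a run of capital letters; punctuation and spaces are ignored.
    for tag in re.findall(r"[A-Z]+", tag_string):
        dimension = DIMENSION_BY_PREFIX.get(tag[0], "Unknown")
        groups[dimension].append(tag)
    return dict(groups)


if __name__ == "__main__":
    print(group_tags("(CIF, CCF, AA, AM, PD, LD, LT)"))
    # {'Consequences': ['CIF', 'CCF'], 'Agency': ['AA', 'AM'],
    #  'Preventability': ['PD'], 'Lifecycle stage': ['LD', 'LT']}
```

Extracting runs of capital letters, rather than splitting on commas, also tolerates a tag list with a missing separator.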
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
Neumann (Neumann, Computer-Related Risks, 1994) described a classification for computer risk factors... We find this list too broad... We modify and extend earlier work by Yampolskiy (2016)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
and employ 22,000 PhD researchers [2]. It is estimated to create 133 million new roles by 2022 but to displace 75 million jobs in the same period [6]. Projections for the eventual impact of AI on humanity range from utopia (Kurzweil,
work page 2022
-
[2]
(p.487) to extinction (Bostrom, 2005). In many respects AI development outpaces the efforts of prognosticators to predict its progress and is inherently unpredictable (Yampolskiy, 2019). Yet all AI development is (so far) undertaken by humans, and the field of software development is noteworthy for unreliability of delivering on promises: over two-thirds ...
work page 2005
-
[3]
that researchers have taken to meta-analysis of the predictions through correlation against metrics such as coding experience of the predictors [5]. Less contentious is the assertion that the development of AGI will inevitably lead to the development of ASI: artificial superintelligence, an AI many times more intelligent than the smartest human, if only by ...
work page 1993
-
[4]
if not impossible (Yudkowsky, 2002). Many of the problems presented by a superintelligence resemble exercises in international diplomacy more than computer software challenges; for instance, the value alignment problem (Bostrom,
work page 2002
-
[5]
many fewer vendors are willing to identify their products as AI than during the current period of myriad AI technologies clogging the “peak of inflated expectations” in the Gartner Hype Cycle. [6] Failure is defined as “the nonperformance or inability of the system or component to perform its expected function for a specified time under specified environme...
work page 1995
-
[6]
described a classification for computer risk factors (see table 1). Table 1, problem sources and examples: Requirements definition, omissions, mistakes; System design, flaws; Hardware implementation, wiring, chip flaws; Software implementation, program bugs, compiler bugs; System use and operation, inadvertent mistakes; Wilful system misuse; Hardware, communication, or other equ...
work page 1989
-
[7]
A superintelligence might be highly resistant to decommissioning
(p.154). 3.2.4 A common taxonomy for computer system errors is the software development lifecycle stage (see table 5); it is often asserted that the cost of fixing an error at each stage is ten times the cost of fixing it in the previous stage (Dawson, Burrell, Rahim, & Brewster, 2010). Table 5, lifecycle stage and code: Concept, LC; Design, LD; Development, LE; Testing, LT; Operation, LO; De...
work page 2010
-
[8]
(CIP, AN, PT, LD). But these and other more fatal accidents with industrial robots going back at least to 1984 when an operator was killed by a 2,500 lb robot that came behind him with no warning (Fuller,
work page 1984
-
[9]
(CIP, CCF, AA, PS, LD). AI accidents may result in direct financial loss. The May 2010 “Flash Crash” resulted in the Dow Jones Industrial Average dropping about 9% for 36 minutes and resulted from program trading algorithms being inadequately prepared to deal with large volumes of strategically-placed trades which themselves were computer-mediated malice
work page 2010
-
[10]
(CIF, CCF, AA, AM, PD, LD, LT). Remediation efforts did not prevent more flash crashes in 2015 [17]. A major concern in the application of AI is privacy. Consumer devices connected to corporate clouds of identity data come under scrutiny, especially when, for instance, an Amazon Alexa node recorded a private conversation and sent it to a random contact
work page 2015
-
[11]
(CIE, CIF, CYC, AA, PD, LE). Just as human bias often results from inadequate exposure to diversity, AI bias often arises from the same cause. An attempt to use AI to objectively judge an online international beauty contest without human bias failed when only one of 44 winners it chose had dark skin, prompting speculation that this was due to the training...
work page 2017
-
[12]
(CYS, AI, PT, LC). While this software is being used to create exactly its intended effect, we label this a failure because it has consequences many western observers would consider to be socially harmful. China has a “social credit” scoring system reminiscent of a Black Mirror episode (Wright, 2016), linked to social media and consumer systems such as Se...
work page 2016
-
[13]
and the attention paid by children in class [40], with the most attentive being rewarded (CIM, CYC, AI, PT, LC). In the West the dangers are more nascent. Researchers at the University of Pennsylvania demonstrated that textual analysis of an individual’s Facebook posts could predict 21 different medical conditions such as diabetes (Merchant, Asch, Crutchle...
work page 2019
-
[14]
An AI designed to do X will eventually fail to do X,
[47]. Companies exploit human psychology to get our attention [48], the US military studies how to influence Twitter users [49], and the Pentagon wants to predict protests against the US President via social media surveillance [50].As Yampolskiy (2016) pointed out, “An AI designed to do X will eventually fail to do X,” codified as the Fundamental Theorem ...
work page 2016
-
[15]
the parallels with HAL 9000 of 2001: A Space Odyssey were so irresistible as to obscure the real risks of a computer failure in a critical environment. Apple’s Siri’s initial response to the request “Call me an ambulance” was to refer to the user thereafter as “ambulance”
work page 2001
-
[16]
that they check every category of failure classification, suggesting a path towards unbounded risk. They can exploit misfeatures or bugs in their environment, such as when in the developmental stages of the NERO video game, players’ robots evolved a wiggling motion that allowed them to walk up walls rather than solve the obstacles “properly” by walking a...
work page 2005
-
[17]
(CIP, AM, PD, LE). Most shows that explore AI failure develop a theme epitomized by the Terminator series: a massive AI becomes self-aware and attempts to destroy humanity. (CIP, CIE, CIF, CCF, CYF, CYS, CYC, AN, AI, PD, LC, LO). Variations include Colossus: The Forbin Project, where the AI imprisons humanity to end conflict (CIM, CIE, CYS, CYC, AN, AI, PD, LC,...
work page 1947
-
[18]
and Maas’ application to AI: “At their extreme, unexpected interactions between competing systems, especially in cyberspace, could cause unexpected escalation—a ‘flash war’, analogous to the algorithmic flash crashes observed in the financial sector.” (Maas, 2018). 5. Responses. There are various responses to these failures and risks. Several address privacy....
work page 2018
-
[19]
Conclusions. While we have not made recommendations as to how to address AI failures in each category of the dimensions we have presented, we hope that this classification scheme will make the development of remediation approaches easier. The importance of this effort may be extrapolated from Leveson’s observation that “The design of the automated system may...
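Item [7] above quotes the rule of thumb that fixing an error at each lifecycle stage costs ten times as much as fixing it at the previous stage (Dawson, Burrell, Rahim, & Brewster, 2010). Taken literally, that gives a relative cost of 10^k at stage index k, counting Concept as k = 0. The Python sketch below tabulates this literal reading; `LIFECYCLE_STAGES`, `relative_fix_cost`, and the use of list position as the stage index are illustrative assumptions, not definitions from the paper.

```python
# Lifecycle-stage codes from the excerpted table, in order. The last stage
# ("De...") is truncated in the source and is left out rather than guessed.
LIFECYCLE_STAGES = [
    ("LC", "Concept"),
    ("LD", "Design"),
    ("LE", "Development"),
    ("LT", "Testing"),
    ("LO", "Operation"),
]

# Rule of thumb quoted in the excerpt: each stage multiplies the cost of a fix
# by ten. It is an often-asserted heuristic, not a measured constant.
GROWTH_PER_STAGE = 10


def relative_fix_cost(code: str, base_cost: float = 1.0) -> float:
    """Cost of fixing an error at `code`'s stage, relative to fixing it at Concept."""
    codes = [stage_code for stage_code, _ in LIFECYCLE_STAGES]
    index = codes.index(code)  # raises ValueError for an unknown code
    return base_cost * GROWTH_PER_STAGE ** index


if __name__ == "__main__":
    for code, name in LIFECYCLE_STAGES:
        print(f"{code} {name}: {relative_fix_cost(code):g}x")
```

Under this reading, a fix deferred from Design (LD) to Operation (LO) costs 10^4 / 10^1 = 1,000 times as much.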