
arxiv: 1907.07771 · v1 · pith:J62SPG23new · submitted 2019-07-15 · 💻 cs.CY

Classification Schemas for Artificial Intelligence Failures

Pith reviewed 2026-05-24 21:08 UTC · model grok-4.3

classification 💻 cs.CY
keywords AI failures · classification scheme · artificial intelligence safety · risk assessment · failure categorization · development lifecycle · historical analysis

The pith

Classifying historical AI failures can simplify responses to future failures and support risk assessments in development.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper reviews past instances of artificial intelligence systems failing and develops a way to group them into categories. The authors propose that having such categories makes it easier to decide what to do when new failures occur. They also argue that using these categories to check for risks while developing AI could help avoid some problems before they happen. The goal is to move from reacting to each failure individually to having a structured approach based on patterns from history.

Core claim

The authors examine historical failures of artificial intelligence and propose a classification scheme for categorizing future failures. This scheme is intended to simplify the choice of response to future failures and to allow development lifecycles to be augmented with targeted risk assessments, ultimately reducing the number of future failures.

What carries the argument

A classification scheme for AI failures based on historical examples that organizes them to guide responses and risk assessments.

If this is right

  • Future AI failures can be responded to more efficiently by matching them to known categories.
  • AI development processes can incorporate specific risk assessments derived from the failure categories.
  • Overall incidence of AI failures may decrease due to preventive measures informed by the classification.
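As a rough illustration of the first point above, matching a failure to a known category can be sketched as a simple lookup. This is a minimal sketch, not the paper's scheme: the category labels, the response texts, and the names `RESPONSES`, `Failure`, and `choose_response` are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical category labels and responses, for illustration only.
# They are NOT the categories or remediations defined in the paper.
RESPONSES = {
    "design": "review requirements and design assumptions",
    "testing": "extend test coverage for the failing scenario",
    "operation": "add monitoring and an operator override",
}


@dataclass(frozen=True)
class Failure:
    description: str
    category: Optional[str]  # None when no known category fits


def choose_response(failure: Failure) -> str:
    """Return the response for a known category, else escalate."""
    return RESPONSES.get(failure.category, "escalate for manual triage")


if __name__ == "__main__":
    print(choose_response(Failure("missed edge case", "testing")))
    print(choose_response(Failure("novel behavior", None)))
```

The fallback branch matters: a failure that fits no category still gets a defined path instead of being forced into the nearest label.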

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The scheme might require periodic updates to cover failure modes from new AI capabilities absent in the historical record.
  • It could serve as a template for creating similar taxonomies in related domains such as autonomous systems or machine learning security.
  • Adoption in industry standards might lead to more uniform safety practices across different organizations.

Load-bearing premise

That a classification derived from historical AI failures will generalize usefully to future failures whose causes and contexts may differ substantially from the examined cases.

What would settle it

A series of new AI failures occurring that do not fit the proposed categories and for which the classification does not simplify the choice of response.
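One way to make this test concrete is to track how often new failures fit no category. The sketch below is an assumption about how such a check could be set up, not something the paper specifies: the function name `unclassified_rate` and the convention of marking an unclassifiable failure with `None` are hypothetical.

```python
def unclassified_rate(assigned_categories):
    """Fraction of failures for which no category fit (marked as None).

    `assigned_categories` holds one entry per observed failure: a category
    label, or None if reviewers could not place it in the scheme.
    """
    if not assigned_categories:
        raise ValueError("need at least one failure to compute a rate")
    misses = sum(category is None for category in assigned_categories)
    return misses / len(assigned_categories)


if __name__ == "__main__":
    # Hypothetical labels; half of these failures fit no category.
    print(unclassified_rate(["design", None, "operation", None]))
```

A persistently high rate on new incidents would count against the scheme's generalization. It would not by itself show whether classification simplifies the choice of response, which needs a separate comparison.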

Figures

Figures reproduced from arXiv: 1907.07771 by Peter J. Scott, Roman V. Yampolskiy.

Figure 1: Neumann and Parker computer misuse technique classes
Figure 2: Near- and Far-Term AI Failure Scenarios
Original abstract

In this paper we examine historical failures of artificial intelligence (AI) and propose a classification scheme for categorizing future failures. By doing so we hope that (a) the responses to future failures can be improved through applying a systematic classification that can be used to simplify the choice of response and (b) future failures can be reduced through augmenting development lifecycles with targeted risk assessments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper examines historical failures of artificial intelligence (AI) and proposes a classification scheme for categorizing future failures. By doing so the authors hope that (a) the responses to future failures can be improved through applying a systematic classification that can be used to simplify the choice of response and (b) future failures can be reduced through augmenting development lifecycles with targeted risk assessments.

Significance. If the proposed classification schema can be shown to generalize, it would offer a structured framework for AI risk analysis that builds systematically on historical cases, potentially aiding standardization in safety practices. The manuscript's strength is its literature-based construction of categories from documented incidents, providing a clear starting point for further work even without validation data.

major comments (2)
  1. [Abstract] The central claims that the classification 'can be used to simplify the choice of response' and will 'reduce future failures' through targeted risk assessments are asserted without any validation data, controlled comparison, case study application, or quantitative assessment of utility.
  2. [Classification schema presentation] The construction of the schema (detailed in the section presenting the classification) relies exclusively on pre-2019 historical cases and supplies no mechanism or test for handling distribution shift, emergent behaviors in large-scale models, or new deployment contexts, which directly undermines the generalization required for claim (b).
minor comments (2)
  1. [Throughout] Notation for category definitions could be made more consistent across the text to aid readability.
  2. [Introduction] Additional references to related taxonomies in AI safety literature would strengthen the positioning of the proposal.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and indicate planned revisions to the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The central claims that the classification 'can be used to simplify the choice of response' and will 'reduce future failures' through targeted risk assessments are asserted without any validation data, controlled comparison, case study application, or quantitative assessment of utility.

    Authors: We agree that the abstract presents these benefits as direct outcomes without supporting validation. The manuscript is exploratory and derives the schema from historical cases to suggest logical applications rather than demonstrate them empirically. We will revise the abstract to replace assertive phrasing with tentative language (e.g., 'we propose that the classification may help' and 'could support targeted risk assessments') and add a clause noting that empirical evaluation of utility remains future work.

    Revision: yes

  2. Referee: [Classification schema presentation] The construction of the schema (detailed in the section presenting the classification) relies exclusively on pre-2019 historical cases and supplies no mechanism or test for handling distribution shift, emergent behaviors in large-scale models, or new deployment contexts, which directly undermines the generalization required for claim (b).

    Authors: The schema is constructed from pre-2019 cases because the paper's scope is a literature-based analysis of documented incidents to derive categories. No explicit mechanism for distribution shift is supplied, as the work focuses on establishing an initial taxonomy rather than a dynamic updating procedure. The categories are defined at a level of abstraction (root cause and impact types) intended to remain applicable across contexts, but we acknowledge this does not constitute a test for emergent behaviors. We will add a dedicated limitations paragraph in the discussion section addressing generalization and suggesting extension protocols for future cases.

    Revision: partial

Circularity Check

0 steps flagged

No circularity: classification derived from external historical cases via literature review

full rationale

The paper constructs its classification schema by examining historical AI failures drawn from external sources and applying logical categorization. No equations, fitted parameters, self-citations that bear the central claim, or derivations appear. The proposal does not reduce any result to inputs defined by the authors' prior work; it is a descriptive taxonomy whose utility for future cases is presented as an open empirical question rather than a self-contained derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The proposal rests on the domain assumption that historical failures supply transferable categories and that classification itself improves response choice and risk reduction; no free parameters or invented physical entities are introduced.

axioms (2)
  • Domain assumption: Historical AI failures can be grouped into categories that generalize to future failures.
    Invoked in the abstract when the authors state that examining historical failures will improve responses to future ones.
  • Domain assumption: A systematic classification simplifies choice of response and enables targeted risk assessments.
    Central to the hoped-for benefits listed in the abstract.



