arxiv: 2604.21412 · v2 · submitted 2026-04-23 · 💻 cs.CY

A pragmatic classification framework for AI incident monitoring

Pith reviewed 2026-05-08 13:54 UTC · model grok-4.3

classification 💻 cs.CY
keywords: AI incidents, incident monitoring, AI governance, harm trends, exposure trends, classification framework, public databases, LLM filtering

The pith

A framework separates harm trends from exposure trends in AI incident data to produce governance categories like Escalating or Receding.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a practical method for tracking AI incidents over time without letting raw counts mislead about safety. It breaks the problem into three parts: a clear monitoring question, separate estimates of harm frequency per use and how widely systems are deployed, and a mapping of those estimates into five actionable categories. This approach uses public incident databases plus automated filtering to work around missing data and reporting biases. A reader would care because simple tallies of incidents cannot tell whether risks are growing, shrinking, or simply reflecting more widespread use.

Core claim

The framework comprises three components: a structured monitoring question that defines the scope of the analysis; a tiered estimation process that separately derives harm and exposure trends, including through LLM-assisted filtering of public incident databases; and a classification scheme that maps the resulting trend estimates onto actionable governance categories (Escalating, Mitigating, Concentrating, Receding or Unclassifiable).
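
The classification step can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the source names the five categories and says placement depends on the directional trends of harm rate and exposure, but this page does not state which direction pair maps to which label. The grid below is an assumption inferred from the category names, and `Trend` and `classify` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Trend(Enum):
    UP = "up"
    DOWN = "down"


# ASSUMED mapping (harm-rate trend, exposure trend) -> category.
# The pairing is inferred from the labels, not quoted from the paper.
GRID = {
    (Trend.UP, Trend.UP): "Escalating",
    (Trend.DOWN, Trend.UP): "Mitigating",
    (Trend.UP, Trend.DOWN): "Concentrating",
    (Trend.DOWN, Trend.DOWN): "Receding",
}


def classify(harm: Optional[Trend], exposure: Optional[Trend]) -> str:
    """Return a governance category for one monitoring question.

    None means the trend could not be determined during estimation,
    in which case the question does not enter the grid.
    """
    if harm is None or exposure is None:
        return "Unclassifiable"
    return GRID[(harm, exposure)]


print(classify(Trend.DOWN, Trend.UP))
print(classify(None, Trend.UP))
```

Keeping the undetermined case outside the grid mirrors the source's statement that unclassifiable questions are labelled separately rather than forced into a quadrant.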

What carries the argument

The tiered estimation process that separately derives harm and exposure trends from public databases combined with the classification scheme that assigns the trends to governance categories.

Load-bearing premise

Public incident databases plus LLM-assisted filtering can produce reliable, calibratable estimates of separate harm and exposure trends despite reporting biases and data gaps.

What would settle it

Applying the framework to a specific AI domain yields harm and exposure trend estimates that cannot be calibrated against any external check or that produce unstable category assignments across slight changes in data filtering.
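
One way to probe this kind of instability is to re-run the classification at several filtering thresholds and check whether the category changes. The sketch below is a hedged illustration only: the relevance scores, exposure figures, threshold values, the first-versus-last-period trend rule, and the quadrant mapping in `GRID` are all assumptions made for the example, and none of the names come from the paper.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# ASSUMED mapping (harm-rate direction, exposure direction) -> category.
GRID: Dict[Tuple[str, str], str] = {
    ("up", "up"): "Escalating",
    ("down", "up"): "Mitigating",
    ("up", "down"): "Concentrating",
    ("down", "down"): "Receding",
}


def direction(first: float, last: float) -> Optional[str]:
    """Crude directional trend: compare the first and last period."""
    if last > first:
        return "up"
    if last < first:
        return "down"
    return None  # no change: treat the trend as undetermined


def category_at(threshold: float,
                scores_by_period: Sequence[List[float]],
                exposure_by_period: Sequence[float]) -> str:
    """Classify after keeping only incidents scored at or above threshold."""
    rates = [
        sum(score >= threshold for score in scores) / exposure
        for scores, exposure in zip(scores_by_period, exposure_by_period)
    ]
    harm = direction(rates[0], rates[-1])
    exposure_trend = direction(exposure_by_period[0], exposure_by_period[-1])
    if harm is None or exposure_trend is None:
        return "Unclassifiable"
    return GRID[(harm, exposure_trend)]


def is_stable(thresholds: Sequence[float],
              scores_by_period: Sequence[List[float]],
              exposure_by_period: Sequence[float]) -> bool:
    """True if every threshold yields the same category."""
    categories = {
        category_at(t, scores_by_period, exposure_by_period) for t in thresholds
    }
    return len(categories) == 1


# Hypothetical filter scores for two periods, plus hypothetical exposure.
SCORES = [[0.9, 0.6, 0.55], [0.95, 0.92, 0.91, 0.9, 0.6, 0.52]]
EXPOSURE = [100.0, 300.0]

print(is_stable([0.5, 0.6], SCORES, EXPOSURE))
print(is_stable([0.5, 0.9], SCORES, EXPOSURE))
```

In this toy data a lenient threshold yields a falling harm rate and a strict one yields a rising harm rate, so the assigned category flips. That is the kind of result the criterion above would count against the framework.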

Figures

Figures reproduced from arXiv: 2604.21412 by Branwen Owen, Charlie Collins, Isaak Mengesha, Peter Slattery, Sean McGregor, Simon Mylius, Tina Wong.

Figure 1: The interpretive pipeline for AI incidents. At top: rising counts of monthly AI incidents and hazards (defined here as “potential dangers”) reported in the OECD AI Incidents Monitor conflate trends in media attention, deployment growth and the frequency of harm per use of AI systems (OECD, 2024a). Below: our framework takes recorded incidents as inputs to structured monitoring questions, then estimates har…
Figure 2: The SORT framework. Structured questions provide a solution to the problem of ill-defined incident types more broadly. It decomposes a monitoring question into four explicit components.
Figure 3: Estimation procedure. When incident reports and proxy measures agree in direction, a trend claim follows directly. When they diverge, the confidence interval (CI) either grows (increasing uncertainty) or shrinks (converging bounds but ambiguous direction), both requiring expert elicitation for a trend determination. When data is insufficient for either source, the procedure abstains. Estimating harm. In t…
Figure 4: Trajectory classification. Each monitoring question is placed in one of four categories based on the directional trends of harm-rate and exposure. Monitoring questions for which either trend could not be determined during estimation (§2.1) are labelled unclassifiable and do not enter the grid. The fill intensity encodes governance urgency. The framework prioritises transparency over precision. Directional…
Figure 5
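
The decision logic in the Figure 3 caption (agree, diverge, abstain) can be sketched as a small function. This is a hedged reading of the caption, not the paper's procedure: `estimate_trend` and its parameters are hypothetical names, directions are plain strings, and "insufficient for either source" is interpreted here as "at least one source is missing".

```python
from typing import Optional


def estimate_trend(incident_dir: Optional[str],
                   proxy_dir: Optional[str],
                   expert_dir: Optional[str] = None) -> Optional[str]:
    """Return "up", "down", or None (no trend determination).

    incident_dir: direction suggested by filtered incident reports
    proxy_dir:    direction suggested by proxy measures
    expert_dir:   direction from expert elicitation, if one was run
    A None input means the data for that source is insufficient.
    """
    # Abstain when data is insufficient (ASSUMED: either source missing).
    if incident_dir is None or proxy_dir is None:
        return None
    # Sources agree: the trend claim follows directly.
    if incident_dir == proxy_dir:
        return incident_dir
    # Sources diverge: defer to expert elicitation; stays None without it.
    return expert_dir


print(estimate_trend("up", "up"))
print(estimate_trend("up", "down", expert_dir="down"))
print(estimate_trend(None, "up"))
```

The sketch deliberately omits the confidence-interval behaviour the caption mentions, since this page does not say how the intervals are computed.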
Original abstract

Incident monitoring can drive safety improvements in high-reliability industries and population-scale technologies, but remains underdeveloped in AI governance. Public databases catalog thousands of AI incidents, but simple incident counts conflate media reporting propensity, system deployment ("exposure"), and harm frequency per unit exposure. We propose a methodological framework that accounts for these factors and calibrates confidence to available evidence in analyzing how AI incidents change over time. The framework comprises three components: a structured monitoring question that defines the scope of the analysis; a tiered estimation process that separately derives harm and exposure trends, including through LLM-assisted filtering of public incident databases; and a classification scheme that maps the resulting trend estimates onto actionable governance categories (Escalating, Mitigating, Concentrating, Receding or Unclassifiable). Through case studies, we examine the framework's clarifying power and limitations, demonstrate governance insight despite real-world data constraints, and provide a proof of concept for AI incident monitoring as a practical governance tool.
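
The conflation the abstract describes can be illustrated with a toy calculation. The sketch assumes a simple multiplicative model (reported count = exposure × harm rate per unit exposure × reporting propensity). The model, the function name `expected_reports`, and every number are illustrative assumptions, not taken from the paper.

```python
def expected_reports(exposure: float,
                     harm_rate: float,
                     reporting_propensity: float) -> float:
    """Expected publicly reported incidents under an ASSUMED multiplicative
    model: uses * harms per use * share of harms that get reported."""
    return exposure * harm_rate * reporting_propensity


# Hypothetical numbers, chosen only to illustrate the point.
year_1 = expected_reports(exposure=1_000_000, harm_rate=1e-4,
                          reporting_propensity=0.1)
year_2 = expected_reports(exposure=4_000_000, harm_rate=5e-5,
                          reporting_propensity=0.1)

# Reported counts roughly double even though harm per use halves,
# because exposure quadrupled over the same period.
print(round(year_1), round(year_2))
```

Under these assumed inputs a rising incident count coexists with a falling harm rate, which is why the framework estimates harm and exposure trends separately.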

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a three-component methodological framework for AI incident monitoring to move beyond raw incident counts that conflate reporting propensity, exposure, and harm. The components are: (1) a structured monitoring question defining analysis scope; (2) a tiered estimation process deriving separate harm and exposure trends, including via LLM-assisted filtering of public incident databases; and (3) a classification scheme mapping those trends to one of five actionable governance categories (Escalating, Mitigating, Concentrating, Receding, or Unclassifiable). Case studies are presented to illustrate the framework's clarifying power, limitations, and proof-of-concept value for governance despite real-world data constraints.

Significance. If the tiered estimation can be shown to yield reliable trend separation, the framework would provide a pragmatic advance for AI governance by enabling evidence-calibrated categorization of incident trends rather than unadjusted counts. It draws on external public databases without internal circularity, offers a forward methodological contribution, and the case studies demonstrate potential utility in handling acknowledged biases. This could support more targeted policy responses if the mapping to categories proves robust.

major comments (2)
  1. [Abstract and case studies] The central claim that the tiered estimation process (including LLM-assisted filtering) produces calibratable, separate harm and exposure trend estimates sufficient for reliable classification into the five governance categories is load-bearing. However, the manuscript provides no ground-truth validation, quantitative error bounds on LLM filtering, or sensitivity analysis showing how reporting biases and classification errors propagate into final category assignments (Escalating, etc.), leaving the separation step untested despite the acknowledged data constraints.
  2. [Tiered estimation process] The framework states that it 'calibrates confidence to available evidence' while separately deriving harm and exposure trends, but the description does not detail concrete calibration methods, bias-correction techniques beyond LLM filtering, or how incomplete exposure data is handled to avoid conflation. This directly affects whether the classification scheme can deliver actionable governance insight.
minor comments (2)
  1. [Abstract] The abstract could specify the number and selection criteria for the case studies, as well as the exact public databases used, to strengthen the proof-of-concept claim.
  2. [Classification scheme] Notation for the five categories (Escalating, Mitigating, etc.) is clear but would benefit from an explicit table or diagram showing the decision rules mapping trend estimates to each label.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive review and for recognizing the framework's potential to support more targeted AI governance. We address each major comment below with specific plans for revision where the manuscript can be strengthened, while being transparent about inherent limitations of public data.

  1. Referee: [Abstract and case studies] The central claim that the tiered estimation process (including LLM-assisted filtering) produces calibratable, separate harm and exposure trend estimates sufficient for reliable classification into the five governance categories is load-bearing. However, the manuscript provides no ground-truth validation, quantitative error bounds on LLM filtering, or sensitivity analysis showing how reporting biases and classification errors propagate into final category assignments (Escalating, etc.), leaving the separation step untested despite the acknowledged data constraints.

    Authors: We agree that the absence of ground-truth validation and quantitative error bounds is a substantive limitation, as no independent verified dataset exists to benchmark harm versus exposure separation in public AI incident records. The case studies are presented as illustrative applications rather than statistical tests. In the revised manuscript we will add an expanded limitations section that explicitly discusses error propagation pathways and includes a sensitivity analysis varying LLM filtering thresholds and classification decision rules. This will quantify how category assignments shift under different assumptions, providing readers with practical bounds even without external ground truth. revision: partial

  2. Referee: [Tiered estimation process] The framework states that it 'calibrates confidence to available evidence' while separately deriving harm and exposure trends, but the description does not detail concrete calibration methods, bias-correction techniques beyond LLM filtering, or how incomplete exposure data is handled to avoid conflation. This directly affects whether the classification scheme can deliver actionable governance insight.

    Authors: The tiered process uses structured questions and source-consistency checks to assign confidence levels and derive separate trends, with exposure proxies drawn from deployment statistics and media coverage volume. We acknowledge that the current text leaves the precise heuristics and bias-handling steps implicit. In revision we will insert a dedicated subsection that enumerates the calibration rules, the specific bias-correction steps applied to incomplete exposure data, and the decision criteria used to prevent conflation before mapping to governance categories. This will make the pathway to actionable insight fully transparent. revision: yes

Circularity Check

0 steps flagged

No significant circularity in the proposed methodological framework

The paper introduces a three-component framework (structured monitoring question, tiered estimation process with LLM-assisted filtering of external public databases, and classification to governance categories) as a pragmatic tool for AI incident monitoring. This is a definitional and procedural proposal rather than a derivation chain involving equations, fitted parameters, or predictions. No steps reduce outputs to inputs by construction, no self-citations are load-bearing for central claims, and case studies function as illustrative examples without statistical forcing or self-referential loops. The approach relies on external data sources and acknowledged real-world constraints, remaining self-contained against benchmarks outside the paper itself.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The framework rests on domain assumptions about the usability of public databases and LLM filtering for trend separation, plus newly defined classification categories; no explicit free parameters or external invented physical entities are introduced.

axioms (2)
  • domain assumption: Public incident databases contain information sufficient to derive separate harm and exposure trends when appropriately filtered and calibrated.
    Central to the tiered estimation process described in the abstract.
  • domain assumption: LLM-assisted filtering can be applied in a way that calibrates confidence to available evidence without introducing uncontrolled bias.
    Explicitly included as part of the estimation component.
invented entities (1)
  • Five governance classification categories (Escalating, Mitigating, Concentrating, Receding, Unclassifiable) · no independent evidence
    purpose: To translate estimated harm and exposure trends into actionable categories for AI governance decisions.
    These categories are defined by the framework as the output mapping scheme.
