A pragmatic classification framework for AI incident monitoring

Branwen Owen; Charlie Collins; Isaak Mengesha; Peter Slattery; Sean McGregor; Simon Mylius; Tina Wong

arxiv: 2604.21412 · v2 · submitted 2026-04-23 · 💻 cs.CY

Isaak Mengesha, Branwen Owen, Charlie Collins, Tina Wong, Simon Mylius, Peter Slattery, Sean McGregor

Pith reviewed 2026-05-08 13:54 UTC · model grok-4.3

classification 💻 cs.CY

keywords AI incidents · incident monitoring · AI governance · harm trends · exposure trends · classification framework · public databases · LLM filtering

The pith

A framework separates harm trends from exposure trends in AI incident data to produce governance categories like Escalating or Receding.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a practical method for tracking AI incidents over time without letting raw counts mislead about safety. It breaks the problem into three parts: a clear monitoring question, separate estimates of harm frequency per use and how widely systems are deployed, and a mapping of those estimates into five actionable categories. This approach uses public incident databases plus automated filtering to work around missing data and reporting biases. A reader would care because simple tallies of incidents cannot tell whether risks are growing, shrinking, or simply reflecting more widespread use.

Core claim

The framework comprises three components: a structured monitoring question that defines the scope of the analysis; a tiered estimation process that separately derives harm and exposure trends, including through LLM-assisted filtering of public incident databases; and a classification scheme that maps the resulting trend estimates onto actionable governance categories (Escalating, Mitigating, Concentrating, Receding or Unclassifiable).

What carries the argument

The tiered estimation process that separately derives harm and exposure trends from public databases combined with the classification scheme that assigns the trends to governance categories.
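
The Figure 3 caption reproduced further down describes the estimation step as a comparison between two sources: incident reports and proxy measures. The sketch below is one hedged reading of that caption, not the authors' implementation. The names `Trend`, `Outcome`, and `estimate_direction` are hypothetical, and treating a missing source as `None` (so that the procedure abstains when either source is insufficient) is an assumption about how "insufficient data" would be encoded.

```python
from enum import Enum
from typing import Optional, Tuple


class Trend(Enum):
    UP = "up"
    DOWN = "down"


class Outcome(Enum):
    TREND_CLAIM = "trend claim"
    EXPERT_ELICITATION = "expert elicitation"
    ABSTAIN = "abstain"


def estimate_direction(
    incident_trend: Optional[Trend],
    proxy_trend: Optional[Trend],
) -> Tuple[Outcome, Optional[Trend]]:
    """Combine two directional signals into an outcome and, if possible, a trend.

    None stands for "not enough data from this source" (an assumption of
    this sketch, not a convention taken from the paper).
    """
    # Insufficient data from either source: abstain.
    if incident_trend is None or proxy_trend is None:
        return Outcome.ABSTAIN, None
    # Both sources point the same way: the trend claim follows directly.
    if incident_trend == proxy_trend:
        return Outcome.TREND_CLAIM, incident_trend
    # Sources diverge: no automatic claim, defer to expert elicitation.
    return Outcome.EXPERT_ELICITATION, None
```

The function is run once for harm and once for exposure, which keeps the two trend estimates separate until the classification step.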

Load-bearing premise

Public incident databases plus LLM-assisted filtering can produce reliable, calibratable estimates of separate harm and exposure trends despite reporting biases and data gaps.

What would settle it

Applying the framework to a specific AI domain yields harm and exposure trend estimates that cannot be calibrated against any external check or that produce unstable category assignments across slight changes in data filtering.
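
The stability half of this test can be made concrete with a small check. The sketch below assumes the pipeline has already been re-run under several slightly different filtering settings and that each run produced one category label; it then reports the most common label and the share of runs that agree with it. The function name `assignment_stability`, the setting names, and the labels in `runs` are all hypothetical illustrations, not results from the paper.

```python
from collections import Counter
from typing import Dict, Tuple


def assignment_stability(labels_by_setting: Dict[str, str]) -> Tuple[str, float]:
    """Return the most common category and the share of settings that agree with it."""
    if not labels_by_setting:
        raise ValueError("need at least one filtering setting")
    counts = Counter(labels_by_setting.values())
    modal_label, modal_count = counts.most_common(1)[0]
    return modal_label, modal_count / len(labels_by_setting)


# Hypothetical labels from re-running the pipeline under three filter settings.
runs = {
    "threshold=0.4": "Escalating",
    "threshold=0.5": "Escalating",
    "threshold=0.6": "Mitigating",
}
label, share = assignment_stability(runs)
print(f"{label}: {share:.0%} of settings agree")
```

An agreement share well below 1.0 across small changes in filtering would be the kind of instability this section describes. What share counts as acceptable is a judgment call that the source does not specify.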

Figures

Figures reproduced from arXiv: 2604.21412 by Branwen Owen, Charlie Collins, Isaak Mengesha, Peter Slattery, Sean McGregor, Simon Mylius, Tina Wong.

**Figure 1.** The interpretive pipeline for AI incidents. At top: rising counts of monthly AI incidents and hazards (defined here as “potential dangers”) reported in the OECD AI Incidents Monitor conflate trends in media attention, deployment growth and the frequency of harm per use of AI systems (OECD, 2024a). Below: our framework takes recorded incidents as inputs to structured monitoring questions, then estimates har…

**Figure 2.** The SORT framework. Structured questions provide a solution to the problem of ill-defined incident types more broadly. The framework decomposes a monitoring question into four explicit components.

**Figure 3.** Estimation procedure. When incident reports and proxy measures agree in direction, a trend claim follows directly. When they diverge, the confidence interval (CI) either grows (increasing uncertainty) or shrinks (converging bounds but ambiguous direction), both requiring expert elicitation for a trend determination. When data is insufficient for either source, the procedure abstains. Estimating harm. In t…

**Figure 4.** Trajectory classification. Each monitoring question is placed in one of four categories based on the directional trends of harm-rate and exposure. Monitoring questions for which either trend could not be determined during estimation (§2.1) are labelled unclassifiable and do not enter the grid. The fill intensity encodes governance urgency. The framework prioritises transparency over precision. Directional…
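
The Figure 4 caption describes a two-by-two grid over the direction of the harm-rate trend and the exposure trend, with undetermined trends kept out of the grid. A minimal sketch of that lookup follows. Important caveat: this page does not say which category sits in which cell, so the `GRID` mapping below is an assumption inferred from the label names only and should be checked against the paper before use. The function name `classify` and the string encoding of trends are also hypothetical.

```python
from typing import Dict, Optional, Tuple

# ASSUMPTION: the assignment of labels to cells is guessed from the label
# names, not taken from the paper. Keys are (harm_rate_trend, exposure_trend).
GRID: Dict[Tuple[str, str], str] = {
    ("up", "up"): "Escalating",
    ("down", "up"): "Mitigating",
    ("up", "down"): "Concentrating",
    ("down", "down"): "Receding",
}


def classify(harm_rate_trend: Optional[str], exposure_trend: Optional[str]) -> str:
    """Map two directional trends onto a governance category.

    A trend of None means estimation could not determine a direction, in
    which case the question is labelled Unclassifiable and skips the grid.
    """
    if harm_rate_trend is None or exposure_trend is None:
        return "Unclassifiable"
    try:
        return GRID[(harm_rate_trend, exposure_trend)]
    except KeyError:
        raise ValueError("trends must be 'up', 'down', or None") from None
```

Keeping the grid as a plain dictionary makes the decision rules easy to inspect and to correct if the paper's cell assignment differs from the one assumed here.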

Original abstract

Incident monitoring can drive safety improvements in high-reliability industries and population-scale technologies, but remains underdeveloped in AI governance. Public databases catalog thousands of AI incidents, but simple incident counts conflate media reporting propensity, system deployment ("exposure"), and harm frequency per unit exposure. We propose a methodological framework that accounts for these factors and calibrates confidence to available evidence in analyzing how AI incidents change over time. The framework comprises three components: a structured monitoring question that defines the scope of the analysis; a tiered estimation process that separately derives harm and exposure trends, including through LLM-assisted filtering of public incident databases; and a classification scheme that maps the resulting trend estimates onto actionable governance categories (Escalating, Mitigating, Concentrating, Receding or Unclassifiable). Through case studies, we examine the framework's clarifying power and limitations, demonstrate governance insight despite real-world data constraints, and provide a proof of concept for AI incident monitoring as a practical governance tool.
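
The conflation the abstract describes can be written down under a simple multiplicative model. This model is an illustrative assumption and is not stated in the abstract: let $N_t$ be the number of reported incidents in period $t$, $r_t$ the reporting propensity, $E_t$ the exposure, and $h_t$ the harm frequency per unit of exposure.

```latex
N_t \approx r_t \, E_t \, h_t
\quad\Longrightarrow\quad
\Delta \log N_t \approx \Delta \log r_t + \Delta \log E_t + \Delta \log h_t
```

Under this assumption, a rising count ($\Delta \log N_t > 0$) is compatible with a falling harm rate ($\Delta \log h_t < 0$) whenever reporting or exposure grows fast enough to offset it, which is why the framework estimates the harm and exposure trends separately instead of reading them off the raw count.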

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a useful structure for turning AI incident data into governance categories, but the trend separation needs more validation to be convincing.

The paper proposes a framework to classify AI incidents into governance categories by separating harm and exposure trends from public databases. That's the core idea, and it fills a gap where people usually just tally incidents without adjusting for biases. What is new is the three-part setup: a clear monitoring question, a tiered process for estimating the two trends separately (including LLM-assisted filtering), and an assignment to one of five categories such as Escalating or Receding. The case studies try to apply this to real data and show it can produce useful signals even with incomplete information. That practical angle is where it does well, giving a template others could follow or adapt.

The soft spots are in the estimation step. Public incident reports are skewed by media attention and reporting practices, and while LLM filtering helps sort them, the paper needs to show how errors there affect the final trend estimates and category assignments. Without ground-truth comparisons or tests for how biases propagate, the separation of harm and exposure might not be as reliable as claimed. The abstract notes data constraints, so the case studies should address that directly.

This is aimed at AI safety researchers and governance practitioners who work with incident monitoring. Anyone looking for a structured way to derive trends rather than raw counts would find it relevant. It deserves a serious referee because the framework is clearly laid out and the problem it tackles is important, though revisions would likely focus on strengthening the validation parts. I would recommend sending it to peer review.

Referee Report

2 major / 2 minor

Summary. The paper proposes a three-component methodological framework for AI incident monitoring to move beyond raw incident counts that conflate reporting propensity, exposure, and harm. The components are: (1) a structured monitoring question defining analysis scope; (2) a tiered estimation process deriving separate harm and exposure trends, including via LLM-assisted filtering of public incident databases; and (3) a classification scheme mapping those trends to one of five actionable governance categories (Escalating, Mitigating, Concentrating, Receding, or Unclassifiable). Case studies are presented to illustrate the framework's clarifying power, limitations, and proof-of-concept value for governance despite real-world data constraints.

Significance. If the tiered estimation can be shown to yield reliable trend separation, the framework would provide a pragmatic advance for AI governance by enabling evidence-calibrated categorization of incident trends rather than unadjusted counts. It draws on external public databases without internal circularity, offers a forward methodological contribution, and the case studies demonstrate potential utility in handling acknowledged biases. This could support more targeted policy responses if the mapping to categories proves robust.

major comments (2)

[Abstract and case studies] The central claim that the tiered estimation process (including LLM-assisted filtering) produces calibratable, separate harm and exposure trend estimates sufficient for reliable classification into the five governance categories is load-bearing. However, the manuscript provides no ground-truth validation, quantitative error bounds on LLM filtering, or sensitivity analysis showing how reporting biases and classification errors propagate into final category assignments (Escalating etc.), leaving the separation step untested despite the acknowledged data constraints.
[Tiered estimation process] The framework states that it 'calibrates confidence to available evidence' while separately deriving harm and exposure trends, but the description does not detail concrete calibration methods, bias-correction techniques beyond LLM filtering, or how incomplete exposure data is handled to avoid conflation. This directly affects whether the classification scheme can deliver actionable governance insight.

minor comments (2)

[Abstract] The abstract could specify the number and selection criteria for the case studies, as well as the exact public databases used, to strengthen the proof-of-concept claim.
[Classification scheme] Notation for the five categories (Escalating, Mitigating, etc.) is clear but would benefit from an explicit table or diagram showing the decision rules mapping trend estimates to each label.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive review and for recognizing the framework's potential to support more targeted AI governance. We address each major comment below with specific plans for revision where the manuscript can be strengthened, while being transparent about inherent limitations of public data.

Referee: [Abstract and case studies] The central claim that the tiered estimation process (including LLM-assisted filtering) produces calibratable, separate harm and exposure trend estimates sufficient for reliable classification into the five governance categories is load-bearing. However, the manuscript provides no ground-truth validation, quantitative error bounds on LLM filtering, or sensitivity analysis showing how reporting biases and classification errors propagate into final category assignments (Escalating etc.), leaving the separation step untested despite the acknowledged data constraints.

Authors: We agree that the absence of ground-truth validation and quantitative error bounds is a substantive limitation, as no independent verified dataset exists to benchmark harm versus exposure separation in public AI incident records. The case studies are presented as illustrative applications rather than statistical tests. In the revised manuscript we will add an expanded limitations section that explicitly discusses error propagation pathways and includes a sensitivity analysis varying LLM filtering thresholds and classification decision rules. This will quantify how category assignments shift under different assumptions, providing readers with practical bounds even without external ground truth.
revision: partial

Referee: [Tiered estimation process] The framework states that it 'calibrates confidence to available evidence' while separately deriving harm and exposure trends, but the description does not detail concrete calibration methods, bias-correction techniques beyond LLM filtering, or how incomplete exposure data is handled to avoid conflation. This directly affects whether the classification scheme can deliver actionable governance insight.

Authors: The tiered process uses structured questions and source-consistency checks to assign confidence levels and derive separate trends, with exposure proxies drawn from deployment statistics and media coverage volume. We acknowledge that the current text leaves the precise heuristics and bias-handling steps implicit. In revision we will insert a dedicated subsection that enumerates the calibration rules, the specific bias-correction steps applied to incomplete exposure data, and the decision criteria used to prevent conflation before mapping to governance categories. This will make the pathway to actionable insight fully transparent.
revision: yes

Circularity Check

0 steps flagged

No significant circularity in the proposed methodological framework

The paper introduces a three-component framework (structured monitoring question, tiered estimation process with LLM-assisted filtering of external public databases, and classification to governance categories) as a pragmatic tool for AI incident monitoring. This is a definitional and procedural proposal rather than a derivation chain involving equations, fitted parameters, or predictions. No steps reduce outputs to inputs by construction, no self-citations are load-bearing for central claims, and case studies function as illustrative examples without statistical forcing or self-referential loops. The approach relies on external data sources and acknowledged real-world constraints, remaining self-contained against benchmarks outside the paper itself.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The framework rests on domain assumptions about the usability of public databases and LLM filtering for trend separation, plus newly defined classification categories; no explicit free parameters or external invented physical entities are introduced.

axioms (2)

domain assumption: Public incident databases contain information sufficient to derive separate harm and exposure trends when appropriately filtered and calibrated.
Central to the tiered estimation process described in the abstract.
domain assumption: LLM-assisted filtering can be applied in a way that calibrates confidence to available evidence without introducing uncontrolled bias.
Explicitly included as part of the estimation component.

invented entities (1)

Five governance classification categories (Escalating, Mitigating, Concentrating, Receding, Unclassifiable) · no independent evidence
purpose: To translate estimated harm and exposure trends into actionable categories for AI governance decisions.
These categories are defined by the framework as the output mapping scheme.

pith-pipeline@v0.9.0 · 5475 in / 1377 out tokens · 75369 ms · 2026-05-08T13:54:20.602352+00:00 · methodology
