A pragmatic classification framework for AI incident monitoring
Pith reviewed 2026-05-08 13:54 UTC · model grok-4.3
The pith
A framework separates harm trends from exposure trends in AI incident data to produce governance categories like Escalating or Receding.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The framework comprises three components: a structured monitoring question that defines the scope of the analysis; a tiered estimation process that separately derives harm and exposure trends, including through LLM-assisted filtering of public incident databases; and a classification scheme that maps the resulting trend estimates onto actionable governance categories (Escalating, Mitigating, Concentrating, Receding or Unclassifiable).
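The first component, the structured monitoring question, can be pictured as a small immutable record that fixes the scope before any counting starts. The sketch below is a minimal illustration only. The field names (`system_class`, `harm_type`, `exposure_measure`, year bounds) are hypothetical assumptions, not taken from the paper, which this page does not describe at that level of detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitoringQuestion:
    """Hypothetical scope record for one monitoring analysis (field names are assumptions)."""

    system_class: str      # which AI systems are in scope
    harm_type: str         # which harms count as incidents
    exposure_measure: str  # proxy chosen for deployment ("exposure")
    start_year: int
    end_year: int

    def __post_init__(self) -> None:
        # Reject inverted time windows up front.
        if self.end_year < self.start_year:
            raise ValueError("end_year must not precede start_year")

    def in_scope(self, system_class: str, harm_type: str, year: int) -> bool:
        """True if an incident record falls inside the question's scope."""
        return (
            system_class == self.system_class
            and harm_type == self.harm_type
            and self.start_year <= year <= self.end_year
        )
```

Fixing the exposure proxy inside the question, rather than choosing it after seeing the data, is one way to keep later harm and exposure estimates from being tuned toward a preferred category.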
What carries the argument
The tiered estimation process that separately derives harm and exposure trends from public databases combined with the classification scheme that assigns the trends to governance categories.
Load-bearing premise
Public incident databases plus LLM-assisted filtering can produce reliable, calibratable estimates of separate harm and exposure trends despite reporting biases and data gaps.
What would settle it
Applying the framework to a specific AI domain yields harm and exposure trend estimates that cannot be calibrated against any external check or that produce unstable category assignments across slight changes in data filtering.
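One way to run this test is to re-derive the direction of harm per unit exposure at several LLM relevance-score cutoffs and check whether it holds. This sketch assumes that each filtered record carries a year and a relevance score in [0, 1], and that an exposure proxy exists for each year. The scores, the `min_abs_slope` dead band, and all function names are hypothetical.

```python
import math


def log_linear_slope(years, values):
    """Least-squares slope of log(values) against years (approx. growth rate per year)."""
    logs = [math.log(v) for v in values]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    return sxy / sxx


def harm_rate_direction(records, exposure, threshold, min_abs_slope=0.05):
    """Direction (+1, -1, or None) of incidents per unit exposure at one filter cutoff.

    records: list of (year, relevance_score) pairs from an LLM filter (hypothetical).
    exposure: dict mapping year -> exposure proxy for that year.
    """
    years = sorted(exposure)
    counts = {y: 0 for y in years}
    for year, score in records:
        if score >= threshold and year in counts:
            counts[year] += 1
    if any(c == 0 for c in counts.values()):
        return None  # log of zero is undefined; treat as insufficient evidence
    rates = [counts[y] / exposure[y] for y in years]
    slope = log_linear_slope(years, rates)
    if slope > min_abs_slope:
        return 1
    if slope < -min_abs_slope:
        return -1
    return None  # inside the dead band: no clear direction


def is_stable(records, exposure, thresholds):
    """Return (stable, directions) across the given filter cutoffs."""
    directions = {harm_rate_direction(records, exposure, t) for t in thresholds}
    return len(directions) == 1, directions
```

In a made-up dataset where ten low-scoring 2021 records flip the trend, cutoffs of 0.7 to 0.9 agree on a rising rate. Adding a 0.5 cutoff yields `{1, -1}`, which would count as an unstable assignment under this criterion.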
Original abstract
Incident monitoring can drive safety improvements in high-reliability industries and population-scale technologies, but remains underdeveloped in AI governance. Public databases catalog thousands of AI incidents, but simple incident counts conflate media reporting propensity, system deployment ("exposure"), and harm frequency per unit exposure. We propose a methodological framework that accounts for these factors and calibrates confidence to available evidence in analyzing how AI incidents change over time. The framework comprises three components: a structured monitoring question that defines the scope of the analysis; a tiered estimation process that separately derives harm and exposure trends, including through LLM-assisted filtering of public incident databases; and a classification scheme that maps the resulting trend estimates onto actionable governance categories (Escalating, Mitigating, Concentrating, Receding or Unclassifiable). Through case studies, we examine the framework's clarifying power and limitations, demonstrate governance insight despite real-world data constraints, and provide a proof of concept for AI incident monitoring as a practical governance tool.
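The abstract says that raw counts conflate three factors. A multiplicative model makes this concrete: observed incidents = reporting propensity × exposure × harm per unit exposure. This model is an illustrative assumption, not the paper's stated one. Under it, log growth in counts splits additively, and the harm term is the growth left over once the other two factors are accounted for. Reporting propensity is rarely observed directly, so in practice it would have to be assumed or estimated.

```python
import math


def decompose_growth(n0, n1, e0, e1, r0=1.0, r1=1.0):
    """Split log growth in observed incident counts into factor contributions.

    Assumes the multiplicative model N = r * E * h (hypothetical notation):
    r = reporting propensity, E = exposure, h = harm per unit exposure.
    Returns (dlog_n, dlog_r, dlog_e, dlog_h), where dlog_h is the residual.
    """
    dlog_n = math.log(n1 / n0)
    dlog_r = math.log(r1 / r0)
    dlog_e = math.log(e1 / e0)
    dlog_h = dlog_n - dlog_r - dlog_e
    return dlog_n, dlog_r, dlog_e, dlog_h
```

For example, `decompose_growth(100, 200, 1e6, 4e6)` describes counts doubling while exposure quadruples. The residual harm factor `math.exp(dlog_h)` is then about 0.5: harm per unit exposure halved even though incidents "rose".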
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a three-component methodological framework for AI incident monitoring to move beyond raw incident counts that conflate reporting propensity, exposure, and harm. The components are: (1) a structured monitoring question defining analysis scope; (2) a tiered estimation process deriving separate harm and exposure trends, including via LLM-assisted filtering of public incident databases; and (3) a classification scheme mapping those trends to one of five actionable governance categories (Escalating, Mitigating, Concentrating, Receding, or Unclassifiable). Case studies are presented to illustrate the framework's clarifying power, limitations, and proof-of-concept value for governance despite real-world data constraints.
Significance. If the tiered estimation can be shown to yield reliable trend separation, the framework would provide a pragmatic advance for AI governance by enabling evidence-calibrated categorization of incident trends rather than unadjusted counts. It draws on external public databases without internal circularity and offers a forward-looking methodological contribution, and its case studies demonstrate potential utility in handling acknowledged biases. This could support more targeted policy responses if the mapping to categories proves robust.
major comments (2)
- [Abstract and case studies] Abstract and case studies description: The central claim that the tiered estimation process (including LLM-assisted filtering) produces calibratable, separate harm and exposure trend estimates sufficient for reliable classification into the five governance categories is load-bearing. However, the manuscript provides no ground-truth validation, quantitative error bounds on LLM filtering, or sensitivity analysis showing how reporting biases and classification errors propagate into final category assignments (Escalating etc.), leaving the separation step untested despite the acknowledged data constraints.
- [Tiered estimation process] Tiered estimation process: The framework states that it 'calibrates confidence to available evidence' while separately deriving harm and exposure trends, but the description does not detail concrete calibration methods, bias-correction techniques beyond LLM filtering, or how incomplete exposure data is handled to avoid conflation. This directly affects whether the classification scheme can deliver actionable governance insight.
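The referee asks for bias correction beyond LLM filtering. For underreporting, one standard option is two-source capture-recapture. It is named here as a swapped-in technique and is not confirmed as the paper's method. The idea is to treat two incident databases as independent "captures" of the same underlying incidents; their overlap then estimates how many incidents neither database recorded. This sketch uses Chapman's bias-corrected form of the Lincoln-Petersen estimator, and the database sizes are hypothetical inputs.

```python
def chapman_total(n_a, n_b, overlap):
    """Chapman estimate of total incidents from two overlapping sources.

    n_a, n_b: incidents recorded in database A and database B (after deduplication).
    overlap: incidents matched in both databases.
    Assumes the sources capture incidents independently and the incident population
    is fixed over the period; correlated, media-driven reporting violates this and
    biases the estimate downward.
    """
    if not 0 <= overlap <= min(n_a, n_b):
        raise ValueError("overlap must lie between 0 and min(n_a, n_b)")
    return (n_a + 1) * (n_b + 1) / (overlap + 1) - 1


def ascertainment(n_a, n_b, overlap):
    """Share of the estimated total that appears in at least one database."""
    observed = n_a + n_b - overlap
    return observed / chapman_total(n_a, n_b, overlap)
```

Applying this per period gives an ascertainment-corrected incident series whose trend can be compared with the raw one. For example, `chapman_total(100, 80, 40)` is about 198.5, against 140 incidents seen in at least one source.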
minor comments (2)
- [Abstract] The abstract could specify the number and selection criteria for the case studies, as well as the exact public databases used, to strengthen the proof-of-concept claim.
- [Classification scheme] Notation for the five categories (Escalating, Mitigating, etc.) is clear but would benefit from an explicit table or diagram showing the decision rules mapping trend estimates to each label.
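The referee asks for explicit decision rules. The code below is one possible rule table, offered as a hypothetical reading of the category names rather than the paper's rule set. It rests on three assumptions: "harm" means harm per unit exposure, each trend has already been reduced to an interval for its slope, and a direction is assigned only when that interval excludes zero.

```python
from enum import Enum


class Category(Enum):
    ESCALATING = "Escalating"
    MITIGATING = "Mitigating"
    CONCENTRATING = "Concentrating"
    RECEDING = "Receding"
    UNCLASSIFIABLE = "Unclassifiable"


# Hypothetical mapping keyed by (harm-per-exposure direction, exposure direction).
RULES = {
    (1, 1): Category.ESCALATING,      # wider deployment and more harm per unit of use
    (-1, 1): Category.MITIGATING,     # deployment grows while harm per unit of use falls
    (1, -1): Category.CONCENTRATING,  # harm per unit of use rises on a shrinking base
    (-1, -1): Category.RECEDING,      # both fall
}


def direction(lower, upper):
    """+1 or -1 if a slope interval excludes zero; None if the evidence is ambiguous."""
    if lower > 0:
        return 1
    if upper < 0:
        return -1
    return None


def classify(harm_interval, exposure_interval):
    """Map two (lower, upper) slope intervals to a governance category."""
    key = (direction(*harm_interval), direction(*exposure_interval))
    return RULES.get(key, Category.UNCLASSIFIABLE)
```

Any pair with an ambiguous direction falls through to Unclassifiable. This is one way to let the category itself carry the confidence calibration described in the abstract.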
Simulated Author's Rebuttal
We thank the referee for their constructive review and for recognizing the framework's potential to support more targeted AI governance. We address each major comment below with specific plans for revision where the manuscript can be strengthened, while being transparent about inherent limitations of public data.
Point-by-point responses
- Referee: [Abstract and case studies] Abstract and case studies description: The central claim that the tiered estimation process (including LLM-assisted filtering) produces calibratable, separate harm and exposure trend estimates sufficient for reliable classification into the five governance categories is load-bearing. However, the manuscript provides no ground-truth validation, quantitative error bounds on LLM filtering, or sensitivity analysis showing how reporting biases and classification errors propagate into final category assignments (Escalating etc.), leaving the separation step untested despite the acknowledged data constraints.
Authors: We agree that the absence of ground-truth validation and quantitative error bounds is a substantive limitation, as no independent verified dataset exists to benchmark harm versus exposure separation in public AI incident records. The case studies are presented as illustrative applications rather than statistical tests. In the revised manuscript we will add an expanded limitations section that explicitly discusses error propagation pathways and includes a sensitivity analysis varying LLM filtering thresholds and classification decision rules. This will quantify how category assignments shift under different assumptions, providing readers with practical bounds even without external ground truth. revision: partial
- Referee: [Tiered estimation process] Tiered estimation process: The framework states that it 'calibrates confidence to available evidence' while separately deriving harm and exposure trends, but the description does not detail concrete calibration methods, bias-correction techniques beyond LLM filtering, or how incomplete exposure data is handled to avoid conflation. This directly affects whether the classification scheme can deliver actionable governance insight.
Authors: The tiered process uses structured questions and source-consistency checks to assign confidence levels and derive separate trends, with exposure proxies drawn from deployment statistics and media coverage volume. We acknowledge that the current text leaves the precise heuristics and bias-handling steps implicit. In revision we will insert a dedicated subsection that enumerates the calibration rules, the specific bias-correction steps applied to incomplete exposure data, and the decision criteria used to prevent conflation before mapping to governance categories. This will make the pathway to actionable insight fully transparent. revision: yes
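The authors note that no verified ground-truth dataset exists. A partial substitute is to hand-audit a random sample of records the LLM filter kept and a random sample it dropped, then put confidence intervals on the filter's precision and false-omission rate. This is offered as an assumption, not something the paper reports. The sketch uses the Wilson score interval, and the audit counts are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def filter_error_bounds(kept_audited, kept_relevant, dropped_audited, dropped_relevant):
    """Interval bounds on precision (among kept records) and on the
    false-omission rate (among dropped records), from hand-audited random samples."""
    precision = wilson_interval(kept_relevant, kept_audited)
    false_omission = wilson_interval(dropped_relevant, dropped_audited)
    return precision, false_omission
```

For instance, an audit that finds 45 relevant records among 50 kept records gives a precision interval of roughly 0.79 to 0.96. Bounds like these could feed the threshold sensitivity analysis that the authors plan to add.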
Circularity Check
No significant circularity in the proposed methodological framework
The paper introduces a three-component framework (structured monitoring question, tiered estimation process with LLM-assisted filtering of external public databases, and classification to governance categories) as a pragmatic tool for AI incident monitoring. This is a definitional and procedural proposal rather than a derivation chain involving equations, fitted parameters, or predictions. No steps reduce outputs to inputs by construction, no self-citations are load-bearing for central claims, and case studies function as illustrative examples without statistical forcing or self-referential loops. The approach draws on external data sources and openly acknowledges real-world constraints, so its claims can be checked against benchmarks outside the paper itself.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Public incident databases contain information sufficient to derive separate harm and exposure trends when appropriately filtered and calibrated.
- Domain assumption: LLM-assisted filtering can be applied in a way that calibrates confidence to available evidence without introducing uncontrolled bias.
invented entities (1)
- Five governance classification categories (Escalating, Mitigating, Concentrating, Receding, Unclassifiable): no independent evidence