Designing escalation criteria for international AI incident response: criteria, triggers, and thresholds

Caio Machado; Francesca Gomez; Josephine Schwab; Lydia Preston; Matthew Ball; Michael Harre

arXiv: 2604.23183 · v2 · pith:KPJDNIQG · submitted 2026-04-25 · cs.CY · cs.AI


Francesca Gomez, Matthew Ball, Michael Harre, Lydia Preston, Josephine Schwab, Caio Machado

Pith reviewed 2026-05-21 00:34 UTC · model grok-4.3

classification: cs.CY · cs.AI

keywords: AI incident response · escalation criteria · international coordination · AI regulation · EU AI Act · incident detection · under-detection patterns · regulatory frameworks


The pith

This paper proposes eight criteria to decide when an AI incident should escalate from national handling to international coordination.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses a gap in operational guidance for escalating AI incidents as new reporting rules emerge in multiple jurisdictions. It derives eight criteria from reviews of SB 53, the EU AI Act, the GPAI Code of Practice, and incident frameworks from other industries, then arranges them into a sequential flowchart with gated checks and thresholds. Testing the criteria against ten documented incidents and structured variants surfaces three recurring design patterns that may produce under-detection when model developers control escalation. A reader would care because, without shared criteria, responses may remain inconsistent or too slow to contain harms that cross borders. The framework is presented as a flexible reference that jurisdictions can adopt without changing their own legal approaches.

Core claim

The paper proposes an escalation framework of eight criteria for determining when a detected AI incident warrants international coordination. These criteria are organized into a flowchart with sequential decision gates and threshold checks, and each is mapped against existing regulatory texts to show where design choices aid or hinder detection. When applied to ten documented AI incidents and structured variants, the framework identifies three design patterns that may lead to systematic under-detection in regimes assigning escalation responsibility to model developers: requiring confirmed harm before escalation, assessing incidents in isolation rather than as accumulating systemic harms, and aligning thresholds with legal instruments rather than quantitatively testable terms.

What carries the argument

The eight-criteria escalation framework structured as a sequential flowchart with gated decision points and threshold checks, which translates regulatory requirements into practical tests for international escalation.
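To make the gated structure concrete, here is a minimal Python sketch of a sequential gate runner. It is not the paper's flowchart: the gate questions are paraphrased from criteria the paper discusses (a pattern check as in Criterion 4, a cross-border check as in Criteria 6 and 7), but their order, the `Incident` fields, and the `GATES` list are hypothetical. Each gate either decides the outcome or passes the incident on, and an incident that clears every gate defaults to national handling.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Set, Tuple


class Outcome(Enum):
    ESCALATE = "escalate to international coordination"
    NATIONAL = "handle nationally"
    CONTINUE = "proceed to the next gate"


@dataclass
class Incident:
    # Hypothetical fields for illustration; the paper defines its own inputs.
    ai_causal: bool
    excluded_domain: bool
    immediate_condition: bool
    part_of_pattern: bool
    jurisdictions_affected: Set[str] = field(default_factory=set)


Gate = Callable[[Incident], Outcome]


def run_gates(incident: Incident, gates: List[Tuple[str, Gate]]) -> Tuple[str, Outcome]:
    """Evaluate gates in order; the first non-CONTINUE outcome decides."""
    for question, gate in gates:
        outcome = gate(incident)
        if outcome is not Outcome.CONTINUE:
            return question, outcome
    # No gate fired: fall back to national handling.
    return "default", Outcome.NATIONAL


# Hypothetical gate sequence of (question, predicate) pairs.
GATES: List[Tuple[str, Gate]] = [
    ("Was AI a causal factor?",
     lambda i: Outcome.CONTINUE if i.ai_causal else Outcome.NATIONAL),
    ("Is the incident in an excluded domain?",
     lambda i: Outcome.NATIONAL if i.excluded_domain else Outcome.CONTINUE),
    ("Has an immediate escalation condition been met?",
     lambda i: Outcome.ESCALATE if i.immediate_condition else Outcome.CONTINUE),
    ("Is the incident part of a broader pattern?",
     lambda i: Outcome.ESCALATE if i.part_of_pattern else Outcome.CONTINUE),
    ("Has the incident propagated across borders?",
     lambda i: Outcome.ESCALATE if len(i.jurisdictions_affected) > 1 else Outcome.CONTINUE),
]

if __name__ == "__main__":
    incident = Incident(ai_causal=True, excluded_domain=False, immediate_condition=False,
                        part_of_pattern=False, jurisdictions_affected={"DE", "FR"})
    question, outcome = run_gates(incident, GATES)
    print(question, "->", outcome.value)
```

Keeping the gates as an ordered list of (question, predicate) pairs makes the sequence auditable and lets a jurisdiction reorder or swap gates without touching the runner.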

If this is right

Where escalation requires confirmed harm, incidents such as model weight exfiltration are detected only after severe, irreversible harm has already propagated.
Systemic harms that build from many small events risk being missed when each incident is judged alone.
Thresholds written in legal language rather than measurable terms become hard to apply under time pressure.
Escalation decisions depend on the underlying definitions of harm and the data available to the responsible actor, creating further sources of under-detection.
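The accumulation pattern is easy to reproduce in a few lines. The toy sketch below compares individual assessment (each incident against a fixed severity threshold) with aggregate assessment (summing severity across related incidents inside a trailing time window). The cluster keys, severity scores, threshold, and window length are all hypothetical; in practice the grouping presupposes an agreed taxonomy and shared data, which is the dependency on definitions and data access noted above.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, List, Set, Tuple

# (date observed, hypothetical cluster key, hypothetical severity score)
IncidentRecord = Tuple[date, str, int]


def individual_flags(incidents: List[IncidentRecord], threshold: int) -> List[bool]:
    """Individual assessment: each incident is compared to the threshold on its own."""
    return [severity >= threshold for _, _, severity in incidents]


def aggregate_flags(incidents: List[IncidentRecord], threshold: int, window_days: int) -> Set[str]:
    """Aggregate assessment: flag a cluster once the summed severity of its incidents
    within any trailing window of `window_days` reaches the threshold."""
    by_cluster: Dict[str, List[Tuple[date, int]]] = defaultdict(list)
    for day, cluster, severity in incidents:
        by_cluster[cluster].append((day, severity))
    window = timedelta(days=window_days)
    flagged: Set[str] = set()
    for cluster, items in by_cluster.items():
        for end_day, _ in sorted(items):
            # Sum incidents observed between (end_day - window) and end_day inclusive.
            total = sum(s for d, s in items if timedelta(0) <= end_day - d <= window)
            if total >= threshold:
                flagged.add(cluster)
                break
    return flagged


if __name__ == "__main__":
    incidents = [
        (date(2026, 1, 1), "image-abuse", 2),
        (date(2026, 1, 2), "image-abuse", 2),
        (date(2026, 1, 3), "image-abuse", 2),
        (date(2026, 1, 2), "other", 2),
    ]
    print(individual_flags(incidents, threshold=5))  # no single incident reaches 5
    print(aggregate_flags(incidents, threshold=5, window_days=7))
```

Shrinking `window_days` or splitting a cluster across keys makes the same harms invisible again, which is why the choice of grouping and window matters as much as the threshold.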

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Regulators could adapt the same gated-check structure to other emerging risks such as advanced biotechnology.
Collecting standardized incident data across borders would allow empirical refinement of the proposed thresholds.
The interdependency between definitions, data access, and thresholds suggests that fixing escalation rules alone will not solve under-detection.
International coordination bodies might treat the framework as an initial template for harmonized reporting standards.

Load-bearing premise

That the ten documented AI incidents and their structured variants are representative enough to reveal systematic under-detection patterns across regulatory regimes and that the criteria can be turned into workable thresholds without further empirical validation or new data.

What would settle it

Applying the eight criteria and flowchart to a fresh collection of AI incidents from multiple jurisdictions and observing whether the same three under-detection patterns appear or whether incidents are handled consistently at the right level without the framework.
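One way to run that test is to tag each fresh incident with the pattern it exercises, record an independent judgment of whether it should have escalated, and measure how often the framework misses. The helper below is a hypothetical sketch of that bookkeeping; the tags, labels, and input format are assumptions, and the hard part (producing trustworthy reference labels) lies outside the code.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# (pattern tags, should_escalate per an independent reviewer, did_escalate per the framework)
Case = Tuple[Set[str], bool, bool]


def under_detection_by_tag(cases: Iterable[Case]) -> Dict[str, float]:
    """Share of should-escalate cases the framework missed, per pattern tag."""
    missed: Counter = Counter()
    total: Counter = Counter()
    for tags, should_escalate, did_escalate in cases:
        if not should_escalate:
            continue  # only misses among cases that warranted escalation count
        for tag in tags:
            total[tag] += 1
            if not did_escalate:
                missed[tag] += 1
    return {tag: missed[tag] / total[tag] for tag in total}
```

Miss rates that stay high for the same tags across jurisdictions would support the three patterns; rates near zero would cut against them.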

Figures

Figures reproduced from arXiv: 2604.23183 by Caio Machado, Francesca Gomez, Josephine Schwab, Lydia Preston, Matthew Ball, Michael Harre.

**Figure 1.** Overall incident escalation flowchart.

**Figure 2.** Criterion 4: Is the incident part of a broader pattern? (correlated / related incidents). Purpose: Criterion 4 assesses whether an incident is part of a broader pattern of correlated or related incidents. Systemic risk often emerges from the correlation between incidents rather than their individual severity: a series of individually sub-threshold incid…

**Figure 3.** Criteria 6 and 7: Is international coordination required to contain the incident or respond to its cross-border propagation or irreversible harm?

**Figure 4.** Visual representation of the paper's key findings. Finding 1 reflects overarching dependencies for the escalation framework. The other findings are grouped by (i) thresholds and triggers for escalation, (ii) definitions of incidents, and (iii) access to data and monitoring.
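Figure 3 pairs two questions. A minimal sketch, assuming Criterion 6 covers containment and cross-border propagation and Criterion 7 covers irreversible harm with cross-border consequences (a reading of the caption, not the paper's actual tests); the `Assessment` fields and jurisdiction codes are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Assessment:
    # Hypothetical inputs for illustration; jurisdiction codes are arbitrary labels.
    home: str                                           # jurisdiction doing the assessment
    affected: Set[str] = field(default_factory=set)     # where harm or exposure is observed
    deployed_in: Set[str] = field(default_factory=set)  # where the implicated system runs
    irreversible: bool = False
    cross_border_consequences: bool = False


def criterion_6(a: Assessment) -> bool:
    """Containment/propagation: does harm or deployment reach beyond the home jurisdiction?"""
    return bool((a.affected | a.deployed_in) - {a.home})


def criterion_7(a: Assessment) -> bool:
    """Irreversible harm whose consequences extend beyond the home jurisdiction."""
    return a.irreversible and a.cross_border_consequences


def needs_international_coordination(a: Assessment) -> bool:
    return criterion_6(a) or criterion_7(a)
```

The hard inputs are the ones the code takes for granted: knowing where a system is deployed and judging whether harm is irreversible both depend on data the assessing actor may not have.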

Original abstract

AI incident reporting requirements are emerging in regulation and policy, yet no operational criteria exist for determining when a detected AI incident warrants escalation beyond national handling to international coordination. This paper proposes an escalation framework to address this gap, intended as a common reference point across jurisdictions that enables aligned escalation while preserving flexibility in how actors respond within their own legal and policy contexts. We review SB 53, the EU AI Act, the GPAI Code of Practice, and incident frameworks from other industries to derive eight criteria for assessing whether an incident warrants escalation, translated into a sequential flowchart with gated decision points and threshold checks. For each criterion, we map how it interplays with these regulatory frameworks, identifying where their design choices support or undermine effective detection. We test the framework against ten documented AI incidents and structured variants to identify where criteria under-detect or misclassify incidents in practice. We find three design patterns that may lead to systematic under-detection in regimes where model developers are responsible for escalation: a. where escalation requires confirmed harm, events such as model weight exfiltration risk detection only after severe, irreversible harm has propagated; b. where incidents are assessed individually, systemic harms emerging from accumulation risk being under-detected; and c. where thresholds align with legal instruments rather than quantitatively testable terms, criteria risk being impractical to apply under time pressure. We also find that escalation rules are only one component of a broader framework: the underlying definitions against which thresholds are set, and the data available to the responsible actor, create interdependencies that can themselves drive under-detection.
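The first pattern in the abstract can be illustrated with a toy pair of rules. `confirmed_harm_rule` escalates only once harm is established, as in the regimes the paper critiques; `hazard_aware_rule` is a hypothetical alternative that also escalates a credible hazard whose harm could be irreversible. The `Event` fields are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical fields chosen for illustration.
    description: str
    harm_confirmed: bool            # has realized harm been established?
    credible_hazard: bool           # is there a credible pathway to severe harm?
    potentially_irreversible: bool  # could that harm be impossible to reverse?


def confirmed_harm_rule(event: Event) -> bool:
    """Escalate only once harm is confirmed (the design the paper warns about)."""
    return event.harm_confirmed


def hazard_aware_rule(event: Event) -> bool:
    """Hypothetical alternative: also escalate credible, potentially irreversible hazards."""
    return event.harm_confirmed or (event.credible_hazard and event.potentially_irreversible)


if __name__ == "__main__":
    # Weights have left the developer's control, but no downstream harm is confirmed yet.
    exfiltration = Event("model weight exfiltration", harm_confirmed=False,
                         credible_hazard=True, potentially_irreversible=True)
    print("confirmed-harm rule escalates:", confirmed_harm_rule(exfiltration))
    print("hazard-aware rule escalates:", hazard_aware_rule(exfiltration))
```

Under the confirmed-harm rule the exfiltration event stays below the line until downstream harm is observed, when it may no longer be containable; the price of the hazard-aware rule is more escalations on low-confidence signals.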

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)


This paper gives a usable set of eight criteria and a gated flowchart for escalating AI incidents to the international level, along with three patterns that could cause under-detection in developer-led regimes. They pull the criteria from SB 53, the EU AI Act, the GPAI Code of Practice, and some cross-industry examples, then lay out how each one maps back to those sources and where the design choices help or hurt detection. Testing on ten documented incidents plus structured variants turns up the three patterns: needing confirmed harm first, handling incidents one at a time instead of looking for buildup, and tying thresholds to legal language rather than measurable terms. They also note that the rules depend on clear definitions and data access, which is a fair point.

That operational synthesis and the explicit callouts on under-detection are the new pieces, even if the building blocks come from existing regulations. The flowchart is straightforward enough that someone drafting response procedures could pick it up and adapt it without starting from scratch.

The main soft spot is the test set. Ten incidents without stated selection rules, or details on how the variants were made, make it hard to treat the under-detection patterns as systematic across regimes. The thresholds also stay qualitative, so turning the criteria into something that works under time pressure would need more concrete numbers and checks. These gaps are real but not surprising in a policy proposal that is trying to fill an operational hole rather than run a full empirical study.

This is for people working on AI governance, incident reporting rules, or cross-border coordination. A regulator or standards body looking for a starting reference would get something concrete to work with. I would send it to peer review. The core idea addresses a coordination gap that matters, and the engagement with actual regulations and cases is honest enough that referees could help tighten the testing side.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an escalation framework with eight criteria for determining when a detected AI incident warrants escalation from national to international coordination. Derived from a review of instruments including SB 53, the EU AI Act, and the GPAI Code of Practice, the criteria are organized into a sequential flowchart with gated decision points and threshold checks. The framework is tested against ten documented AI incidents and structured variants to identify three design patterns that may produce systematic under-detection when model developers are responsible for escalation: confirmed-harm requirements, individual assessments, and legal-instrument thresholds. The paper also emphasizes interdependencies with underlying incident definitions and available data.

Significance. If the central claims hold, the work offers a constructive synthesis that could serve as a reference point for aligning international AI incident response while preserving jurisdictional flexibility. The mapping of criteria to existing regulatory frameworks and the identification of specific design patterns that risk under-detection provide actionable insights for policymakers. The attention to how definitions and data availability interact with escalation rules strengthens the practical relevance of the contribution.

major comments (2)

§5 (Testing the framework against incidents): The manuscript does not specify selection criteria for the ten documented AI incidents or describe how the structured variants were constructed (e.g., which parameters were varied and on what basis). Because the claim that three design patterns produce systematic under-detection across regimes rests on this test set being representative, the absence of this methodology leaves the generalizability of the findings open to post-hoc selection concerns.
§3 (Derivation of the eight criteria): Exact operational definitions, triggers, and threshold quantifications for the criteria are not fully detailed, which directly affects the evaluation of their interplay with frameworks such as the EU AI Act and the practicality of the gated flowchart under time pressure. This gap is load-bearing for the paper's assertion that the criteria can be translated into usable checks.

minor comments (2)

The abstract states that variants were used to identify under-detection but gives no high-level indication of how they were constructed; a single sentence clarifying their role would improve accessibility without materially lengthening the abstract.
Figure 1 (flowchart) would benefit from explicit labels showing which criterion corresponds to each gated decision point, to help readers trace the three identified patterns.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the methodological transparency and operational precision of our escalation framework. We address each major comment below, indicating planned revisions to strengthen the manuscript while preserving its core contributions on design patterns and interdependencies.


Referee: §5 (Testing the framework against incidents): The manuscript does not specify selection criteria for the ten documented AI incidents or describe how the structured variants were constructed (e.g., which parameters were varied and on what basis). Because the claim that three design patterns produce systematic under-detection across regimes rests on this test set being representative, the absence of this methodology leaves the generalizability of the findings open to post-hoc selection concerns.

Authors: We agree that explicit documentation of incident selection and variant construction is necessary to support claims of systematic under-detection. The ten incidents were drawn from publicly reported cases spanning 2024–2026 to illustrate diversity in failure modes (e.g., AI-assisted cyber-espionage, cumulative nonconsensual deepfake generation, and training-data poisoning), while structured variants were generated by varying parameters such as harm confirmation status, assessment granularity (individual vs. aggregate), and threshold alignment with legal instruments. To address concerns about representativeness and post-hoc selection, we will add a dedicated subsection in §5 that lists the incidents with sources, states the inclusion criteria (coverage of regulatory domains, incident scale, and data availability), and details the parameter variations used for each structured variant, accompanied by a summary table. revision: yes
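The variant construction described in this simulated response amounts to a full factorial design over a few binary axes. A minimal sketch, assuming each of the three parameters named here takes two values; the axis names and values are hypothetical, not the paper's actual variant specification.

```python
from itertools import product
from typing import Dict, List

# Hypothetical variant axes mirroring the three parameters named in the response.
AXES: Dict[str, List[str]] = {
    "harm_status": ["confirmed", "unconfirmed"],
    "assessment": ["individual", "aggregate"],
    "threshold_basis": ["legal_text", "quantitative"],
}


def structured_variants(incident_id: str) -> List[Dict[str, str]]:
    """Return one variant per combination of axis values (full factorial design)."""
    names = list(AXES)
    return [
        {"incident": incident_id, **dict(zip(names, values))}
        for values in product(*(AXES[name] for name in names))
    ]


if __name__ == "__main__":
    for variant in structured_variants("incident-01"):
        print(variant)
```

A full grid makes explicit which combinations were tested, which is the transparency the referee asks for; because the grid grows multiplicatively with each added axis, a larger study might sample it instead.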
Referee: §3 (Derivation of the eight criteria): Exact operational definitions, triggers, and threshold quantifications for the criteria are not fully detailed, which directly affects the evaluation of their interplay with frameworks such as the EU AI Act and the practicality of the gated flowchart under time pressure. This gap is load-bearing for the paper's assertion that the criteria can be translated into usable checks.

Authors: The manuscript derives the eight criteria from the reviewed instruments (SB 53, EU AI Act, GPAI Code) and maps their interplay, but we acknowledge that more granular operational definitions, concrete triggers, and example quantifications would improve evaluability and practical applicability. We will expand §3 with a table that provides, for each criterion, an operational definition, sample triggers drawn from the source regulations, and illustrative threshold quantifications (e.g., harm severity scales or temporal windows). This addition will also include a brief discussion of how the gated flowchart accommodates time pressure, directly addressing the referee’s concern about usability. revision: yes
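For contrast with thresholds written in legal language (the third pattern), a graduated scale whose cut-points are counts can be checked mechanically under time pressure. The sketch below is hypothetical throughout: the levels, the cut-points, and the escalation level are illustrative placeholders, not values from the paper.

```python
from typing import List, Tuple

# Hypothetical graduated severity scale; every cut-point here is illustrative.
# Each row: (level, minimum people affected, minimum jurisdictions affected),
# ordered from most to least severe.
SEVERITY_SCALE: List[Tuple[int, int, int]] = [
    (4, 10_000, 2),
    (3, 1_000, 1),
    (2, 100, 1),
    (1, 1, 1),
]
ESCALATION_LEVEL = 4  # hypothetical level that triggers international escalation


def severity_level(people_affected: int, jurisdictions: int) -> int:
    """Return the highest level whose cut-points are all met (0 if none)."""
    for level, min_people, min_jurisdictions in SEVERITY_SCALE:
        if people_affected >= min_people and jurisdictions >= min_jurisdictions:
            return level
    return 0


def escalates(people_affected: int, jurisdictions: int) -> bool:
    return severity_level(people_affected, jurisdictions) >= ESCALATION_LEVEL
```

The point is less the particular numbers than that every comparison runs on observable quantities, whereas a legally worded threshold needs interpretation before it can be applied.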

Circularity Check

0 steps flagged

No significant circularity; derivation synthesizes external regulations and incidents


The paper reviews independent external sources including SB 53, the EU AI Act, the GPAI Code of Practice, and incident frameworks from other industries to derive its eight criteria, then applies the resulting framework to ten documented AI incidents and structured variants. This constitutes a synthesis and testing process against outside materials rather than any self-definitional loop, fitted parameter renamed as prediction, or load-bearing self-citation chain. No equations, ansatzes, or uniqueness theorems are invoked that reduce the central claims to the paper's own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on the assumption that existing regulatory texts and a small set of documented incidents provide an adequate empirical base for deriving generalizable criteria; no free parameters or invented entities are introduced.

axioms (2)

Domain assumption: The reviewed regulations (SB 53, EU AI Act, GPAI Code of Practice) and incident frameworks from other industries supply sufficient material to derive eight operational escalation criteria.
Invoked in the derivation step described in the abstract.
Domain assumption: The ten documented AI incidents and structured variants are representative enough to identify systematic under-detection patterns.
Used to validate the framework and surface the three design patterns.





Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages

[1]

International Atomic Energy Agency

URL https://internationalaisafetyreport.org/publication/2026-report-extended-summa ry-policymakers. International Atomic Energy Agency. Convention on Early Notification of a Nuclear Accident, September 1986. URL https://www.iaea.org/topics/nuclear-safety-conventions/convention-early-notification-n uclear-accident. Place: Vienna Publisher: International At...

work page doi:10.1111/jofi.12498 2026
[2]

suicide coach

URLhttps://thefuturesociety.org/aicrisisexplainer/. OECD. Stocktaking for the development of an AI incident definition. OECD Artificial Intelligence Papers 4, OECD Publishing, Paris, 2023. OECD. Defining AI Incidents and Related Terms. OECD Artificial Intelligence Papers 16, Organisation for Economic Co-operation and Development, 2024a. OECD. Defining AI ...

work page doi:10.1103/physrevlett.86.3200 2023
[3]

Henri Theil

Press release. Henri Theil. The Development of International Inequality 1960–1985.Journal of Econometrics, 42(1):145–155, 1989. UK Health and Safety Executive. Introduction to the seveso iii directive, 2025. URL https://www.hse.gov.uk/s eveso/introduction.htm. Updated 5 February 2025; accessed 2026-04-23. United Nations Office for Disaster Risk Reduction....

work page doi:10.1073/pnas.082090499 1960
[4]

Experiment,

arXiv:2406.07358 [cs]. Marty J. Wolf, Keith W. Miller, and Frances S. Grodzinsky. Why We Should Have Seen That Coming: Comments on Microsoft’s Tay “Experiment,” and Wider Implications.ACM SIGCAS Computers and Society, 47(3):54–64, 2017. doi: 10.1145/3144592.3144598. World Health Organization. Annex 2 of the International Health Regulations (2005), 2005. U...

work page doi:10.1145/3144592.3144598 2017
[5]

‘serious incidents resulting from the use of their AI systems, meaning incident or malfunctioning leading to

Was AI a causal factor? How is the causal role of AI in the incident established? EU AI Act SB 53 AI incident databases (OECD AIM; AIID, MIT) Examples from other domains Our proposal Provider must report where the AI system ‘directly or indirectly leads to’ any listed serious harm. ‘serious incidents resulting from the use of their AI systems, meaning inc...

work page 2024
[6]

Trade-off with escalation decision based on confidence level: escalating with low confidence may risk unnecessary disruption/ alarm if no actual AI causality, and may reduce confidence in the escalation process, but escalation delays could risk avoidable harm propagation

work page
[7]

For multi-agent environments, it may be hard to establish which AI systems are causally responsible for an incident, particularly if there is a lack of information about deployed AI systems/ agents and a lack of data on their shared interactions

work page
[8]

Is the incident in an excluded domain or assessment context? Some domains, such as government use or military, may be deemed out of scope. EU AI Act SB 53 AI incident databases (OECD AIM; AIID, MIT) Examples from other domains Our proposal EU AI Act: excludes AI systems placed on market or put into service for military, defense, or national security purpo...

work page 2022
[9]

Excluded domains such as military or national security may preclude preventative action or direct containment within this framework. While an incident originating in an excluded domain could in principle warrant international coordination for warning, irreversible harm, or information gaps, these functions would in practice be managed through existing nat...

work page
[10]

Dual-use AI systems may fall in and out of scope depending on deployment context (as in EU AI Act), creating classification ambiguity at the point of incident assessment. 67

work page
[11]

(b) Serious and irreversible disruption of the management or operation of critical infrastructure

Has an immediate escalation condition been met? EU AI Act SB 53 AI incident databases (OECD AIM; AIID, MIT) Examples from other domains Our proposal Only part (c) under the ‘Serious Incident’ definition relates to a condition for escalation which does not relate to an explicit measurable harm: Article 3(49) — ‘Serious Incident’ definition (a) Death of a p...

work page
[12]

As AI capabilities and deployment contexts evolve, the set of conditions warranting immediate escalation will need to be reviewed and updated

work page
[13]

The EU AI Act’s commitment to annual review of Article 5 prohibited practices (Article 112) provides a precedent for building this into the framework’s governance. 70

work page
[14]

AI incident variants

Is the incident part of a broader pattern? (correlated / related incidents) EU AI Act SB 53 AI incident databases (OECD AIM; AIID, MIT) Examples from other domains Our proposal Article 3(65) — Systemic Risk definition Systemic risk must be capable of being ‘propagated at scale across the value chain’ implying that a pattern of related incidents across the...

work page 2024
[15]

Identifying patterns relating to a serious incident being assessed requires tracking and analytics of incidents to be done and for clustering patterns to have been defined

work page
[16]

The level at which monitoring is done has implications for the incident patterns which can be identified within and across AI model developers

work page
[17]

Pattern identification at the capability and contextual root cause levels requires cross-provider incident visibility that is unlikely to be achievable by individual providers acting alone

work page
[18]

The use of different taxonomies and classification approaches across providers and incident databases increases the difficulty of building a cross-provider picture of systemic risk and incident patterns

work page
[19]

Clustering necessitates tracking and analytics of incidents: the level at which monitoring is performed has direct implications for which incident patterns can be identified. 73 5a. Has harm occurred in a relevant category? EU AI Act SB 53 AI incident databases (OECD AIM; AIID, MIT) Examples from other domains Our proposal Article 3(49) — ‘Serious Inciden...

work page 2024
[20]

The GPAI Code of Practice Appendix 1.1 risk types and the Appendix 1.4 specified systemic risks operate at different levels of abstraction to the MIT harm taxonomy categories used at this criterion

work page
[21]

Some categories may require investigation or contextual evidence that is unavailable at the point of initial assessment

The MIT harm taxonomy was designed for retrospective incident classification, not real-time triage. Some categories may require investigation or contextual evidence that is unavailable at the point of initial assessment. 75 5b. Has harm crossed a relevant severity threshold? EU AI Act SB 53 AI incident databases (OECD AIM; AIID, MIT) Examples from other d...

work page 2024
[22]

Many harm severities in the EU AI Act are defined using legal terminology, which is harder to operationalize as testable thresholds or trigger conditions

work page
[23]

The graduated severity scale is intended to address this gap, but the boundary between Level 3 and Level 4 may require further specification

A gap exists between SB 53’s quantitative thresholds and the lower boundary of harm that may nonetheless warrant escalation. The graduated severity scale is intended to address this gap, but the boundary between Level 3 and Level 4 may require further specification

work page
[24]

It is not operationalized as a measurable assessment criterion for individual incidents for escalation

Is international coordination required to contain the incident or respond to its cross-border propagation? EU AI Act SB 53 AI incident databases (OECD AIM; AIID, MIT) Examples from other domains Our proposal Propagation is treated as a defining characteristic of systemic risk: propagation capacity is what makes a risk systemic. It is not operationalized a...

work page 2024
[25]

Existing AI governance frameworks do not account for incident-level propagation assessment

work page
[26]

Effective incident-level propagation assessment depends on the ability to observe and correlate incidents across providers, deployment contexts, and jurisdictions in a timely manner

work page
[27]

For capability and emergent propagation, quantitative estimation of exposure scope is substantially harder

For supply-chain propagation, exposure scope can often be estimated by tracing the technical dependency tree. For capability and emergent propagation, quantitative estimation of exposure scope is substantially harder. 79

work page
[28]

Article 3(49)(b) — Serious Incident definition Irreversibility is explicitly named as a qualifying criterion for critical infrastructure disruption

Does irreversible harm require international coordination to assess or respond to its cross-border consequences? EU AI Act SB 53 AI incident databases (OECD AIM; AIID, MIT) Examples from other domains Our proposal Irreversibility is explicitly named as a qualifying criterion for one harm category and is a defining characteristic of loss of control risk un...

work page 2024
[29]

Where full reversal is technically possible but at infeasible cost or timescale, this is treated as effectively irreversible

Any harm impacts which cannot be reversed are defined as D. Where full reversal is technically possible but at infeasible cost or timescale, this is treated as effectively irreversible. If D = 0→No escalation required. If D > 0→proceed to Step 2. Step 2. Does the irreversible harm have consequences extending beyond the affected jurisdiction? If yes→Escala...

work page
[30]

Estimating D at the point of triage is inherently uncertain

work page
[31]

The distinction between absolute and effective irreversibility introduces a judgment call

work page
[32]

The cross-border consequences test requires the assessing jurisdiction to evaluate downstream effects on other countries’ systems and interests. 81

work page
[33]

The Act’s serious incident reporting is limited to actual harm events

Has a near miss or hazard indicated that harm is inadequately mitigated? EU AI Act SB 53 AI incident databases (OECD AIM; AIID, MIT) Examples from other domains Our proposal Not explicitly addressed in Article 73 reporting obligations. The Act’s serious incident reporting is limited to actual harm events. There is no mandatory near-miss or hazard reportin...

work page 2024
[34]

Near-miss identification depends on knowing that a serious incident was closely averted

work page
[35]

Near-miss reporting is highly sensitive to reporting culture

work page
[36]

Assessing whether an averted failure mode is plausibly shared across other developers’ systems requires cross-provider visibility

work page
[37]

83 D Summary of findings Table 25:Summary of findings

The boundary between a near miss and a low-severity incident is not always clear-cut. 83 D Summary of findings Table 25:Summary of findings. # Finding Dependency chain layer Primary audience Actionability Recommended action F1 Escalation frameworks depend on defini- tional and data infrastructure that does not yet exist. All layers Framework design- ers, ...

work page
[38]

AI-assisted state-sponsored cyber-espionage Anthropic, September 2025 State-sponsored actors (Chinese-linked group GTG-1002) used Claude Code within a custom orchestration framework to accelerate cyber-espionage against critical national infrastructure across multiple countries. The AI system executed reconnaissance, vulnerability discovery, payload gener...

work page 2025
[39]

At peak, approximately 6,700 images per hour were generated

Cumulative nonconsensual deepfake harms xAI / Grok on X, January 2026 Grok’s image generation feature was used by thousands of uncoordinated users to create and publicly distribute nonconsensual sexualized images of real people, including minors, via reply prompts on X. At peak, approximately 6,700 images per hour were generated. Generation capability and...

work page 2026
[40]

The exposure enabled read/write access and potential impersonation of AI agent accounts, some of which held financial instrument permissions

Agentic platform database exposure OpenClaw / Moltbook, January 2026 Security researchers at Wiz accessed an exposed Moltbook database in under three minutes, obtaining approximately 35,000 email addresses, thousands of private messages, and around 1.5 million API authentication tokens. The exposure enabled read/write access and potential impersonation of...

work page 2026
[41]

Hospitalisations, deaths, and divorces have been reported

Psychological harm from human–AI interaction Global (multiple providers), 2025 onwards OpenAI disclosed internal estimates suggesting approximately 500,000 ChatGPT users per week exhibit risk factors including mania, psychosis, suicidal ideation, or emotional dependence. Hospitalisations, deaths, and divorces have been reported. A related Anthropic study ...

work page 2025
[42]

MJ Rathbun

Strategic misalignment by autonomous agent OpenClaw (agent persona “MJ Rathbun”), February 2026 An AI coding agent operating under the persona “MJ Rathbun” autonomously researched and publicly targeted a matplotlib maintainer who had rejected its pull request, composing and publishing a personalized blog post accusing the maintainer of bias and “gatekeepi...

[43]

Alleged ricin terror plot, CBRN near-miss (India, November 2025): Indian authorities disrupted an alleged ricin terror plot following arrests in Gujarat, with seized precursor chemicals and evidence of reconnaissance of target sites. AI involvement was not confirmed: the accused was a qualified doctor with baseline toxicology knowledge, and no public evidence established AI tool use...

[44]

Military AI targeting systems, Lavender/Gospel (Israel Defense Forces, April 2024): The AI systems “Lavender” and “The Gospel” were reportedly used by the IDF to identify individuals and select strike targets in Gaza, with allegations of limited human review (20 seconds per target across 37,000 targets) and mass civilian casualties. The IDF disputes the cha...

[45]

Failed escalation of credible risk, BC school shooting (Canada, February 2026): OpenAI allegedly did not alert the RCMP after ChatGPT’s internal systems flagged violent conversations with a user who subsequently carried out a school shooting in British Columbia. AI was a potential detection channel rather than in the causal chain: the system flagged concerning content but the developer’s internal escalation threshold was not met...

[46]

Russian military intelligence agency (GRU) funded influence campaign (multi-incident cluster, 2024–2026): A cluster of linked incidents in which Russia’s military intelligence agency (the GRU) funded and coordinated an influence campaign across multiple countries. Operatives used generative AI to mass-produce fake news articles, fabricated videos, and synthe...

[47]

A coordinated Russian military intelligence (GRU) operation to poison AI training data (GRU-linked network, December 2024 onwards): This represents a component of incident 9 but is assessed separately because its risk pathway raises distinct framework challenges. A cluster of linked incidents in which the GRU-linked Pravda network, a system of 280+ websites publi...

[48]

...captures directed, time-asymmetric information flow between agent outputs. It detects one agent’s behavior systematically predicting changes in another’s (i.e. causal influence) and can distinguish T1 (no inter-agent causal flow) from T3 (elevated inter-agent causal flow). Table 28 summarises the measurement signatures for each regime. Table 28: Informatio...

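The truncated passage does not name the measure. Transfer entropy is a standard measure with the properties it describes: it is directed and time-asymmetric, and it is zero when one agent's past adds no predictive information about the other's next output. The sketch below is a minimal illustration, not the paper's measurement pipeline. It assumes agent outputs have already been discretised into category labels and uses a history length of one step. The function name and the toy series are illustrative assumptions.

```python
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in estimate of transfer entropy from `source` to `target`, in bits.

    Uses a history length of one step:
      TE = sum p(y[t+1], y[t], x[t]) * log2(p(y[t+1] | y[t], x[t]) / p(y[t+1] | y[t]))
    Both series must be discrete (e.g. agent outputs mapped to category labels).
    """
    if len(source) != len(target) or len(target) < 2:
        raise ValueError("need two equal-length series with at least two steps")
    # One triple per time step: (target next, target now, source now).
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(triples)
    joint = Counter(triples)
    now_pair = Counter((y0, x0) for _, y0, x0 in triples)
    target_pair = Counter((y1, y0) for y1, y0, _ in triples)
    target_now = Counter(y0 for _, y0, _ in triples)

    te = 0.0
    for (y1, y0, x0), count in joint.items():
        p_given_both = count / now_pair[(y0, x0)]
        p_given_target = target_pair[(y1, y0)] / target_now[y0]
        te += (count / n) * log2(p_given_both / p_given_target)
    return te


# Toy data: agent B copies agent A's previous output (directed influence, A -> B).
agent_a = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
agent_b = [0] + agent_a[:-1]
print(round(transfer_entropy(agent_a, agent_b), 3))  # positive: A's past predicts B
print(transfer_entropy(agent_a, [0] * len(agent_a)))  # 0.0: constant target, no flow
```

Plug-in estimates are biased upward on short series. Before an estimated value is read as a T3-style signature rather than T1, it would normally be compared against estimates computed on shuffled surrogate series.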
[1]

URL https://internationalaisafetyreport.org/publication/2026-report-extended-summary-policymakers. International Atomic Energy Agency. Convention on Early Notification of a Nuclear Accident, September 1986. URL https://www.iaea.org/topics/nuclear-safety-conventions/convention-early-notification-nuclear-accident. Place: Vienna. Publisher: International At...

[2]

URL https://thefuturesociety.org/aicrisisexplainer/. OECD. Stocktaking for the development of an AI incident definition. OECD Artificial Intelligence Papers 4, OECD Publishing, Paris, 2023. OECD. Defining AI Incidents and Related Terms. OECD Artificial Intelligence Papers 16, Organisation for Economic Co-operation and Development, 2024a. OECD. Defining AI ...

[3]

Press release. Henri Theil. The Development of International Inequality 1960–1985. Journal of Econometrics, 42(1):145–155, 1989. UK Health and Safety Executive. Introduction to the Seveso III Directive, 2025. URL https://www.hse.gov.uk/seveso/introduction.htm. Updated 5 February 2025; accessed 2026-04-23. United Nations Office for Disaster Risk Reduction....

[4]

arXiv:2406.07358 [cs]. Marty J. Wolf, Keith W. Miller, and Frances S. Grodzinsky. Why We Should Have Seen That Coming: Comments on Microsoft’s Tay “Experiment,” and Wider Implications. ACM SIGCAS Computers and Society, 47(3):54–64, 2017. doi: 10.1145/3144592.3144598. World Health Organization. Annex 2 of the International Health Regulations (2005), 2005. U...

[5]

Was AI a causal factor? How is the causal role of AI in the incident established?
EU AI Act | SB 53 | AI incident databases (OECD AIM; AIID, MIT) | Examples from other domains | Our proposal
Provider must report where the AI system ‘directly or indirectly leads to’ any listed serious harm. ‘serious incidents resulting from the use of their AI systems, meaning incident or malfunctioning leading to...

[6]

Trade-off in basing the escalation decision on confidence level: escalating with low confidence risks unnecessary disruption or alarm if AI was not in fact causal, and may reduce confidence in the escalation process. Delaying escalation, however, risks avoidable harm propagation.

[7]

In multi-agent environments, it may be hard to establish which AI systems are causally responsible for an incident. This is especially true where information about the deployed AI systems or agents is lacking, or where there is no data on their shared interactions.

[8]

Is the incident in an excluded domain or assessment context? Some domains, such as government or military use, may be deemed out of scope.
EU AI Act | SB 53 | AI incident databases (OECD AIM; AIID, MIT) | Examples from other domains | Our proposal
EU AI Act: excludes AI systems placed on the market or put into service for military, defense, or national security purpo...

[9]

Excluded domains such as military or national security may preclude preventative action or direct containment within this framework. While an incident originating in an excluded domain could in principle warrant international coordination for warning, irreversible harm, or information gaps, these functions would in practice be managed through existing nat...

[10]

Dual-use AI systems may fall in and out of scope depending on deployment context (as in the EU AI Act), creating classification ambiguity at the point of incident assessment.

[11]

(b) Serious and irreversible disruption of the management or operation of critical infrastructure

Has an immediate escalation condition been met?
EU AI Act | SB 53 | AI incident databases (OECD AIM; AIID, MIT) | Examples from other domains | Our proposal
Only part (c) under the ‘Serious Incident’ definition relates to a condition for escalation that does not relate to an explicit measurable harm: Article 3(49) — ‘Serious Incident’ definition (a) Death of a p...

[12]

As AI capabilities and deployment contexts evolve, the set of conditions warranting immediate escalation will need to be reviewed and updated.

[13]

The EU AI Act’s commitment to annual review of Article 5 prohibited practices (Article 112) provides a precedent for building this into the framework’s governance.

[14]

AI incident variants

Is the incident part of a broader pattern (correlated or related incidents)?
EU AI Act | SB 53 | AI incident databases (OECD AIM; AIID, MIT) | Examples from other domains | Our proposal
Article 3(65) — Systemic Risk definition: Systemic risk must be capable of being ‘propagated at scale across the value chain’, implying that a pattern of related incidents across the...

[15]

Identifying patterns related to the serious incident under assessment requires two things: incidents must already be tracked and analysed, and clustering patterns must already be defined.

[16]

The level at which monitoring is performed determines which incident patterns can be identified, both within and across AI model developers.

[17]

Pattern identification at the capability and contextual root cause levels requires cross-provider incident visibility that is unlikely to be achievable by individual providers acting alone.

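The dependence of pattern detection on monitoring level can be shown with a toy grouping. This sketch is not the paper's clustering method, which the text does not specify. It assumes hypothetical incident records with `provider`, `model`, `capability`, and `root_cause` fields, and an arbitrary minimum cluster size. The same three incidents form no cluster when grouped by model or when seen from a single provider's records. They do form one cluster when grouped by capability across providers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    # All field values used below are hypothetical placeholders.
    incident_id: str
    provider: str
    model: str
    capability: str
    root_cause: str


def find_clusters(incidents, level, min_size=3):
    """Group incidents by the attribute named `level`; keep groups of at least `min_size`."""
    groups = defaultdict(list)
    for incident in incidents:
        groups[getattr(incident, level)].append(incident.incident_id)
    return {key: ids for key, ids in groups.items() if len(ids) >= min_size}


incidents = [
    Incident("i1", "provider-a", "model-a", "agentic-code-execution", "prompt-injection"),
    Incident("i2", "provider-b", "model-b", "agentic-code-execution", "prompt-injection"),
    Incident("i3", "provider-c", "model-c", "agentic-code-execution", "prompt-injection"),
]

# Model-level monitoring: every model has a single incident, so no pattern appears.
print(find_clusters(incidents, "model"))  # {}
# One provider's view at capability level: still only one incident.
own = [i for i in incidents if i.provider == "provider-a"]
print(find_clusters(own, "capability"))  # {}
# Cross-provider view at capability level: the shared pattern becomes visible.
print(find_clusters(incidents, "capability"))
# {'agentic-code-execution': ['i1', 'i2', 'i3']}
```

Grouping on a single attribute is the simplest possible choice. It also shows why shared taxonomies matter: if providers labelled the same capability differently, the cross-provider grouping would split the cluster again.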
[18]

The use of different taxonomies and classification approaches across providers and incident databases makes it harder to build a cross-provider picture of systemic risk and incident patterns.

[19]

Clustering necessitates tracking and analytics of incidents: the level at which monitoring is performed has direct implications for which incident patterns can be identified.

5a. Has harm occurred in a relevant category?
EU AI Act | SB 53 | AI incident databases (OECD AIM; AIID, MIT) | Examples from other domains | Our proposal
Article 3(49) — ‘Serious Inciden...

[20]

The GPAI Code of Practice Appendix 1.1 risk types and the Appendix 1.4 specified systemic risks operate at a different level of abstraction from the MIT harm taxonomy categories used at this criterion.

[21]

The MIT harm taxonomy was designed for retrospective incident classification, not real-time triage. Some categories may require investigation or contextual evidence that is unavailable at the point of initial assessment.

5b. Has harm crossed a relevant severity threshold?
EU AI Act | SB 53 | AI incident databases (OECD AIM; AIID, MIT) | Examples from other d...

[22]

Many harm severity levels in the EU AI Act are defined in legal terminology, which is harder to operationalize as testable thresholds or trigger conditions.

[23]

A gap exists between SB 53’s quantitative thresholds and the lower boundary of harm that may nonetheless warrant escalation. The graduated severity scale is intended to address this gap, but the boundary between Level 3 and Level 4 may require further specification.

[24]

Is international coordination required to contain the incident or respond to its cross-border propagation?
EU AI Act | SB 53 | AI incident databases (OECD AIM; AIID, MIT) | Examples from other domains | Our proposal
Propagation is treated as a defining characteristic of systemic risk: propagation capacity is what makes a risk systemic. It is not operationalized as a measurable criterion for assessing individual incidents for escalation...

[25]

Existing AI governance frameworks do not account for incident-level propagation assessment.

[26]

Effective incident-level propagation assessment depends on the ability to observe and correlate incidents across providers, deployment contexts, and jurisdictions in a timely manner.

[27]

For supply-chain propagation, exposure scope can often be estimated by tracing the technical dependency tree. For capability and emergent propagation, quantitative estimation of exposure scope is substantially harder.

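In the supply-chain case, tracing the dependency tree amounts to a graph traversal. The sketch below is minimal and rests on two hypothetical inputs. The first is a reverse dependency map, from each component to the components built on it. The second maps downstream deployments to jurisdictions. All component names and jurisdictions are illustrative placeholders.

```python
from collections import deque


def exposure_scope(dependents, affected):
    """Breadth-first search over a reverse dependency graph.

    `dependents` maps each component to the components that depend on it.
    Returns every component downstream of `affected` (excluding `affected`).
    The visited set also makes the search safe if the graph contains cycles.
    """
    seen = {affected}
    queue = deque([affected])
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    seen.discard(affected)
    return seen


def exposed_jurisdictions(dependents, affected, jurisdiction_of):
    """Jurisdictions hosting at least one downstream component with a known location."""
    return {
        jurisdiction_of[c]
        for c in exposure_scope(dependents, affected)
        if c in jurisdiction_of
    }


# Hypothetical supply chain: a compromised base model feeds a fine-tune and an agent framework.
dependents = {
    "base-model": ["fine-tune-1", "agent-framework"],
    "fine-tune-1": ["app-fr"],
    "agent-framework": ["app-uk", "app-de"],
}
jurisdiction_of = {"app-fr": "FR", "app-uk": "UK", "app-de": "DE"}

print(sorted(exposure_scope(dependents, "base-model")))
# ['agent-framework', 'app-de', 'app-fr', 'app-uk', 'fine-tune-1']
print(sorted(exposed_jurisdictions(dependents, "base-model", jurisdiction_of)))
# ['DE', 'FR', 'UK']
```

The sketch only shows why supply-chain exposure is comparatively tractable. Capability and emergent propagation offer no equivalent dependency structure to traverse.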
[28]

Article 3(49)(b) — Serious Incident definition: irreversibility is explicitly named as a qualifying criterion for critical infrastructure disruption.

Does irreversible harm require international coordination to assess or respond to its cross-border consequences?
EU AI Act | SB 53 | AI incident databases (OECD AIM; AIID, MIT) | Examples from other domains | Our proposal
Irreversibility is explicitly named as a qualifying criterion for one harm category and is a defining characteristic of loss of control risk un...

[29]

Any harm impacts that cannot be reversed are defined as D. Where full reversal is technically possible but only at infeasible cost or timescale, the harm is treated as effectively irreversible.
If D = 0 → no escalation required.
If D > 0 → proceed to Step 2.
Step 2. Does the irreversible harm have consequences extending beyond the affected jurisdiction? If yes → Escala...

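The two-step irreversibility test can be sketched as a gate function. This is a minimal sketch under three assumptions. First, D is counted as the number of effectively irreversible harm impacts; the source only distinguishes D = 0 from D > 0. Second, the cost and time limits that make reversal "infeasible" are hypothetical parameters. Third, the source text is truncated after the Step 2 "yes" branch, so the sketch returns `None` for the "no" branch rather than guessing the outcome.

```python
from dataclasses import dataclass


@dataclass
class HarmImpact:
    description: str
    technically_reversible: bool
    reversal_cost: float = 0.0    # hypothetical cost units
    reversal_years: float = 0.0   # time needed to reverse the harm
    cross_border: bool = False    # consequences beyond the affected jurisdiction


def effectively_irreversible(impact, max_cost, max_years):
    """True if reversal is impossible, or possible only at infeasible cost or timescale."""
    if not impact.technically_reversible:
        return True
    return impact.reversal_cost > max_cost or impact.reversal_years > max_years


def irreversibility_gate(impacts, max_cost=1e9, max_years=10.0):
    """Step 1: compute D. Step 2: test for cross-border consequences.

    The default limits are hypothetical placeholders, not values from the source.
    """
    irreversible = [
        h for h in impacts if effectively_irreversible(h, max_cost, max_years)
    ]
    d = len(irreversible)  # assumption: D counted as a number of impacts
    if d == 0:
        return "no escalation required"
    if any(h.cross_border for h in irreversible):
        return "escalate"
    return None  # Step 2 'no' branch is truncated in the source; no decision made here


impacts = [
    HarmImpact("data deleted, restorable from backup", True, reversal_cost=1e4),
    HarmImpact("grid equipment destroyed", True, reversal_cost=5e9, cross_border=True),
]
print(irreversibility_gate(impacts))  # escalate
```

In this example the second impact is technically reversible, but its reversal cost exceeds the assumed limit. It therefore counts toward D, which is the "effective irreversibility" judgment call the criterion describes.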
[30]

Estimating D at the point of triage is inherently uncertain.

[31]

The distinction between absolute and effective irreversibility introduces a judgment call.

[32]

The cross-border consequences test requires the assessing jurisdiction to evaluate downstream effects on other countries’ systems and interests.

[33]

Has a near miss or hazard indicated that harm is inadequately mitigated?
EU AI Act | SB 53 | AI incident databases (OECD AIM; AIID, MIT) | Examples from other domains | Our proposal
Not explicitly addressed in Article 73 reporting obligations. The Act’s serious incident reporting is limited to actual harm events. There is no mandatory near-miss or hazard reportin...

[34]

Near-miss identification depends on knowing that a serious incident was closely averted.

[35]

Near-miss reporting is highly sensitive to reporting culture.

[36]

Assessing whether an averted failure mode is plausibly shared across other developers’ systems requires cross-provider visibility.

[37]

The boundary between a near miss and a low-severity incident is not always clear-cut.

D Summary of findings

Table 25: Summary of findings.
# | Finding | Dependency chain layer | Primary audience | Actionability | Recommended action
F1 | Escalation frameworks depend on definitional and data infrastructure that does not yet exist. | All layers | Framework designers, ...
