Designing escalation criteria for international AI incident response: criteria, triggers, and thresholds
Pith reviewed 2026-05-21 00:34 UTC · model grok-4.3
The pith
This paper proposes eight criteria to decide when an AI incident should escalate from national handling to international coordination.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper proposes an escalation framework of eight criteria for determining when a detected AI incident warrants international coordination. These criteria are organized into a flowchart with sequential decision gates and threshold checks, and each is mapped against existing regulatory texts to show where design choices aid or hinder detection. When applied to ten documented AI incidents and variants, the framework identifies three design patterns that may lead to systematic under-detection in regimes assigning escalation responsibility to model developers: requiring confirmed harm before escalation, assessing incidents in isolation rather than cumulatively, and aligning thresholds with legal instruments rather than quantitatively testable terms.
What carries the argument
The eight-criteria escalation framework structured as a sequential flowchart with gated decision points and threshold checks, which translates regulatory requirements into practical tests for international escalation.
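As a rough illustration of what a sequential flowchart with gated decision points can look like in code, the Python sketch below runs an incident through an ordered list of gates and stops at the first gate that yields a terminal outcome. Everything in it is an assumption made for illustration: the gate names, the incident fields (`ai_causal`, `excluded_domain`, `immediate_condition`, `severity`, `cross_border`), the outcome each gate produces, and the severity threshold of 4 are hypothetical and are not taken from the paper's flowchart.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, List, Tuple


class Outcome(Enum):
    CONTINUE = "continue"            # pass to the next gate
    ESCALATE = "escalate"            # hand over to international coordination
    NO_ESCALATION = "no_escalation"  # keep at national level


Incident = Dict[str, Any]


@dataclass(frozen=True)
class Gate:
    name: str
    check: Callable[[Incident], Outcome]


def run_gates(incident: Incident, gates: List[Gate]) -> Tuple[Outcome, str]:
    """Evaluate gates in order; return the first terminal outcome and its gate."""
    for gate in gates:
        outcome = gate.check(incident)
        if outcome is not Outcome.CONTINUE:
            return outcome, gate.name
    return Outcome.NO_ESCALATION, "end_of_flow"


# Hypothetical gates: their order, fields, and outcomes are assumptions.
ASSUMED_SEVERITY_THRESHOLD = 4

GATES = [
    Gate("ai_causal", lambda i: Outcome.CONTINUE if i.get("ai_causal") else Outcome.NO_ESCALATION),
    Gate("excluded_domain", lambda i: Outcome.NO_ESCALATION if i.get("excluded_domain") else Outcome.CONTINUE),
    Gate("immediate_condition", lambda i: Outcome.ESCALATE if i.get("immediate_condition") else Outcome.CONTINUE),
    Gate(
        "severity_and_cross_border",
        lambda i: Outcome.ESCALATE
        if i.get("severity", 0) >= ASSUMED_SEVERITY_THRESHOLD and i.get("cross_border")
        else Outcome.CONTINUE,
    ),
]

example = {"ai_causal": True, "excluded_domain": False, "immediate_condition": False,
           "severity": 4, "cross_border": True}
decision, deciding_gate = run_gates(example, GATES)
```

Keeping each gate as a separately named check means the result records which gate decided the outcome, which is the kind of traceability a gated structure is meant to provide.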
If this is right
- Where escalation requires confirmed harm, incidents such as model weight exfiltration risk being detected only after severe, irreversible harm has already spread.
- Systemic harms that build from many small events risk being missed when each incident is judged alone.
- Thresholds written in legal language rather than measurable terms become hard to apply under time pressure.
- Escalation decisions depend on the underlying definitions of harm and the data available to the responsible actor, creating further sources of under-detection.
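The second consequence in the list above, that harms built from many small events can be missed when each incident is judged alone, can be made concrete with a small sketch. The Python below compares a per-incident severity threshold with a cumulative check over a trailing time window. The severity scores, both thresholds, and the window length are hypothetical values chosen for illustration; the summary does not specify any of them.

```python
from typing import List, Tuple

Event = Tuple[float, float]  # (timestamp, severity score); both in hypothetical units


def flagged_individually(events: List[Event], per_incident_threshold: float) -> List[Event]:
    """Return only the events whose own severity meets the per-incident threshold."""
    return [e for e in events if e[1] >= per_incident_threshold]


def cumulative_trigger(events: List[Event], window: float, cumulative_threshold: float) -> bool:
    """Return True if severity summed over any trailing window meets the threshold."""
    ordered = sorted(events)
    start = 0
    running = 0.0
    for timestamp, severity in ordered:
        running += severity
        # Drop events that fall outside the trailing window ending at this event.
        while ordered[start][0] < timestamp - window:
            running -= ordered[start][1]
            start += 1
        if running >= cumulative_threshold:
            return True
    return False


# Four low-severity events: none crosses the per-incident bar on its own.
events = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]
individually_flagged = flagged_individually(events, per_incident_threshold=3.0)
aggregate_flagged = cumulative_trigger(events, window=5.0, cumulative_threshold=3.0)
```

In this toy data no single event is flagged, while the windowed sum is. Shrinking the window to 1.0 removes the aggregate trigger as well, which shows how sensitive such a check is to the chosen window.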
Where Pith is reading between the lines
- Regulators could adapt the same gated-check structure to other emerging risks such as advanced biotechnology.
- Collecting standardized incident data across borders would allow empirical refinement of the proposed thresholds.
- The interdependency between definitions, data access, and thresholds suggests that fixing escalation rules alone will not solve under-detection.
- International coordination bodies might treat the framework as an initial template for harmonized reporting standards.
Load-bearing premise
That the ten documented AI incidents and their structured variants are representative enough to reveal systematic under-detection patterns across regulatory regimes and that the criteria can be turned into workable thresholds without further empirical validation or new data.
What would settle it
Applying the eight criteria and flowchart to a fresh collection of AI incidents from multiple jurisdictions and observing whether the same three under-detection patterns appear or whether incidents are handled consistently at the right level without the framework.
Original abstract
AI incident reporting requirements are emerging in regulation and policy, yet no operational criteria exist for determining when a detected AI incident warrants escalation beyond national handling to international coordination. This paper proposes an escalation framework to address this gap, intended as a common reference point across jurisdictions that enables aligned escalation while preserving flexibility in how actors respond within their own legal and policy contexts. We review SB 53, the EU AI Act, the GPAI Code of Practice, and incident frameworks from other industries to derive eight criteria for assessing whether an incident warrants escalation, translated into a sequential flowchart with gated decision points and threshold checks. For each criterion, we map how it interplays with these regulatory frameworks, identifying where their design choices support or undermine effective detection. We test the framework against ten documented AI incidents and structured variants to identify where criteria under-detect or misclassify incidents in practice. We find three design patterns that may lead to systematic under-detection in regimes where model developers are responsible for escalation: a. where escalation requires confirmed harm, events such as model weight exfiltration risk detection only after severe, irreversible harm has propagated; b. where incidents are assessed individually, systemic harms emerging from accumulation risk being under-detected; and c. where thresholds align with legal instruments rather than quantitatively testable terms, criteria risk being impractical to apply under time pressure. We also find that escalation rules are only one component of a broader framework: the underlying definitions against which thresholds are set, and the data available to the responsible actor, create interdependencies that can themselves drive under-detection.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes an escalation framework with eight criteria for determining when a detected AI incident warrants escalation from national to international coordination. Derived from a review of regulations including SB 53, the EU AI Act, and the GPAI Code of Practice, the criteria are organized into a sequential flowchart with gated decision points and threshold checks. The framework is tested against ten documented AI incidents and structured variants to identify three design patterns that may produce systematic under-detection when model developers are responsible for escalation: confirmed-harm requirements, individual assessments, and legal-instrument thresholds. The paper also emphasizes interdependencies with underlying incident definitions and available data.
Significance. If the central claims hold, the work offers a constructive synthesis that could serve as a reference point for aligning international AI incident response while preserving jurisdictional flexibility. The mapping of criteria to existing regulatory frameworks and the identification of specific design patterns that risk under-detection provide actionable insights for policymakers. The attention to how definitions and data availability interact with escalation rules strengthens the practical relevance of the contribution.
major comments (2)
- [§5] §5 (Testing the framework against incidents): The manuscript does not specify selection criteria for the ten documented AI incidents or describe how the structured variants were constructed (e.g., which parameters were varied and on what basis). Because the claim that three design patterns produce systematic under-detection across regimes rests on this test set being representative, the absence of this methodology leaves the generalizability of the findings open to post-hoc selection concerns.
- [§3] §3 (Derivation of the eight criteria): Exact operational definitions, triggers, and threshold quantifications for the criteria are not fully detailed, which directly affects the evaluation of their interplay with frameworks such as the EU AI Act and the practicality of the gated flowchart under time pressure. This gap is load-bearing for the paper's assertion that the criteria can be translated into usable checks.
minor comments (2)
- [Abstract] The abstract states that variants were used to identify under-detection but provides no high-level indication of their construction; a single sentence clarifying their role would improve accessibility without altering length.
- [Figure 1] Figure 1 (flowchart) would benefit from explicit labeling of which criteria correspond to each gated decision point to aid readers in tracing the three identified patterns.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify the methodological transparency and operational precision of our escalation framework. We address each major comment below, indicating planned revisions to strengthen the manuscript while preserving its core contributions on design patterns and interdependencies.
- Referee: [§5] §5 (Testing the framework against incidents): The manuscript does not specify selection criteria for the ten documented AI incidents or describe how the structured variants were constructed (e.g., which parameters were varied and on what basis). Because the claim that three design patterns produce systematic under-detection across regimes rests on this test set being representative, the absence of this methodology leaves the generalizability of the findings open to post-hoc selection concerns.
Authors: We agree that explicit documentation of incident selection and variant construction is necessary to support claims of systematic under-detection. The ten incidents were drawn from publicly reported cases spanning 2022–2024 to illustrate diversity in failure modes (e.g., weight exfiltration, cumulative bias effects, and regulatory non-compliance), while structured variants were generated by varying parameters such as harm confirmation status, assessment granularity (individual vs. aggregate), and threshold alignment with legal instruments. To eliminate concerns about representativeness and post-hoc selection, we will add a dedicated subsection in §5 that lists the incidents with sources, states the inclusion criteria (coverage of regulatory domains, incident scale, and data availability), and details the parameter variations used for each structured variant, accompanied by a summary table. revision: yes
- Referee: [§3] §3 (Derivation of the eight criteria): Exact operational definitions, triggers, and threshold quantifications for the criteria are not fully detailed, which directly affects the evaluation of their interplay with frameworks such as the EU AI Act and the practicality of the gated flowchart under time pressure. This gap is load-bearing for the paper's assertion that the criteria can be translated into usable checks.
Authors: The manuscript derives the eight criteria from the reviewed instruments (SB 53, EU AI Act, GPAI Code) and maps their interplay, but we acknowledge that more granular operational definitions, concrete triggers, and example quantifications would improve evaluability and practical applicability. We will expand §3 with a table that provides, for each criterion, an operational definition, sample triggers drawn from the source regulations, and illustrative threshold quantifications (e.g., harm severity scales or temporal windows). This addition will also include a brief discussion of how the gated flowchart accommodates time pressure, directly addressing the referee’s concern about usability. revision: yes
Circularity Check
No significant circularity; derivation synthesizes external regulations and incidents
The paper reviews independent external sources including SB 53, the EU AI Act, the GPAI Code of Practice, and incident frameworks from other industries to derive its eight criteria, then applies the resulting framework to ten documented AI incidents and structured variants. This constitutes a synthesis and testing process against outside materials rather than any self-definitional loop, fitted parameter renamed as prediction, or load-bearing self-citation chain. No equations, ansatzes, or uniqueness theorems are invoked that reduce the central claims to the paper's own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The reviewed regulations (SB 53, EU AI Act, GPAI Code of Practice) and incident frameworks from other industries supply sufficient material to derive eight operational escalation criteria.
- domain assumption: The ten documented AI incidents and structured variants are representative enough to identify systematic under-detection patterns.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: We test the framework against ten documented AI incidents and structured variants to identify three design patterns that may lead to systematic under-detection.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
International Atomic Energy Agency
URL https://internationalaisafetyreport.org/publication/2026-report-extended-summary-policymakers.
International Atomic Energy Agency. Convention on Early Notification of a Nuclear Accident, September 1986. URL https://www.iaea.org/topics/nuclear-safety-conventions/convention-early-notification-nuclear-accident. Place: Vienna Publisher: International At...
-
[2]
URL https://thefuturesociety.org/aicrisisexplainer/.
OECD. Stocktaking for the development of an AI incident definition. OECD Artificial Intelligence Papers 4, OECD Publishing, Paris, 2023.
OECD. Defining AI Incidents and Related Terms. OECD Artificial Intelligence Papers 16, Organisation for Economic Co-operation and Development, 2024a.
OECD. Defining AI ...
-
[3]
Press release.
Henri Theil. The Development of International Inequality 1960–1985. Journal of Econometrics, 42(1):145–155, 1989.
UK Health and Safety Executive. Introduction to the Seveso III Directive, 2025. URL https://www.hse.gov.uk/seveso/introduction.htm. Updated 5 February 2025; accessed 2026-04-23.
United Nations Office for Disaster Risk Reduction....
-
[4]
arXiv:2406.07358 [cs].
Marty J. Wolf, Keith W. Miller, and Frances S. Grodzinsky. Why We Should Have Seen That Coming: Comments on Microsoft’s Tay “Experiment,” and Wider Implications. ACM SIGCAS Computers and Society, 47(3):54–64, 2017. doi: 10.1145/3144592.3144598.
World Health Organization. Annex 2 of the International Health Regulations (2005), 2005. U...
-
[5]
Was AI a causal factor? How is the causal role of AI in the incident established?
Table columns: EU AI Act | SB 53 | AI incident databases (OECD AIM; AIID, MIT) | Examples from other domains | Our proposal
Provider must report where the AI system ‘directly or indirectly leads to’ any listed serious harm. ‘serious incidents resulting from the use of their AI systems, meaning inc...
-
[6]
Trade-off with escalation decision based on confidence level: escalating with low confidence may risk unnecessary disruption/alarm if no actual AI causality, and may reduce confidence in the escalation process, but escalation delays could risk avoidable harm propagation
-
[7]
For multi-agent environments, it may be hard to establish which AI systems are causally responsible for an incident, particularly if there is a lack of information about deployed AI systems/agents and a lack of data on their shared interactions
-
[8]
Is the incident in an excluded domain or assessment context? Some domains, such as government use or military, may be deemed out of scope.
Table columns: EU AI Act | SB 53 | AI incident databases (OECD AIM; AIID, MIT) | Examples from other domains | Our proposal
EU AI Act: excludes AI systems placed on market or put into service for military, defense, or national security purpo...
-
[9]
Excluded domains such as military or national security may preclude preventative action or direct containment within this framework. While an incident originating in an excluded domain could in principle warrant international coordination for warning, irreversible harm, or information gaps, these functions would in practice be managed through existing nat...
-
[10]
Dual-use AI systems may fall in and out of scope depending on deployment context (as in the EU AI Act), creating classification ambiguity at the point of incident assessment.
-
[11]
(b) Serious and irreversible disruption of the management or operation of critical infrastructure
Has an immediate escalation condition been met?
Table columns: EU AI Act | SB 53 | AI incident databases (OECD AIM; AIID, MIT) | Examples from other domains | Our proposal
Only part (c) under the ‘Serious Incident’ definition relates to a condition for escalation which does not relate to an explicit measurable harm: Article 3(49), ‘Serious Incident’ definition: (a) Death of a p...
-
[12]
As AI capabilities and deployment contexts evolve, the set of conditions warranting immediate escalation will need to be reviewed and updated
-
[13]
The EU AI Act’s commitment to annual review of Article 5 prohibited practices (Article 112) provides a precedent for building this into the framework’s governance.
-
[14]
Is the incident part of a broader pattern? (correlated / related incidents)
Table columns: EU AI Act | SB 53 | AI incident databases (OECD AIM; AIID, MIT) | Examples from other domains | Our proposal
Article 3(65), Systemic Risk definition: Systemic risk must be capable of being ‘propagated at scale across the value chain’ implying that a pattern of related incidents across the...
-
[15]
Identifying patterns relating to a serious incident under assessment requires incident tracking and analytics to be in place and clustering patterns to have been defined
-
[16]
The level at which monitoring is done has implications for the incident patterns which can be identified within and across AI model developers
-
[17]
Pattern identification at the capability and contextual root cause levels requires cross-provider incident visibility that is unlikely to be achievable by individual providers acting alone
-
[18]
The use of different taxonomies and classification approaches across providers and incident databases increases the difficulty of building a cross-provider picture of systemic risk and incident patterns
-
[19]
Clustering necessitates tracking and analytics of incidents: the level at which monitoring is performed has direct implications for which incident patterns can be identified.
5a. Has harm occurred in a relevant category?
Table columns: EU AI Act | SB 53 | AI incident databases (OECD AIM; AIID, MIT) | Examples from other domains | Our proposal
Article 3(49), ‘Serious Inciden...
-
[20]
The GPAI Code of Practice Appendix 1.1 risk types and the Appendix 1.4 specified systemic risks operate at different levels of abstraction to the MIT harm taxonomy categories used at this criterion
-
[21]
The MIT harm taxonomy was designed for retrospective incident classification, not real-time triage. Some categories may require investigation or contextual evidence that is unavailable at the point of initial assessment.
5b. Has harm crossed a relevant severity threshold?
Table columns: EU AI Act | SB 53 | AI incident databases (OECD AIM; AIID, MIT) | Examples from other d...
-
[22]
Many harm severities in the EU AI Act are defined using legal terminology, which is harder to operationalize as testable thresholds or trigger conditions
-
[23]
A gap exists between SB 53’s quantitative thresholds and the lower boundary of harm that may nonetheless warrant escalation. The graduated severity scale is intended to address this gap, but the boundary between Level 3 and Level 4 may require further specification
-
[24]
Is international coordination required to contain the incident or respond to its cross-border propagation?
Table columns: EU AI Act | SB 53 | AI incident databases (OECD AIM; AIID, MIT) | Examples from other domains | Our proposal
Propagation is treated as a defining characteristic of systemic risk: propagation capacity is what makes a risk systemic. It is not operationalized a...
-
[25]
Existing AI governance frameworks do not account for incident-level propagation assessment
-
[26]
Effective incident-level propagation assessment depends on the ability to observe and correlate incidents across providers, deployment contexts, and jurisdictions in a timely manner
-
[27]
For supply-chain propagation, exposure scope can often be estimated by tracing the technical dependency tree. For capability and emergent propagation, quantitative estimation of exposure scope is substantially harder.
-
[28]
Does irreversible harm require international coordination to assess or respond to its cross-border consequences?
Table columns: EU AI Act | SB 53 | AI incident databases (OECD AIM; AIID, MIT) | Examples from other domains | Our proposal
Irreversibility is explicitly named as a qualifying criterion for one harm category and is a defining characteristic of loss of control risk un...
-
[29]
Any harm impacts which cannot be reversed are defined as D. Where full reversal is technically possible but at infeasible cost or timescale, this is treated as effectively irreversible.
If D = 0 → No escalation required.
If D > 0 → proceed to Step 2.
Step 2. Does the irreversible harm have consequences extending beyond the affected jurisdiction? If yes→Escala...
-
[30]
Estimating D at the point of triage is inherently uncertain
-
[31]
The distinction between absolute and effective irreversibility introduces a judgment call
-
[32]
The cross-border consequences test requires the assessing jurisdiction to evaluate downstream effects on other countries’ systems and interests.
-
[33]
Has a near miss or hazard indicated that harm is inadequately mitigated?
Table columns: EU AI Act | SB 53 | AI incident databases (OECD AIM; AIID, MIT) | Examples from other domains | Our proposal
Not explicitly addressed in Article 73 reporting obligations. The Act’s serious incident reporting is limited to actual harm events. There is no mandatory near-miss or hazard reportin...
-
[34]
Near-miss identification depends on knowing that a serious incident was closely averted
-
[35]
Near-miss reporting is highly sensitive to reporting culture
-
[36]
Assessing whether an averted failure mode is plausibly shared across other developers’ systems requires cross-provider visibility
-
[37]
The boundary between a near miss and a low-severity incident is not always clear-cut.
D Summary of findings
Table 25: Summary of findings.
# | Finding | Dependency chain layer | Primary audience | Actionability | Recommended action
F1 | Escalation frameworks depend on definitional and data infrastructure that does not yet exist. | All layers | Framework designers, ...
-
[38]
AI-assisted state-sponsored cyber-espionage
Anthropic, September 2025
State-sponsored actors (Chinese-linked group GTG-1002) used Claude Code within a custom orchestration framework to accelerate cyber-espionage against critical national infrastructure across multiple countries. The AI system executed reconnaissance, vulnerability discovery, payload gener...
-
[39]
Cumulative nonconsensual deepfake harms
xAI / Grok on X, January 2026
Grok’s image generation feature was used by thousands of uncoordinated users to create and publicly distribute nonconsensual sexualized images of real people, including minors, via reply prompts on X. At peak, approximately 6,700 images per hour were generated. Generation capability and...
-
[40]
Agentic platform database exposure
OpenClaw / Moltbook, January 2026
Security researchers at Wiz accessed an exposed Moltbook database in under three minutes, obtaining approximately 35,000 email addresses, thousands of private messages, and around 1.5 million API authentication tokens. The exposure enabled read/write access and potential impersonation of...
-
[41]
Psychological harm from human–AI interaction
Global (multiple providers), 2025 onwards
OpenAI disclosed internal estimates suggesting approximately 500,000 ChatGPT users per week exhibit risk factors including mania, psychosis, suicidal ideation, or emotional dependence. Hospitalisations, deaths, and divorces have been reported. A related Anthropic study ...
-
[42]
Strategic misalignment by autonomous agent
OpenClaw (agent persona “MJ Rathbun”), February 2026
An AI coding agent operating under the persona “MJ Rathbun” autonomously researched and publicly targeted a matplotlib maintainer who had rejected its pull request, composing and publishing a personalized blog post accusing the maintainer of bias and “gatekeepi...
-
[43]
Alleged ricin terror plot (CBRN near-miss)
India, November 2025
Indian authorities disrupted an alleged ricin terror plot following arrests in Gujarat, with seized precursor chemicals and evidence of reconnaissance of target sites. AI involvement was not confirmed: the accused was a qualified doctor with baseline toxicology knowledge, and no public eviden...
-
[44]
Military AI targeting systems (Lavender/Gospel)
Israel Defense Forces, April 2024
The AI systems “Lavender” and “The Gospel” were reportedly used by the IDF to identify individuals and select strike targets in Gaza, with allegations of limited human review (20 seconds per target across 37,000 targets) and mass civilian casualties. The IDF disputes the cha...
-
[45]
Failed escalation of credible risk (BC school shooting)
Canada, February 2026
OpenAI allegedly did not alert the RCMP after ChatGPT’s internal systems flagged violent conversations with a user who subsequently carried out a school shooting in British Columbia. AI was a potential detection channel rather than in the causal chain: the system flagged concern...
-
[46]
Russian military intelligence agency (GRU) funded influence campaign
Multi-incident cluster, 2024–2026
A cluster of linked incidents in which Russia’s military intelligence agency (the GRU) funded and coordinated an influence campaign across multiple countries. Operatives used generative AI to mass-produce fake news articles, fabricated videos, and synthe...
-
[47]
A coordinated Russian military intelligence (GRU) operation to poison AI training data
GRU-linked network, December 2024 onwards
Represents a component of incident 9 but is assessed separately because its risk pathway raises distinct framework challenges. A cluster of linked incidents in which the GRU-linked Pravda network, a system of 280+ websites publi...
-
[48]
captures directed, time-asymmetric information flow between agent outputs. It detects when one agent’s behavior systematically predicts changes in another’s (i.e., causal influence) and can distinguish T1 (no inter-agent causal flow) from T3 (elevated inter-agent causal flow). Table 28 summarises the measurement signatures for each regime.
Table 28: Information...
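The two-step irreversibility test quoted in entry [29] of the list above can be sketched as a small function. This is a minimal sketch under stated assumptions: D is treated as a non-negative number whose unit is left unspecified, the harm record fields (`magnitude`, `reversible`, `reversal_feasible`) are hypothetical, and because the quoted excerpt is cut off after the "yes" branch of Step 2, the "no" branch is returned as undetermined rather than guessed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Decision(Enum):
    NO_ESCALATION = "no_escalation"
    ESCALATE = "escalate"
    UNDETERMINED = "undetermined"  # branch not visible in the quoted excerpt


@dataclass(frozen=True)
class Harm:
    magnitude: float         # hypothetical unit, not defined in the excerpt
    reversible: bool         # can the harm technically be reversed?
    reversal_feasible: bool  # is reversal feasible in cost and timescale?


def estimate_d(harms: Iterable[Harm]) -> float:
    """Sum harm that is irreversible, or reversible only at infeasible cost/timescale."""
    return sum(
        h.magnitude for h in harms
        if not h.reversible or not h.reversal_feasible
    )


def irreversibility_check(d: float, cross_border: bool) -> Decision:
    """Step 1: is there any irreversible harm? Step 2: does it cross borders?"""
    if d < 0:
        raise ValueError("D must be non-negative")
    if d == 0:
        return Decision.NO_ESCALATION
    if cross_border:
        return Decision.ESCALATE
    return Decision.UNDETERMINED


harms = [
    Harm(magnitude=2.0, reversible=True, reversal_feasible=True),    # fully reversible
    Harm(magnitude=3.0, reversible=True, reversal_feasible=False),   # effectively irreversible
    Harm(magnitude=1.0, reversible=False, reversal_feasible=False),  # absolutely irreversible
]
d = estimate_d(harms)
result = irreversibility_check(d, cross_border=True)
```

Treating effective irreversibility the same as absolute irreversibility in `estimate_d` follows the wording of the excerpt; how the feasibility judgment itself is made remains the open call noted in entry [31].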