Exploring Systems-Thinking Approaches to Loss of Control Risk

Aurelio Carlucci; Jakub Kry\'s; James Walpole; Sean P. Fillingham

arxiv: 2606.13474 · v2 · pith:4CRZCGYRnew · submitted 2026-06-11 · 💻 cs.CY

Exploring Systems-Thinking Approaches to Loss of Control Risk

Aurelio Carlucci , Sean P. Fillingham , James Walpole , Jakub Kry\'s This is my paper

Pith reviewed 2026-06-30 11:21 UTC · model grok-4.3

classification 💻 cs.CY

keywords loss of controlsystems safetyagentic AIinternal deploymentAI governanceSTPAFRAMsociotechnical control

0 comments

The pith

Systems-safety methods show AI governance frameworks often leave responsibilities unverifiable and safeguards prone to erosion by delays and variability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reconstructs a generic frontier-lab scenario for internal deployment of agentic AI coding systems and applies three established systems-safety methods to it. The analyses identify that published control frameworks can leave governance duties and feedback loops externally unverifiable, that monitoring delays can nullify otherwise sound interventions, and that normal operational changes can gradually weaken safeguard calibration and independence. A reader would care because this implies that model-level checks alone may fail to catch sociotechnical risks in how AI changes to code and infrastructure are actually constrained in practice. The authors conclude that risk management should combine model evaluations with systems-level hazard analysis and ongoing assurance that controls stay effective over time.

Core claim

Treating loss of control as the inability to reliably constrain, audit, reverse or halt AI-mediated changes in time, the application of STECA, STPA and FRAM to the scenario surfaces three complementary gaps: published frameworks leave governance responsibilities and feedback loops externally unverifiable; delays in monitoring and intervention can make valid control actions ineffective; and routine operational variability can erode the calibration and independence of safeguards. Frontier-AI risk management should therefore pair model-focused evaluations with systems-level hazard analysis and operational assurance that tracks whether controls remain effective over time.

What carries the argument

Application of STECA, STPA and FRAM to a reconstructed internal coding-agent deployment scenario to surface sociotechnical control gaps beyond model behaviour.

If this is right

Governance responsibilities and feedback loops must be made externally verifiable to prevent loss of control.
Control actions must incorporate explicit accounting for monitoring and intervention delays to stay effective.
Safeguards require periodic re-calibration to counter gradual erosion from routine operational variability.
Risk management must track ongoing effectiveness of controls rather than assuming initial design sufficiency.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar systems analyses could be applied to other internal agentic AI uses such as research or data pipelines.
Organisations may need new operational metrics to detect erosion of safeguards before harm occurs.
The findings suggest value in testing these methods on anonymised logs from actual deployments rather than generic reconstructions.

Load-bearing premise

The generic frontier-lab coding-agent scenario reconstructed from public materials is representative of real deployments and the chosen methods can validly surface risks missed by model-level evaluations.

What would settle it

A documented real-world internal AI coding deployment in which all governance responsibilities prove externally verifiable, control actions remain effective despite monitoring delays, and safeguards show no measurable erosion from operational variability.

Figures

Figures reproduced from arXiv: 2606.13474 by Aurelio Carlucci, Jakub Kry\'s, James Walpole, Sean P. Fillingham.

**Figure 1.** Figure 1: Functional control structure for internal deployment of a frontier AI coding agent. Arrows pointing downwards denote Control Actions, while those pointing upwards denote Feedback to the controllers. 5.3. Unsafe Control Actions and Loss Scenarios Unsafe control actions and loss scenarios can be systematically determined using STPA (see Appendix E.5 and Appendix E.6, respectively, for our complete list). H… view at source ↗

**Figure 2.** Figure 2: FRAM coupling diagram for internal deployment of a frontier AI coding agent. Lines between functions represent dependencies: the output of one function serves as an input, control, or resource for another. Background function B2 (policies and permissions) controls F1–F4 simultaneously; B1 (infrastructure) provides resources to F2 and F4. Second, B2’s outputs shape every foreground function at once. B2 prov… view at source ↗

read the original abstract

Internal deployment of agentic AI systems for coding and research creates a sociotechnical control problem that extends beyond model behaviour. We treat internal-deployment Loss of Control as the inability to reliably constrain, audit, reverse, or halt AI-mediated changes to code, infrastructure, evaluation, or deployment processes in time to prevent serious organisational or societal harms. We ask whether established systems-safety methods can identify risks that model-level evaluations may miss. Using a generic frontier-lab coding-agent scenario reconstructed from public materials, we apply STECA, STPA, and FRAM. The analyses surface complementary findings: published frameworks can leave governance responsibilities and feedback loops externally unverifiable; delays in monitoring and intervention can make otherwise valid control actions ineffective; and routine operational variability can gradually erode the calibration and independence of safeguards. We argue that frontier-AI risk management should pair model-focused evaluations with systems-level hazard analysis and operational assurance that tracks whether controls remain effective over time.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper applies STECA, STPA, and FRAM to an AI coding-agent deployment scenario and surfaces governance, delay, and erosion issues, but the public-material reconstruction limits how far the findings travel.

read the letter

The paper's key move is to apply STECA, STPA, and FRAM to a reconstructed scenario of internal deployment of coding agents and to report three findings on governance verifiability, intervention delays, and safeguard erosion.

This is new in the sense that it brings these particular systems safety tools to the specific problem of loss of control in agentic AI inside labs. The paper does well at making the case that model-level work leaves sociotechnical control issues unexamined and at sketching how the three methods give complementary views.

The soft spot is the scenario construction. Since it comes from public materials, it likely underrepresents the monitoring, access, and override mechanisms that real deployments have. The stress-test concern holds: if those mechanisms close the loops the paper flags, the results do not demonstrate risks missed by model evaluations. The abstract states the analyses were performed, but the lack of detail on steps or checks makes it hard to judge the strength of the applications.

This paper is for AI safety people who are open to systems engineering ideas for risk management. It shows honest engagement with the methods rather than forcing a new framework.

I would bring it to the next reading group as maybe. I would not cite it in the next year. It deserves peer review because the direction is worth testing even if the current scenario needs work.

Referee Report

2 major / 2 minor

Summary. The paper claims that internal deployment of agentic AI coding systems creates a sociotechnical Loss of Control problem extending beyond model behavior, defined as inability to reliably constrain, audit, reverse or halt AI-mediated changes. It reconstructs a generic frontier-lab coding-agent scenario from public materials and applies STECA, STPA and FRAM, surfacing three complementary findings: published frameworks can leave governance responsibilities and feedback loops externally unverifiable; delays in monitoring and intervention can render otherwise valid controls ineffective; and routine operational variability can erode safeguard calibration and independence. The authors conclude that frontier-AI risk management should pair model-focused evaluations with systems-level hazard analysis and ongoing operational assurance.

Significance. If the central claim holds under scrutiny of the scenario, the work demonstrates a concrete, non-circular application of established systems-safety methods to an AI deployment context and explicitly credits the identification of governance, delay and erosion risks as complementary to model-level work. This could strengthen arguments for operational assurance practices that track control effectiveness over time rather than relying solely on static evaluations.

major comments (2)

[Scenario description and application sections] The load-bearing assumption is the representativeness of the generic frontier-lab coding-agent scenario reconstructed from public materials. The central claim that STECA/STPA/FRAM surface risks missed by model evaluations requires that omitted internal monitoring, access controls and override paths do not already close the loops flagged as unverifiable. No sensitivity analysis over plausible scenario variants is reported, leaving open whether the findings are reconstruction artifacts.
[Methods and analysis sections] The abstract asserts that the three analyses were performed and produced the listed findings, yet the manuscript must supply explicit methodological steps (e.g., how control structures were elicited, how FRAM variability was instantiated for the coding-agent case) sufficient for an independent reader to reproduce or falsify the surfaced governance and delay claims.

minor comments (2)

Define STECA, STPA and FRAM on first use and provide a brief one-sentence reminder of each method's core purpose when first applied to the scenario.
Clarify whether the three methods were applied independently or in sequence, and whether any cross-validation between their outputs was performed.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important areas for strengthening the manuscript. We address each major comment below, indicating planned revisions where appropriate. The responses focus on clarifying the scope of the generic scenario and enhancing methodological transparency without altering the core claims.

read point-by-point responses

Referee: [Scenario description and application sections] The load-bearing assumption is the representativeness of the generic frontier-lab coding-agent scenario reconstructed from public materials. The central claim that STECA/STPA/FRAM surface risks missed by model evaluations requires that omitted internal monitoring, access controls and override paths do not already close the loops flagged as unverifiable. No sensitivity analysis over plausible scenario variants is reported, leaving open whether the findings are reconstruction artifacts.

Authors: The scenario is explicitly presented as a generic reconstruction drawn from publicly available materials on frontier-lab practices, not as a claim about any specific organization's internal controls. Its purpose is to demonstrate how systems methods can surface potential governance gaps based on what external observers can verify. We agree that the absence of sensitivity analysis over variants (e.g., additional internal monitoring or override paths) is a limitation that could affect interpretation of the findings. In revision we will add a dedicated limitations subsection that enumerates plausible variants, discusses how each might alter the identified unverifiable loops or delay risks, and reiterates that the analysis concerns external verifiability rather than definitive internal risk. This will make the scope and evidential basis clearer. revision: yes
Referee: [Methods and analysis sections] The abstract asserts that the three analyses were performed and produced the listed findings, yet the manuscript must supply explicit methodological steps (e.g., how control structures were elicited, how FRAM variability was instantiated for the coding-agent case) sufficient for an independent reader to reproduce or falsify the surfaced governance and delay claims.

Authors: We concur that explicit, reproducible methodological steps are required. The current manuscript describes the application at a high level but does not provide the detailed elicitation process or variability instantiation. In the revised version we will expand the Methods section with step-by-step accounts: for STECA and STPA, the sources and criteria used to construct each control structure element; for FRAM, the specific variability scenarios instantiated for the coding-agent functions and the rationale for each. These additions will enable independent readers to trace, reproduce, or challenge the governance and delay findings. revision: yes

Circularity Check

0 steps flagged

No circularity: external methods applied to reconstructed scenario

full rationale

The paper reconstructs a generic frontier-lab coding-agent scenario from public materials and applies three pre-existing, independently developed systems-safety methods (STECA, STPA, FRAM) to it. The surfaced findings on governance, delays, and erosion are direct outputs of those established analytic frameworks rather than quantities fitted to the scenario or derived from self-referential definitions. No equations, fitted parameters, or self-citations appear as load-bearing steps in the provided derivation chain; the central claim therefore remains an application of external tools and does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are described. The central claim rests on the unstated assumption that the reconstructed scenario and chosen methods are appropriate, but these are not formalized in the provided text.

pith-pipeline@v0.9.1-grok · 5694 in / 1144 out tokens · 28008 ms · 2026-06-30T11:21:02.886787+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

33 extracted references · 8 canonical work pages

[1]

and Hedley, P

Adler, S. and Hedley, P. Guidelight AI Standards , May 2026. URL https://www.guidelight.ai/control

2026
[2]

Anthropic’s Summer 2025 Pilot Sabotage Risk Report , 2025

Anthropic . Anthropic’s Summer 2025 Pilot Sabotage Risk Report , 2025. URL https://alignment.anthropic.com/2025/sabotage-risk-report/2025_pilot_risk_report.pdf

2025
[3]

Automated weak-to-strong researcher, 2026 a

Anthropic . Automated weak-to-strong researcher, 2026 a . URL https://alignment.anthropic.com/2026/automated-w2s-researcher/

2026
[4]

Alignment risk update: Claude mythos preview, April 2026 b

Anthropic . Alignment risk update: Claude mythos preview, April 2026 b . URL https://anthropic.com/claude-mythos-preview-risk-report

2026
[5]

Sabotage risk report: Claude opus 4.6, February 2026 c

Anthropic . Sabotage risk report: Claude opus 4.6, February 2026 c . URL https://anthropic.com/feb-2026-risk-report

2026
[6]

Responsible Scaling Policy , version 3.0, February 2026 d

Anthropic . Responsible Scaling Policy , version 3.0, February 2026 d . URL https://anthropic.com/responsible-scaling-policy/rsp-v3-0

2026
[7]

System card: Claude mythos preview, April 2026 e

Anthropic . System card: Claude mythos preview, April 2026 e . URL https://www-cdn.anthropic.com/08ab9158070959f88f296514c21b7facce6f52bc.pdf

2026
[8]

arXiv preprint arXiv:2512.08864(2025)

Barrett, S., Murray, M., Quarks, O., Smith, M., Kryś, J., Campos, S., Boria, A. T., Touzet, C., Hayrapet, S., Heiding, F., and others . Toward Quantitative Modeling of Cybersecurity Risks Due to AI Misuse . arXiv preprint arXiv:2512.08864, 2025

work page arXiv 2025
[9]

P., Rhodes, C., and Vergani, S

Barrett, S., Bruvere, A., Fillingham, S. P., Rhodes, C., and Vergani, S. STAMP / STPA Informed Characterization of Factors Leading to Loss of Control in AI Systems , February 2026. URL http://arxiv.org/abs/2512.17600. arXiv:2512.17600 [cs]

work page arXiv 2026
[10]

International AI safety report 2026

Bengio, Y., Murray, M., Prunkl, C., and others . International AI safety report 2026. Independent Scientific Report DSIT 2026/001, Department for Science, Innovation and Technology, February 2026. URL https://internationalaisafetyreport.org/publication/international-ai-safety-report-2026. arXiv: 2602.21012

work page arXiv 2026
[11]

Boudreaux, B., Vermeer, M. J. D., Horton, K., and Kalra, N. The Case for AI Loss of Control Response Planning and an Outline to Get Started . Expert Insights , RAND Corporation, October 2025. URL https://www.rand.org/pubs/perspectives/PEA4232-1.html

2025
[12]

California Senate Bill no

California Legislature . California Senate Bill no. 53, Transparency in Frontier Artificial Intelligence Act , 2025. URL https://leginfo.legislature.ca.gov/faces/billNavClient.xhtml?bill_id=202520260SB53

2025
[13]

A Frontier AI Risk Management Framework : Bridging the Gap Between Current AI Practices and Established Risk Management , 2025

Campos, S., Papadatos, H., Roger, F., Touzet, C., Quarks, O., and Murray, M. A Frontier AI Risk Management Framework : Bridging the Gap Between Current AI Practices and Established Risk Management , 2025. URL https://arxiv.org/abs/2502.06656

work page arXiv 2025
[14]

Risk Reporting for Developers ' Internal AI Model Use , April 2025

Delaney, O., Maheshwari, S., O'Brien, J., Bearman, T., and Guest, O. Risk Reporting for Developers ' Internal AI Model Use , April 2025. URL https://www.iaps.ai/research/risk-reporting-for-developers-internal-ai-model-use

2025
[15]

Code of Practice for General - Purpose AI Models : Transparency Chapter

European Commission . Code of Practice for General - Purpose AI Models : Transparency Chapter . Technical report, AI Office, European Commission, 2024. URL https://digital-strategy.ec.europa.eu/en/policies/contents-code-gpai

2024
[16]

Regulation ( EU ) 2024/1689 of the european parliament and of the council laying down harmonised rules on artificial intelligence (artificial intelligence act), 2024

European Union . Regulation ( EU ) 2024/1689 of the european parliament and of the council laying down harmonised rules on artificial intelligence (artificial intelligence act), 2024. URL https://eur-lex.europa.eu/eli/reg/2024/1689/oj/eng

2024
[17]

Fleming, C. H. Safety-driven early concept analysis and development. Thesis, Massachusetts Institute of Technology, 2015. URL https://dspace.mit.edu/handle/1721.1/97352

2015
[18]

Frontier safety framework, version 3.1, September 2025

Google DeepMind . Frontier safety framework, version 3.1, September 2025. URL https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/strengthening-our-frontier-safety-framework/frontier-safety-framework_3-1.pdf

2025
[19]

FRAM : The Functional Resonance Analysis Method : Modelling Complex Socio -technical Systems

Hollnagel, E. FRAM : The Functional Resonance Analysis Method : Modelling Complex Socio -technical Systems . Crc Press, Boca Raton, 2012. ISBN 978-1-4094-4551-7

2012
[20]

and Schuett, J

Koessler, L. and Schuett, J. Risk assessment at AGI companies: A review of popular risk assessment techniques from other safety-critical industries, July 2023. URL http://arxiv.org/abs/2307.08823

work page arXiv 2023
[21]

and Thomas, J

Leveson, N. and Thomas, J. STPA handbook. Technical report, MIT Partnership for Systems Approaches to Safety and Security (PSASS), Cambridge, Massachusetts, U.S., 2018

2018
[22]

Leveson, N. G. Engineering a Safer World : Systems Thinking Applied to Safety . MIT Press, 2012. URL https://direct.mit.edu/books/oa-monograph/2908/Engineering-a-Safer-WorldSystems-Thinking-Applied

2012
[23]

Common elements of frontier AI safety policies, December 2025

METR . Common elements of frontier AI safety policies, December 2025. URL https://metr.org/common-elements.pdf

2025
[24]

MITRE ATT & CK : Adversarial Tactics , Techniques , and Common Knowledge , 2024

MITRE Corporation . MITRE ATT & CK : Adversarial Tactics , Techniques , and Common Knowledge , 2024. URL https://attack.mitre.org/

2024
[25]

Systematic Hazard Analysis for Frontier AI using STPA , June 2025

Mylius, S. Systematic Hazard Analysis for Frontier AI using STPA , June 2025. URL http://arxiv.org/abs/2506.01782. Myl\_Systematic\_2025 arXiv:2506.01782 [cs] version: 1

work page arXiv 2025
[26]

Preparedness Framework , version 2.0, April 2025

OpenAI . Preparedness Framework , version 2.0, April 2025. URL https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbddebcd/preparedness-framework-v2.pdf

2025
[27]

How we monitor internal coding agents for misalignment, March 2026

OpenAI . How we monitor internal coding agents for misalignment, March 2026. URL https://openai.com/index/how-we-monitor-internal-coding-agents-misalignment/

2026
[28]

From Silos to Systems : Process - Oriented Hazard Analysis for AI Systems , October 2024

Rismani, S., Dobbe, R., and Moon, A. From Silos to Systems : Process - Oriented Hazard Analysis for AI Systems , October 2024. URL http://arxiv.org/abs/2410.22526. arXiv:2410.22526 [cs]

work page arXiv 2024
[29]

System Theoretic Process Analysis ( STPA ) Standard For All Industries , March 2025

SAE International . System Theoretic Process Analysis ( STPA ) Standard For All Industries , March 2025. URL https://www.sae.org/standards/j3307_202503-system-theoretic-process-analysis-stpa-standard-industries

2025
[30]

Strengthening Emergency Preparedness and Response for AI Loss of Control Incidents

Somani, E., Friedman, A., Wu, H., Lu, M., Byrd, C., van Soest, H., and Zakaria, S. Strengthening Emergency Preparedness and Response for AI Loss of Control Incidents . Research Report , RAND Corporation, July 2025. URL https://www.rand.org/pubs/research_reports/RRA3847-1.html

2025
[31]

The Loss of Control Playbook : Degrees , Dynamics , and Preparedness , December 2025

Stix, C., Hallensleben, A., Ortega, A., and Pistillo, M. The Loss of Control Playbook : Degrees , Dynamics , and Preparedness , December 2025. URL http://arxiv.org/abs/2511.15846. stixLossControlPlaybook2025 arXiv:2511.15846 [cs] version: 5

work page arXiv 2025
[32]

STPA Step 4: Building Scenarios -- A Formal Scenario Approach , 2024

Thomas, J. STPA Step 4: Building Scenarios -- A Formal Scenario Approach , 2024. URL https://psas.scripts.mit.edu/home/wp-content/uploads/2024/STPA-Scenarios-New-Approach.pdf

2024
[33]

Tkeshelashvili, M., Verma, R., and Kelly, S. M. AI Loss of Control Risk : Indications & Warning , 2026. URL https://securityandtechnology.org/virtual-library/report/ai-loss-of-control-risk-indications-warning/

2026

[1] [1]

and Hedley, P

Adler, S. and Hedley, P. Guidelight AI Standards , May 2026. URL https://www.guidelight.ai/control

2026

[2] [2]

Anthropic’s Summer 2025 Pilot Sabotage Risk Report , 2025

Anthropic . Anthropic’s Summer 2025 Pilot Sabotage Risk Report , 2025. URL https://alignment.anthropic.com/2025/sabotage-risk-report/2025_pilot_risk_report.pdf

2025

[3] [3]

Automated weak-to-strong researcher, 2026 a

Anthropic . Automated weak-to-strong researcher, 2026 a . URL https://alignment.anthropic.com/2026/automated-w2s-researcher/

2026

[4] [4]

Alignment risk update: Claude mythos preview, April 2026 b

Anthropic . Alignment risk update: Claude mythos preview, April 2026 b . URL https://anthropic.com/claude-mythos-preview-risk-report

2026

[5] [5]

Sabotage risk report: Claude opus 4.6, February 2026 c

Anthropic . Sabotage risk report: Claude opus 4.6, February 2026 c . URL https://anthropic.com/feb-2026-risk-report

2026

[6] [6]

Responsible Scaling Policy , version 3.0, February 2026 d

Anthropic . Responsible Scaling Policy , version 3.0, February 2026 d . URL https://anthropic.com/responsible-scaling-policy/rsp-v3-0

2026

[7] [7]

System card: Claude mythos preview, April 2026 e

Anthropic . System card: Claude mythos preview, April 2026 e . URL https://www-cdn.anthropic.com/08ab9158070959f88f296514c21b7facce6f52bc.pdf

2026

[8] [8]

arXiv preprint arXiv:2512.08864(2025)

Barrett, S., Murray, M., Quarks, O., Smith, M., Kryś, J., Campos, S., Boria, A. T., Touzet, C., Hayrapet, S., Heiding, F., and others . Toward Quantitative Modeling of Cybersecurity Risks Due to AI Misuse . arXiv preprint arXiv:2512.08864, 2025

work page arXiv 2025

[9] [9]

P., Rhodes, C., and Vergani, S

Barrett, S., Bruvere, A., Fillingham, S. P., Rhodes, C., and Vergani, S. STAMP / STPA Informed Characterization of Factors Leading to Loss of Control in AI Systems , February 2026. URL http://arxiv.org/abs/2512.17600. arXiv:2512.17600 [cs]

work page arXiv 2026

[10] [10]

International AI safety report 2026

Bengio, Y., Murray, M., Prunkl, C., and others . International AI safety report 2026. Independent Scientific Report DSIT 2026/001, Department for Science, Innovation and Technology, February 2026. URL https://internationalaisafetyreport.org/publication/international-ai-safety-report-2026. arXiv: 2602.21012

work page arXiv 2026

[11] [11]

Boudreaux, B., Vermeer, M. J. D., Horton, K., and Kalra, N. The Case for AI Loss of Control Response Planning and an Outline to Get Started . Expert Insights , RAND Corporation, October 2025. URL https://www.rand.org/pubs/perspectives/PEA4232-1.html

2025

[12] [12]

California Senate Bill no

California Legislature . California Senate Bill no. 53, Transparency in Frontier Artificial Intelligence Act , 2025. URL https://leginfo.legislature.ca.gov/faces/billNavClient.xhtml?bill_id=202520260SB53

2025

[13] [13]

A Frontier AI Risk Management Framework : Bridging the Gap Between Current AI Practices and Established Risk Management , 2025

Campos, S., Papadatos, H., Roger, F., Touzet, C., Quarks, O., and Murray, M. A Frontier AI Risk Management Framework : Bridging the Gap Between Current AI Practices and Established Risk Management , 2025. URL https://arxiv.org/abs/2502.06656

work page arXiv 2025

[14] [14]

Risk Reporting for Developers ' Internal AI Model Use , April 2025

Delaney, O., Maheshwari, S., O'Brien, J., Bearman, T., and Guest, O. Risk Reporting for Developers ' Internal AI Model Use , April 2025. URL https://www.iaps.ai/research/risk-reporting-for-developers-internal-ai-model-use

2025

[15] [15]

Code of Practice for General - Purpose AI Models : Transparency Chapter

European Commission . Code of Practice for General - Purpose AI Models : Transparency Chapter . Technical report, AI Office, European Commission, 2024. URL https://digital-strategy.ec.europa.eu/en/policies/contents-code-gpai

2024

[16] [16]

Regulation ( EU ) 2024/1689 of the european parliament and of the council laying down harmonised rules on artificial intelligence (artificial intelligence act), 2024

European Union . Regulation ( EU ) 2024/1689 of the european parliament and of the council laying down harmonised rules on artificial intelligence (artificial intelligence act), 2024. URL https://eur-lex.europa.eu/eli/reg/2024/1689/oj/eng

2024

[17] [17]

Fleming, C. H. Safety-driven early concept analysis and development. Thesis, Massachusetts Institute of Technology, 2015. URL https://dspace.mit.edu/handle/1721.1/97352

2015

[18] [18]

Frontier safety framework, version 3.1, September 2025

Google DeepMind . Frontier safety framework, version 3.1, September 2025. URL https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/strengthening-our-frontier-safety-framework/frontier-safety-framework_3-1.pdf

2025

[19] [19]

FRAM : The Functional Resonance Analysis Method : Modelling Complex Socio -technical Systems

Hollnagel, E. FRAM : The Functional Resonance Analysis Method : Modelling Complex Socio -technical Systems . Crc Press, Boca Raton, 2012. ISBN 978-1-4094-4551-7

2012

[20] [20]

and Schuett, J

Koessler, L. and Schuett, J. Risk assessment at AGI companies: A review of popular risk assessment techniques from other safety-critical industries, July 2023. URL http://arxiv.org/abs/2307.08823

work page arXiv 2023

[21] [21]

and Thomas, J

Leveson, N. and Thomas, J. STPA handbook. Technical report, MIT Partnership for Systems Approaches to Safety and Security (PSASS), Cambridge, Massachusetts, U.S., 2018

2018

[22] [22]

Leveson, N. G. Engineering a Safer World : Systems Thinking Applied to Safety . MIT Press, 2012. URL https://direct.mit.edu/books/oa-monograph/2908/Engineering-a-Safer-WorldSystems-Thinking-Applied

2012

[23] [23]

Common elements of frontier AI safety policies, December 2025

METR . Common elements of frontier AI safety policies, December 2025. URL https://metr.org/common-elements.pdf

2025

[24] [24]

MITRE ATT & CK : Adversarial Tactics , Techniques , and Common Knowledge , 2024

MITRE Corporation . MITRE ATT & CK : Adversarial Tactics , Techniques , and Common Knowledge , 2024. URL https://attack.mitre.org/

2024

[25] [25]

Systematic Hazard Analysis for Frontier AI using STPA , June 2025

Mylius, S. Systematic Hazard Analysis for Frontier AI using STPA , June 2025. URL http://arxiv.org/abs/2506.01782. Myl\_Systematic\_2025 arXiv:2506.01782 [cs] version: 1

work page arXiv 2025

[26] [26]

Preparedness Framework , version 2.0, April 2025

OpenAI . Preparedness Framework , version 2.0, April 2025. URL https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbddebcd/preparedness-framework-v2.pdf

2025

[27] [27]

How we monitor internal coding agents for misalignment, March 2026

OpenAI . How we monitor internal coding agents for misalignment, March 2026. URL https://openai.com/index/how-we-monitor-internal-coding-agents-misalignment/

2026

[28] [28]

From Silos to Systems : Process - Oriented Hazard Analysis for AI Systems , October 2024

Rismani, S., Dobbe, R., and Moon, A. From Silos to Systems : Process - Oriented Hazard Analysis for AI Systems , October 2024. URL http://arxiv.org/abs/2410.22526. arXiv:2410.22526 [cs]

work page arXiv 2024

[29] [29]

System Theoretic Process Analysis ( STPA ) Standard For All Industries , March 2025

SAE International . System Theoretic Process Analysis ( STPA ) Standard For All Industries , March 2025. URL https://www.sae.org/standards/j3307_202503-system-theoretic-process-analysis-stpa-standard-industries

2025

[30] [30]

Strengthening Emergency Preparedness and Response for AI Loss of Control Incidents

Somani, E., Friedman, A., Wu, H., Lu, M., Byrd, C., van Soest, H., and Zakaria, S. Strengthening Emergency Preparedness and Response for AI Loss of Control Incidents . Research Report , RAND Corporation, July 2025. URL https://www.rand.org/pubs/research_reports/RRA3847-1.html

2025

[31] [31]

The Loss of Control Playbook : Degrees , Dynamics , and Preparedness , December 2025

Stix, C., Hallensleben, A., Ortega, A., and Pistillo, M. The Loss of Control Playbook : Degrees , Dynamics , and Preparedness , December 2025. URL http://arxiv.org/abs/2511.15846. stixLossControlPlaybook2025 arXiv:2511.15846 [cs] version: 5

work page arXiv 2025

[32] [32]

STPA Step 4: Building Scenarios -- A Formal Scenario Approach , 2024

Thomas, J. STPA Step 4: Building Scenarios -- A Formal Scenario Approach , 2024. URL https://psas.scripts.mit.edu/home/wp-content/uploads/2024/STPA-Scenarios-New-Approach.pdf

2024

[33] [33]

Tkeshelashvili, M., Verma, R., and Kelly, S. M. AI Loss of Control Risk : Indications & Warning , 2026. URL https://securityandtechnology.org/virtual-library/report/ai-loss-of-control-risk-indications-warning/

2026