Exploring Systems-Thinking Approaches to Loss of Control Risk
Pith reviewed 2026-06-30 11:21 UTC · model grok-4.3
The pith
Systems-safety methods show AI governance frameworks often leave responsibilities unverifiable and safeguards prone to erosion by delays and variability.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Treating loss of control as the inability to reliably constrain, audit, reverse or halt AI-mediated changes in time, the application of STECA, STPA and FRAM to the scenario surfaces three complementary gaps: published frameworks leave governance responsibilities and feedback loops externally unverifiable; delays in monitoring and intervention can make valid control actions ineffective; and routine operational variability can erode the calibration and independence of safeguards. Frontier-AI risk management should therefore pair model-focused evaluations with systems-level hazard analysis and operational assurance that tracks whether controls remain effective over time.
What carries the argument
Application of STECA, STPA and FRAM to a reconstructed internal coding-agent deployment scenario to surface sociotechnical control gaps beyond model behaviour.
If this is right
- Governance responsibilities and feedback loops must be made externally verifiable to prevent loss of control.
- Control actions must incorporate explicit accounting for monitoring and intervention delays to stay effective.
- Safeguards require periodic re-calibration to counter gradual erosion from routine operational variability.
- Risk management must track ongoing effectiveness of controls rather than assuming initial design sufficiency.
Where Pith is reading between the lines
- Similar systems analyses could be applied to other internal agentic AI uses such as research or data pipelines.
- Organisations may need new operational metrics to detect erosion of safeguards before harm occurs.
- The findings suggest value in testing these methods on anonymised logs from actual deployments rather than generic reconstructions.
Load-bearing premise
The generic frontier-lab coding-agent scenario reconstructed from public materials is representative of real deployments and the chosen methods can validly surface risks missed by model-level evaluations.
What would settle it
A documented real-world internal AI coding deployment in which all governance responsibilities prove externally verifiable, control actions remain effective despite monitoring delays, and safeguards show no measurable erosion from operational variability.
Figures
read the original abstract
Internal deployment of agentic AI systems for coding and research creates a sociotechnical control problem that extends beyond model behaviour. We treat internal-deployment Loss of Control as the inability to reliably constrain, audit, reverse, or halt AI-mediated changes to code, infrastructure, evaluation, or deployment processes in time to prevent serious organisational or societal harms. We ask whether established systems-safety methods can identify risks that model-level evaluations may miss. Using a generic frontier-lab coding-agent scenario reconstructed from public materials, we apply STECA, STPA, and FRAM. The analyses surface complementary findings: published frameworks can leave governance responsibilities and feedback loops externally unverifiable; delays in monitoring and intervention can make otherwise valid control actions ineffective; and routine operational variability can gradually erode the calibration and independence of safeguards. We argue that frontier-AI risk management should pair model-focused evaluations with systems-level hazard analysis and operational assurance that tracks whether controls remain effective over time.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that internal deployment of agentic AI coding systems creates a sociotechnical Loss of Control problem extending beyond model behavior, defined as inability to reliably constrain, audit, reverse or halt AI-mediated changes. It reconstructs a generic frontier-lab coding-agent scenario from public materials and applies STECA, STPA and FRAM, surfacing three complementary findings: published frameworks can leave governance responsibilities and feedback loops externally unverifiable; delays in monitoring and intervention can render otherwise valid controls ineffective; and routine operational variability can erode safeguard calibration and independence. The authors conclude that frontier-AI risk management should pair model-focused evaluations with systems-level hazard analysis and ongoing operational assurance.
Significance. If the central claim holds under scrutiny of the scenario, the work demonstrates a concrete, non-circular application of established systems-safety methods to an AI deployment context and explicitly credits the identification of governance, delay and erosion risks as complementary to model-level work. This could strengthen arguments for operational assurance practices that track control effectiveness over time rather than relying solely on static evaluations.
major comments (2)
- [Scenario description and application sections] The load-bearing assumption is the representativeness of the generic frontier-lab coding-agent scenario reconstructed from public materials. The central claim that STECA/STPA/FRAM surface risks missed by model evaluations requires that omitted internal monitoring, access controls and override paths do not already close the loops flagged as unverifiable. No sensitivity analysis over plausible scenario variants is reported, leaving open whether the findings are reconstruction artifacts.
- [Methods and analysis sections] The abstract asserts that the three analyses were performed and produced the listed findings, yet the manuscript must supply explicit methodological steps (e.g., how control structures were elicited, how FRAM variability was instantiated for the coding-agent case) sufficient for an independent reader to reproduce or falsify the surfaced governance and delay claims.
minor comments (2)
- Define STECA, STPA and FRAM on first use and provide a brief one-sentence reminder of each method's core purpose when first applied to the scenario.
- Clarify whether the three methods were applied independently or in sequence, and whether any cross-validation between their outputs was performed.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important areas for strengthening the manuscript. We address each major comment below, indicating planned revisions where appropriate. The responses focus on clarifying the scope of the generic scenario and enhancing methodological transparency without altering the core claims.
read point-by-point responses
-
Referee: [Scenario description and application sections] The load-bearing assumption is the representativeness of the generic frontier-lab coding-agent scenario reconstructed from public materials. The central claim that STECA/STPA/FRAM surface risks missed by model evaluations requires that omitted internal monitoring, access controls and override paths do not already close the loops flagged as unverifiable. No sensitivity analysis over plausible scenario variants is reported, leaving open whether the findings are reconstruction artifacts.
Authors: The scenario is explicitly presented as a generic reconstruction drawn from publicly available materials on frontier-lab practices, not as a claim about any specific organization's internal controls. Its purpose is to demonstrate how systems methods can surface potential governance gaps based on what external observers can verify. We agree that the absence of sensitivity analysis over variants (e.g., additional internal monitoring or override paths) is a limitation that could affect interpretation of the findings. In revision we will add a dedicated limitations subsection that enumerates plausible variants, discusses how each might alter the identified unverifiable loops or delay risks, and reiterates that the analysis concerns external verifiability rather than definitive internal risk. This will make the scope and evidential basis clearer. revision: yes
-
Referee: [Methods and analysis sections] The abstract asserts that the three analyses were performed and produced the listed findings, yet the manuscript must supply explicit methodological steps (e.g., how control structures were elicited, how FRAM variability was instantiated for the coding-agent case) sufficient for an independent reader to reproduce or falsify the surfaced governance and delay claims.
Authors: We concur that explicit, reproducible methodological steps are required. The current manuscript describes the application at a high level but does not provide the detailed elicitation process or variability instantiation. In the revised version we will expand the Methods section with step-by-step accounts: for STECA and STPA, the sources and criteria used to construct each control structure element; for FRAM, the specific variability scenarios instantiated for the coding-agent functions and the rationale for each. These additions will enable independent readers to trace, reproduce, or challenge the governance and delay findings. revision: yes
Circularity Check
No circularity: external methods applied to reconstructed scenario
full rationale
The paper reconstructs a generic frontier-lab coding-agent scenario from public materials and applies three pre-existing, independently developed systems-safety methods (STECA, STPA, FRAM) to it. The surfaced findings on governance, delays, and erosion are direct outputs of those established analytic frameworks rather than quantities fitted to the scenario or derived from self-referential definitions. No equations, fitted parameters, or self-citations appear as load-bearing steps in the provided derivation chain; the central claim therefore remains an application of external tools and does not reduce to its own inputs by construction.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
and Hedley, P
Adler, S. and Hedley, P. Guidelight AI Standards , May 2026. URL https://www.guidelight.ai/control
2026
-
[2]
Anthropic’s Summer 2025 Pilot Sabotage Risk Report , 2025
Anthropic . Anthropic’s Summer 2025 Pilot Sabotage Risk Report , 2025. URL https://alignment.anthropic.com/2025/sabotage-risk-report/2025_pilot_risk_report.pdf
2025
-
[3]
Automated weak-to-strong researcher, 2026 a
Anthropic . Automated weak-to-strong researcher, 2026 a . URL https://alignment.anthropic.com/2026/automated-w2s-researcher/
2026
-
[4]
Alignment risk update: Claude mythos preview, April 2026 b
Anthropic . Alignment risk update: Claude mythos preview, April 2026 b . URL https://anthropic.com/claude-mythos-preview-risk-report
2026
-
[5]
Sabotage risk report: Claude opus 4.6, February 2026 c
Anthropic . Sabotage risk report: Claude opus 4.6, February 2026 c . URL https://anthropic.com/feb-2026-risk-report
2026
-
[6]
Responsible Scaling Policy , version 3.0, February 2026 d
Anthropic . Responsible Scaling Policy , version 3.0, February 2026 d . URL https://anthropic.com/responsible-scaling-policy/rsp-v3-0
2026
-
[7]
System card: Claude mythos preview, April 2026 e
Anthropic . System card: Claude mythos preview, April 2026 e . URL https://www-cdn.anthropic.com/08ab9158070959f88f296514c21b7facce6f52bc.pdf
2026
-
[8]
arXiv preprint arXiv:2512.08864(2025)
Barrett, S., Murray, M., Quarks, O., Smith, M., Kryś, J., Campos, S., Boria, A. T., Touzet, C., Hayrapet, S., Heiding, F., and others . Toward Quantitative Modeling of Cybersecurity Risks Due to AI Misuse . arXiv preprint arXiv:2512.08864, 2025
-
[9]
P., Rhodes, C., and Vergani, S
Barrett, S., Bruvere, A., Fillingham, S. P., Rhodes, C., and Vergani, S. STAMP / STPA Informed Characterization of Factors Leading to Loss of Control in AI Systems , February 2026. URL http://arxiv.org/abs/2512.17600. arXiv:2512.17600 [cs]
-
[10]
International AI safety report 2026
Bengio, Y., Murray, M., Prunkl, C., and others . International AI safety report 2026. Independent Scientific Report DSIT 2026/001, Department for Science, Innovation and Technology, February 2026. URL https://internationalaisafetyreport.org/publication/international-ai-safety-report-2026. arXiv: 2602.21012
-
[11]
Boudreaux, B., Vermeer, M. J. D., Horton, K., and Kalra, N. The Case for AI Loss of Control Response Planning and an Outline to Get Started . Expert Insights , RAND Corporation, October 2025. URL https://www.rand.org/pubs/perspectives/PEA4232-1.html
2025
-
[12]
California Senate Bill no
California Legislature . California Senate Bill no. 53, Transparency in Frontier Artificial Intelligence Act , 2025. URL https://leginfo.legislature.ca.gov/faces/billNavClient.xhtml?bill_id=202520260SB53
2025
-
[13]
Campos, S., Papadatos, H., Roger, F., Touzet, C., Quarks, O., and Murray, M. A Frontier AI Risk Management Framework : Bridging the Gap Between Current AI Practices and Established Risk Management , 2025. URL https://arxiv.org/abs/2502.06656
-
[14]
Risk Reporting for Developers ' Internal AI Model Use , April 2025
Delaney, O., Maheshwari, S., O'Brien, J., Bearman, T., and Guest, O. Risk Reporting for Developers ' Internal AI Model Use , April 2025. URL https://www.iaps.ai/research/risk-reporting-for-developers-internal-ai-model-use
2025
-
[15]
Code of Practice for General - Purpose AI Models : Transparency Chapter
European Commission . Code of Practice for General - Purpose AI Models : Transparency Chapter . Technical report, AI Office, European Commission, 2024. URL https://digital-strategy.ec.europa.eu/en/policies/contents-code-gpai
2024
-
[16]
Regulation ( EU ) 2024/1689 of the european parliament and of the council laying down harmonised rules on artificial intelligence (artificial intelligence act), 2024
European Union . Regulation ( EU ) 2024/1689 of the european parliament and of the council laying down harmonised rules on artificial intelligence (artificial intelligence act), 2024. URL https://eur-lex.europa.eu/eli/reg/2024/1689/oj/eng
2024
-
[17]
Fleming, C. H. Safety-driven early concept analysis and development. Thesis, Massachusetts Institute of Technology, 2015. URL https://dspace.mit.edu/handle/1721.1/97352
2015
-
[18]
Frontier safety framework, version 3.1, September 2025
Google DeepMind . Frontier safety framework, version 3.1, September 2025. URL https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/strengthening-our-frontier-safety-framework/frontier-safety-framework_3-1.pdf
2025
-
[19]
FRAM : The Functional Resonance Analysis Method : Modelling Complex Socio -technical Systems
Hollnagel, E. FRAM : The Functional Resonance Analysis Method : Modelling Complex Socio -technical Systems . Crc Press, Boca Raton, 2012. ISBN 978-1-4094-4551-7
2012
-
[20]
Koessler, L. and Schuett, J. Risk assessment at AGI companies: A review of popular risk assessment techniques from other safety-critical industries, July 2023. URL http://arxiv.org/abs/2307.08823
-
[21]
and Thomas, J
Leveson, N. and Thomas, J. STPA handbook. Technical report, MIT Partnership for Systems Approaches to Safety and Security (PSASS), Cambridge, Massachusetts, U.S., 2018
2018
-
[22]
Leveson, N. G. Engineering a Safer World : Systems Thinking Applied to Safety . MIT Press, 2012. URL https://direct.mit.edu/books/oa-monograph/2908/Engineering-a-Safer-WorldSystems-Thinking-Applied
2012
-
[23]
Common elements of frontier AI safety policies, December 2025
METR . Common elements of frontier AI safety policies, December 2025. URL https://metr.org/common-elements.pdf
2025
-
[24]
MITRE ATT & CK : Adversarial Tactics , Techniques , and Common Knowledge , 2024
MITRE Corporation . MITRE ATT & CK : Adversarial Tactics , Techniques , and Common Knowledge , 2024. URL https://attack.mitre.org/
2024
-
[25]
Systematic Hazard Analysis for Frontier AI using STPA , June 2025
Mylius, S. Systematic Hazard Analysis for Frontier AI using STPA , June 2025. URL http://arxiv.org/abs/2506.01782. Myl\_Systematic\_2025 arXiv:2506.01782 [cs] version: 1
-
[26]
Preparedness Framework , version 2.0, April 2025
OpenAI . Preparedness Framework , version 2.0, April 2025. URL https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbddebcd/preparedness-framework-v2.pdf
2025
-
[27]
How we monitor internal coding agents for misalignment, March 2026
OpenAI . How we monitor internal coding agents for misalignment, March 2026. URL https://openai.com/index/how-we-monitor-internal-coding-agents-misalignment/
2026
-
[28]
From Silos to Systems : Process - Oriented Hazard Analysis for AI Systems , October 2024
Rismani, S., Dobbe, R., and Moon, A. From Silos to Systems : Process - Oriented Hazard Analysis for AI Systems , October 2024. URL http://arxiv.org/abs/2410.22526. arXiv:2410.22526 [cs]
-
[29]
System Theoretic Process Analysis ( STPA ) Standard For All Industries , March 2025
SAE International . System Theoretic Process Analysis ( STPA ) Standard For All Industries , March 2025. URL https://www.sae.org/standards/j3307_202503-system-theoretic-process-analysis-stpa-standard-industries
2025
-
[30]
Strengthening Emergency Preparedness and Response for AI Loss of Control Incidents
Somani, E., Friedman, A., Wu, H., Lu, M., Byrd, C., van Soest, H., and Zakaria, S. Strengthening Emergency Preparedness and Response for AI Loss of Control Incidents . Research Report , RAND Corporation, July 2025. URL https://www.rand.org/pubs/research_reports/RRA3847-1.html
2025
-
[31]
The Loss of Control Playbook : Degrees , Dynamics , and Preparedness , December 2025
Stix, C., Hallensleben, A., Ortega, A., and Pistillo, M. The Loss of Control Playbook : Degrees , Dynamics , and Preparedness , December 2025. URL http://arxiv.org/abs/2511.15846. stixLossControlPlaybook2025 arXiv:2511.15846 [cs] version: 5
-
[32]
STPA Step 4: Building Scenarios -- A Formal Scenario Approach , 2024
Thomas, J. STPA Step 4: Building Scenarios -- A Formal Scenario Approach , 2024. URL https://psas.scripts.mit.edu/home/wp-content/uploads/2024/STPA-Scenarios-New-Approach.pdf
2024
-
[33]
Tkeshelashvili, M., Verma, R., and Kelly, S. M. AI Loss of Control Risk : Indications & Warning , 2026. URL https://securityandtechnology.org/virtual-library/report/ai-loss-of-control-risk-indications-warning/
2026
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.