pith. sign in

arxiv: 2606.13474 · v2 · pith:4CRZCGYRnew · submitted 2026-06-11 · 💻 cs.CY

Exploring Systems-Thinking Approaches to Loss of Control Risk

Pith reviewed 2026-06-30 11:21 UTC · model grok-4.3

classification 💻 cs.CY
keywords loss of controlsystems safetyagentic AIinternal deploymentAI governanceSTPAFRAMsociotechnical control
0
0 comments X

The pith

Systems-safety methods show AI governance frameworks often leave responsibilities unverifiable and safeguards prone to erosion by delays and variability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reconstructs a generic frontier-lab scenario for internal deployment of agentic AI coding systems and applies three established systems-safety methods to it. The analyses identify that published control frameworks can leave governance duties and feedback loops externally unverifiable, that monitoring delays can nullify otherwise sound interventions, and that normal operational changes can gradually weaken safeguard calibration and independence. A reader would care because this implies that model-level checks alone may fail to catch sociotechnical risks in how AI changes to code and infrastructure are actually constrained in practice. The authors conclude that risk management should combine model evaluations with systems-level hazard analysis and ongoing assurance that controls stay effective over time.

Core claim

Treating loss of control as the inability to reliably constrain, audit, reverse or halt AI-mediated changes in time, the application of STECA, STPA and FRAM to the scenario surfaces three complementary gaps: published frameworks leave governance responsibilities and feedback loops externally unverifiable; delays in monitoring and intervention can make valid control actions ineffective; and routine operational variability can erode the calibration and independence of safeguards. Frontier-AI risk management should therefore pair model-focused evaluations with systems-level hazard analysis and operational assurance that tracks whether controls remain effective over time.

What carries the argument

Application of STECA, STPA and FRAM to a reconstructed internal coding-agent deployment scenario to surface sociotechnical control gaps beyond model behaviour.

If this is right

  • Governance responsibilities and feedback loops must be made externally verifiable to prevent loss of control.
  • Control actions must incorporate explicit accounting for monitoring and intervention delays to stay effective.
  • Safeguards require periodic re-calibration to counter gradual erosion from routine operational variability.
  • Risk management must track ongoing effectiveness of controls rather than assuming initial design sufficiency.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar systems analyses could be applied to other internal agentic AI uses such as research or data pipelines.
  • Organisations may need new operational metrics to detect erosion of safeguards before harm occurs.
  • The findings suggest value in testing these methods on anonymised logs from actual deployments rather than generic reconstructions.

Load-bearing premise

The generic frontier-lab coding-agent scenario reconstructed from public materials is representative of real deployments and the chosen methods can validly surface risks missed by model-level evaluations.

What would settle it

A documented real-world internal AI coding deployment in which all governance responsibilities prove externally verifiable, control actions remain effective despite monitoring delays, and safeguards show no measurable erosion from operational variability.

Figures

Figures reproduced from arXiv: 2606.13474 by Aurelio Carlucci, Jakub Kry\'s, James Walpole, Sean P. Fillingham.

Figure 1
Figure 1. Figure 1: Functional control structure for internal deployment of a frontier AI coding agent. Arrows pointing downwards denote Control Actions, while those pointing upwards denote Feedback to the controllers. 5.3. Unsafe Control Actions and Loss Scenarios Unsafe control actions and loss scenarios can be systemat￾ically determined using STPA (see Appendix E.5 and Ap￾pendix E.6, respectively, for our complete list). H… view at source ↗
Figure 2
Figure 2. Figure 2: FRAM coupling diagram for internal deployment of a frontier AI coding agent. Lines between functions represent dependencies: the output of one function serves as an input, control, or resource for another. Background function B2 (policies and permissions) controls F1–F4 simultaneously; B1 (infrastructure) provides resources to F2 and F4. Second, B2’s outputs shape every foreground function at once. B2 prov… view at source ↗
read the original abstract

Internal deployment of agentic AI systems for coding and research creates a sociotechnical control problem that extends beyond model behaviour. We treat internal-deployment Loss of Control as the inability to reliably constrain, audit, reverse, or halt AI-mediated changes to code, infrastructure, evaluation, or deployment processes in time to prevent serious organisational or societal harms. We ask whether established systems-safety methods can identify risks that model-level evaluations may miss. Using a generic frontier-lab coding-agent scenario reconstructed from public materials, we apply STECA, STPA, and FRAM. The analyses surface complementary findings: published frameworks can leave governance responsibilities and feedback loops externally unverifiable; delays in monitoring and intervention can make otherwise valid control actions ineffective; and routine operational variability can gradually erode the calibration and independence of safeguards. We argue that frontier-AI risk management should pair model-focused evaluations with systems-level hazard analysis and operational assurance that tracks whether controls remain effective over time.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that internal deployment of agentic AI coding systems creates a sociotechnical Loss of Control problem extending beyond model behavior, defined as inability to reliably constrain, audit, reverse or halt AI-mediated changes. It reconstructs a generic frontier-lab coding-agent scenario from public materials and applies STECA, STPA and FRAM, surfacing three complementary findings: published frameworks can leave governance responsibilities and feedback loops externally unverifiable; delays in monitoring and intervention can render otherwise valid controls ineffective; and routine operational variability can erode safeguard calibration and independence. The authors conclude that frontier-AI risk management should pair model-focused evaluations with systems-level hazard analysis and ongoing operational assurance.

Significance. If the central claim holds under scrutiny of the scenario, the work demonstrates a concrete, non-circular application of established systems-safety methods to an AI deployment context and explicitly credits the identification of governance, delay and erosion risks as complementary to model-level work. This could strengthen arguments for operational assurance practices that track control effectiveness over time rather than relying solely on static evaluations.

major comments (2)
  1. [Scenario description and application sections] The load-bearing assumption is the representativeness of the generic frontier-lab coding-agent scenario reconstructed from public materials. The central claim that STECA/STPA/FRAM surface risks missed by model evaluations requires that omitted internal monitoring, access controls and override paths do not already close the loops flagged as unverifiable. No sensitivity analysis over plausible scenario variants is reported, leaving open whether the findings are reconstruction artifacts.
  2. [Methods and analysis sections] The abstract asserts that the three analyses were performed and produced the listed findings, yet the manuscript must supply explicit methodological steps (e.g., how control structures were elicited, how FRAM variability was instantiated for the coding-agent case) sufficient for an independent reader to reproduce or falsify the surfaced governance and delay claims.
minor comments (2)
  1. Define STECA, STPA and FRAM on first use and provide a brief one-sentence reminder of each method's core purpose when first applied to the scenario.
  2. Clarify whether the three methods were applied independently or in sequence, and whether any cross-validation between their outputs was performed.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important areas for strengthening the manuscript. We address each major comment below, indicating planned revisions where appropriate. The responses focus on clarifying the scope of the generic scenario and enhancing methodological transparency without altering the core claims.

read point-by-point responses
  1. Referee: [Scenario description and application sections] The load-bearing assumption is the representativeness of the generic frontier-lab coding-agent scenario reconstructed from public materials. The central claim that STECA/STPA/FRAM surface risks missed by model evaluations requires that omitted internal monitoring, access controls and override paths do not already close the loops flagged as unverifiable. No sensitivity analysis over plausible scenario variants is reported, leaving open whether the findings are reconstruction artifacts.

    Authors: The scenario is explicitly presented as a generic reconstruction drawn from publicly available materials on frontier-lab practices, not as a claim about any specific organization's internal controls. Its purpose is to demonstrate how systems methods can surface potential governance gaps based on what external observers can verify. We agree that the absence of sensitivity analysis over variants (e.g., additional internal monitoring or override paths) is a limitation that could affect interpretation of the findings. In revision we will add a dedicated limitations subsection that enumerates plausible variants, discusses how each might alter the identified unverifiable loops or delay risks, and reiterates that the analysis concerns external verifiability rather than definitive internal risk. This will make the scope and evidential basis clearer. revision: yes

  2. Referee: [Methods and analysis sections] The abstract asserts that the three analyses were performed and produced the listed findings, yet the manuscript must supply explicit methodological steps (e.g., how control structures were elicited, how FRAM variability was instantiated for the coding-agent case) sufficient for an independent reader to reproduce or falsify the surfaced governance and delay claims.

    Authors: We concur that explicit, reproducible methodological steps are required. The current manuscript describes the application at a high level but does not provide the detailed elicitation process or variability instantiation. In the revised version we will expand the Methods section with step-by-step accounts: for STECA and STPA, the sources and criteria used to construct each control structure element; for FRAM, the specific variability scenarios instantiated for the coding-agent functions and the rationale for each. These additions will enable independent readers to trace, reproduce, or challenge the governance and delay findings. revision: yes

Circularity Check

0 steps flagged

No circularity: external methods applied to reconstructed scenario

full rationale

The paper reconstructs a generic frontier-lab coding-agent scenario from public materials and applies three pre-existing, independently developed systems-safety methods (STECA, STPA, FRAM) to it. The surfaced findings on governance, delays, and erosion are direct outputs of those established analytic frameworks rather than quantities fitted to the scenario or derived from self-referential definitions. No equations, fitted parameters, or self-citations appear as load-bearing steps in the provided derivation chain; the central claim therefore remains an application of external tools and does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are described. The central claim rests on the unstated assumption that the reconstructed scenario and chosen methods are appropriate, but these are not formalized in the provided text.

pith-pipeline@v0.9.1-grok · 5694 in / 1144 out tokens · 28008 ms · 2026-06-30T11:21:02.886787+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

33 extracted references · 8 canonical work pages

  1. [1]

    and Hedley, P

    Adler, S. and Hedley, P. Guidelight AI Standards , May 2026. URL https://www.guidelight.ai/control

  2. [2]

    Anthropic’s Summer 2025 Pilot Sabotage Risk Report , 2025

    Anthropic . Anthropic’s Summer 2025 Pilot Sabotage Risk Report , 2025. URL https://alignment.anthropic.com/2025/sabotage-risk-report/2025_pilot_risk_report.pdf

  3. [3]

    Automated weak-to-strong researcher, 2026 a

    Anthropic . Automated weak-to-strong researcher, 2026 a . URL https://alignment.anthropic.com/2026/automated-w2s-researcher/

  4. [4]

    Alignment risk update: Claude mythos preview, April 2026 b

    Anthropic . Alignment risk update: Claude mythos preview, April 2026 b . URL https://anthropic.com/claude-mythos-preview-risk-report

  5. [5]

    Sabotage risk report: Claude opus 4.6, February 2026 c

    Anthropic . Sabotage risk report: Claude opus 4.6, February 2026 c . URL https://anthropic.com/feb-2026-risk-report

  6. [6]

    Responsible Scaling Policy , version 3.0, February 2026 d

    Anthropic . Responsible Scaling Policy , version 3.0, February 2026 d . URL https://anthropic.com/responsible-scaling-policy/rsp-v3-0

  7. [7]

    System card: Claude mythos preview, April 2026 e

    Anthropic . System card: Claude mythos preview, April 2026 e . URL https://www-cdn.anthropic.com/08ab9158070959f88f296514c21b7facce6f52bc.pdf

  8. [8]

    arXiv preprint arXiv:2512.08864(2025)

    Barrett, S., Murray, M., Quarks, O., Smith, M., Kryś, J., Campos, S., Boria, A. T., Touzet, C., Hayrapet, S., Heiding, F., and others . Toward Quantitative Modeling of Cybersecurity Risks Due to AI Misuse . arXiv preprint arXiv:2512.08864, 2025

  9. [9]

    P., Rhodes, C., and Vergani, S

    Barrett, S., Bruvere, A., Fillingham, S. P., Rhodes, C., and Vergani, S. STAMP / STPA Informed Characterization of Factors Leading to Loss of Control in AI Systems , February 2026. URL http://arxiv.org/abs/2512.17600. arXiv:2512.17600 [cs]

  10. [10]

    International AI safety report 2026

    Bengio, Y., Murray, M., Prunkl, C., and others . International AI safety report 2026. Independent Scientific Report DSIT 2026/001, Department for Science, Innovation and Technology, February 2026. URL https://internationalaisafetyreport.org/publication/international-ai-safety-report-2026. arXiv: 2602.21012

  11. [11]

    Boudreaux, B., Vermeer, M. J. D., Horton, K., and Kalra, N. The Case for AI Loss of Control Response Planning and an Outline to Get Started . Expert Insights , RAND Corporation, October 2025. URL https://www.rand.org/pubs/perspectives/PEA4232-1.html

  12. [12]

    California Senate Bill no

    California Legislature . California Senate Bill no. 53, Transparency in Frontier Artificial Intelligence Act , 2025. URL https://leginfo.legislature.ca.gov/faces/billNavClient.xhtml?bill_id=202520260SB53

  13. [13]

    A Frontier AI Risk Management Framework : Bridging the Gap Between Current AI Practices and Established Risk Management , 2025

    Campos, S., Papadatos, H., Roger, F., Touzet, C., Quarks, O., and Murray, M. A Frontier AI Risk Management Framework : Bridging the Gap Between Current AI Practices and Established Risk Management , 2025. URL https://arxiv.org/abs/2502.06656

  14. [14]

    Risk Reporting for Developers ' Internal AI Model Use , April 2025

    Delaney, O., Maheshwari, S., O'Brien, J., Bearman, T., and Guest, O. Risk Reporting for Developers ' Internal AI Model Use , April 2025. URL https://www.iaps.ai/research/risk-reporting-for-developers-internal-ai-model-use

  15. [15]

    Code of Practice for General - Purpose AI Models : Transparency Chapter

    European Commission . Code of Practice for General - Purpose AI Models : Transparency Chapter . Technical report, AI Office, European Commission, 2024. URL https://digital-strategy.ec.europa.eu/en/policies/contents-code-gpai

  16. [16]

    Regulation ( EU ) 2024/1689 of the european parliament and of the council laying down harmonised rules on artificial intelligence (artificial intelligence act), 2024

    European Union . Regulation ( EU ) 2024/1689 of the european parliament and of the council laying down harmonised rules on artificial intelligence (artificial intelligence act), 2024. URL https://eur-lex.europa.eu/eli/reg/2024/1689/oj/eng

  17. [17]

    Fleming, C. H. Safety-driven early concept analysis and development. Thesis, Massachusetts Institute of Technology, 2015. URL https://dspace.mit.edu/handle/1721.1/97352

  18. [18]

    Frontier safety framework, version 3.1, September 2025

    Google DeepMind . Frontier safety framework, version 3.1, September 2025. URL https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/strengthening-our-frontier-safety-framework/frontier-safety-framework_3-1.pdf

  19. [19]

    FRAM : The Functional Resonance Analysis Method : Modelling Complex Socio -technical Systems

    Hollnagel, E. FRAM : The Functional Resonance Analysis Method : Modelling Complex Socio -technical Systems . Crc Press, Boca Raton, 2012. ISBN 978-1-4094-4551-7

  20. [20]

    and Schuett, J

    Koessler, L. and Schuett, J. Risk assessment at AGI companies: A review of popular risk assessment techniques from other safety-critical industries, July 2023. URL http://arxiv.org/abs/2307.08823

  21. [21]

    and Thomas, J

    Leveson, N. and Thomas, J. STPA handbook. Technical report, MIT Partnership for Systems Approaches to Safety and Security (PSASS), Cambridge, Massachusetts, U.S., 2018

  22. [22]

    Leveson, N. G. Engineering a Safer World : Systems Thinking Applied to Safety . MIT Press, 2012. URL https://direct.mit.edu/books/oa-monograph/2908/Engineering-a-Safer-WorldSystems-Thinking-Applied

  23. [23]

    Common elements of frontier AI safety policies, December 2025

    METR . Common elements of frontier AI safety policies, December 2025. URL https://metr.org/common-elements.pdf

  24. [24]

    MITRE ATT & CK : Adversarial Tactics , Techniques , and Common Knowledge , 2024

    MITRE Corporation . MITRE ATT & CK : Adversarial Tactics , Techniques , and Common Knowledge , 2024. URL https://attack.mitre.org/

  25. [25]

    Systematic Hazard Analysis for Frontier AI using STPA , June 2025

    Mylius, S. Systematic Hazard Analysis for Frontier AI using STPA , June 2025. URL http://arxiv.org/abs/2506.01782. Myl\_Systematic\_2025 arXiv:2506.01782 [cs] version: 1

  26. [26]

    Preparedness Framework , version 2.0, April 2025

    OpenAI . Preparedness Framework , version 2.0, April 2025. URL https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbddebcd/preparedness-framework-v2.pdf

  27. [27]

    How we monitor internal coding agents for misalignment, March 2026

    OpenAI . How we monitor internal coding agents for misalignment, March 2026. URL https://openai.com/index/how-we-monitor-internal-coding-agents-misalignment/

  28. [28]

    From Silos to Systems : Process - Oriented Hazard Analysis for AI Systems , October 2024

    Rismani, S., Dobbe, R., and Moon, A. From Silos to Systems : Process - Oriented Hazard Analysis for AI Systems , October 2024. URL http://arxiv.org/abs/2410.22526. arXiv:2410.22526 [cs]

  29. [29]

    System Theoretic Process Analysis ( STPA ) Standard For All Industries , March 2025

    SAE International . System Theoretic Process Analysis ( STPA ) Standard For All Industries , March 2025. URL https://www.sae.org/standards/j3307_202503-system-theoretic-process-analysis-stpa-standard-industries

  30. [30]

    Strengthening Emergency Preparedness and Response for AI Loss of Control Incidents

    Somani, E., Friedman, A., Wu, H., Lu, M., Byrd, C., van Soest, H., and Zakaria, S. Strengthening Emergency Preparedness and Response for AI Loss of Control Incidents . Research Report , RAND Corporation, July 2025. URL https://www.rand.org/pubs/research_reports/RRA3847-1.html

  31. [31]

    The Loss of Control Playbook : Degrees , Dynamics , and Preparedness , December 2025

    Stix, C., Hallensleben, A., Ortega, A., and Pistillo, M. The Loss of Control Playbook : Degrees , Dynamics , and Preparedness , December 2025. URL http://arxiv.org/abs/2511.15846. stixLossControlPlaybook2025 arXiv:2511.15846 [cs] version: 5

  32. [32]

    STPA Step 4: Building Scenarios -- A Formal Scenario Approach , 2024

    Thomas, J. STPA Step 4: Building Scenarios -- A Formal Scenario Approach , 2024. URL https://psas.scripts.mit.edu/home/wp-content/uploads/2024/STPA-Scenarios-New-Approach.pdf

  33. [33]

    Tkeshelashvili, M., Verma, R., and Kelly, S. M. AI Loss of Control Risk : Indications & Warning , 2026. URL https://securityandtechnology.org/virtual-library/report/ai-loss-of-control-risk-indications-warning/