arxiv: 2512.01166 · v5 · submitted 2025-12-01 · 💻 cs.CY

Evaluating AI Providers' Frontier Safety Frameworks

Lily Stelling, Malcolm Murray, Bruno Galizzi, Max Schaffelder, Siméon Campos, Henry Papadatos

Pith reviewed 2026-05-17 03:43 UTC · model grok-4.3

keywords frontier AI safety, AI risk management, safety frameworks, AI accountability, risk governance, high-risk industries

The pith

Twelve AI companies' safety frameworks score a median of 18% on risk management criteria drawn from aviation and nuclear industries, making them poor tools for external accountability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates twelve frontier AI safety frameworks published after the 2024 Seoul Summit against sixty-five weighted criteria spanning risk identification, analysis, treatment, and governance. These criteria are adapted from established practices in mature high-risk sectors. Scores range from thirty-four percent for Anthropic down to eight percent for Cohere, with a median of eighteen percent; many required elements remain missing or vaguely stated. The authors conclude that these documents cannot yet serve as reliable accountability mechanisms because outsiders cannot easily predict company actions, judge response adequacy, or verify follow-through. They also note that simply combining the strongest practices already in use across the set would raise the median to fifty-four percent.

Core claim

Current frontier AI safety frameworks contain vague commitments that limit their effectiveness as external accountability instruments, as shown by low scores across sixty-five criteria adapted from high-risk industries, though a company incorporating all leading practices already present among peers could reach triple the current median score.

What carries the argument

Sixty-five weighted criteria grouped into risk identification, risk analysis and evaluation, risk treatment, and risk governance, adapted from Campos et al. (2025) and high-risk industry standards.
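
As a minimal sketch of how a weighted criteria score of this kind could be aggregated (an illustration under stated assumptions, not the paper's actual procedure): the criterion names, weights, and grades below are hypothetical, since the abstract does not describe the real weighting scheme. Each criterion receives a grade between 0 and 1, and the overall score is the weight-normalised mean expressed as a percentage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str        # hypothetical criterion label
    dimension: str   # one of the four dimensions named in the paper
    weight: float    # hypothetical relative weight


def weighted_score(criteria, grades):
    """Return the weight-normalised score as a percentage (0-100).

    `grades` maps criterion name -> grade in [0, 1]. A criterion that is
    missing from `grades` counts as 0, i.e. "not addressed".
    """
    total_weight = sum(c.weight for c in criteria)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    earned = sum(c.weight * grades.get(c.name, 0.0) for c in criteria)
    return 100.0 * earned / total_weight


# Toy data only: four made-up criteria, one per dimension.
CRITERIA = [
    Criterion("risk_taxonomy", "risk identification", 1.0),
    Criterion("risk_thresholds", "risk analysis and evaluation", 2.0),
    Criterion("mitigation_plan", "risk treatment", 2.0),
    Criterion("independent_oversight", "risk governance", 1.0),
]

# Made-up grades for one imaginary framework.
example_grades = {
    "risk_taxonomy": 0.5,
    "risk_thresholds": 0.25,
    "mitigation_plan": 0.0,
}

print(f"{weighted_score(CRITERIA, example_grades):.1f}%")
```

Treating an unaddressed criterion as zero is a design choice in this sketch; it mirrors the summary's observation that many required elements are missing, but the paper may handle absent elements differently.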

If this is right

Vague language in the frameworks prevents outsiders from forecasting how companies will respond to specific advanced AI risks.
Assessing whether a company's planned mitigation is adequate becomes unreliable without clearer commitments.
Verifying whether companies have kept their own stated commitments is difficult under current wording.
Adopting the strongest elements already used by any peer could raise a typical framework's score from the median of eighteen percent to fifty-four percent.
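
The fifty-four percent figure describes a composite framework that takes, for every criterion, the best grade any company currently achieves. A minimal sketch of that calculation follows; the company names, criteria, weights, and grades are all made up for illustration and are not the paper's data.

```python
from statistics import median


def score(grades, weights):
    """Weight-normalised score in percent for one framework."""
    total = sum(weights.values())
    return 100.0 * sum(w * grades.get(c, 0.0) for c, w in weights.items()) / total


def best_of_peers(all_grades, weights):
    """Composite grades: per criterion, the highest grade among all frameworks."""
    return {c: max(g.get(c, 0.0) for g in all_grades.values()) for c in weights}


# Hypothetical weights and grades, for illustration only.
weights = {"c1": 1.0, "c2": 1.0, "c3": 1.0, "c4": 1.0}
all_grades = {
    "company_a": {"c1": 1.0, "c2": 0.0, "c3": 0.0, "c4": 0.0},
    "company_b": {"c1": 0.0, "c2": 0.5, "c3": 0.0, "c4": 0.0},
    "company_c": {"c1": 0.0, "c2": 0.0, "c3": 0.5, "c4": 0.0},
}

individual = {name: score(g, weights) for name, g in all_grades.items()}
median_score = median(individual.values())
composite = score(best_of_peers(all_grades, weights), weights)

print(individual)     # per-company scores
print(median_score)   # median of the individual scores
print(composite)      # best-of-peers composite score
```

The composite can exceed every individual score because no single company holds all of the leading practices, which is the mechanism behind the comparison between the composite and the current median.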

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Legislators treating these frameworks as compliance documents may need to require more standardized and specific language.
Companies could improve accountability scores quickly by copying the best current practices from one another rather than inventing new ones.
Over time, repeated evaluations using the same criteria might pressure the industry toward clearer risk management norms.

Load-bearing premise

The chosen criteria correctly identify the essential elements needed for responsible management of catastrophic AI risks.

What would settle it

A follow-up study that tracks whether companies with higher-scoring frameworks actually take materially different or more effective actions when new risks emerge.

Original abstract

Following the AI Seoul Summit in 2024, twelve AI companies published frontier AI safety frameworks (Frameworks) outlining their approaches to managing catastrophic risks from advanced AI systems. Emerging legislation increasingly treats these Frameworks as external accountability mechanisms, incorporating them into reporting requirements. But what do the Frameworks actually commit each company to do? This study assesses 12 Frameworks, using 65 weighted criteria, across four dimensions: risk identification, risk analysis & evaluation, risk treatment, and risk governance. Our criteria adapt established risk management principles from other high-risk industries (e.g. aviation, nuclear power) to the frontier AI context, following Campos et al. (2025). Overall scores range from 34% (Anthropic) to 8% (Cohere), with a median of 18%. Many aspects are missing or under-specified. These low scores may be natural given the nascency of AI risk management compared to industries with decades of practice. Nonetheless, current Frameworks are limited as accountability functions, with vague commitments that make it difficult to predict company decisions, assess whether planned responses are adequate, or determine whether commitments have been kept. Still, higher scores appear feasible within current constraints: a company adopting all leading practices currently adopted across their peers would score 54%, which is triple the current median.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript evaluates the frontier AI safety frameworks published by twelve AI companies following the 2024 AI Seoul Summit. Using 65 weighted criteria adapted from Campos et al. (2025) and risk-management principles from high-risk industries (aviation, nuclear), it scores the frameworks across four dimensions: risk identification, risk analysis & evaluation, risk treatment, and risk governance. Aggregate scores range from 34% (Anthropic) to 8% (Cohere) with a median of 18%. The authors conclude that current frameworks are limited as accountability functions because of vague commitments that impede prediction of company decisions, assessment of response adequacy, and verification of compliance, while noting that adopting all leading practices already present across peers could reach 54%.

Significance. If the 65 criteria validly identify the elements required for predictability, adequacy assessment, and verifiability in frontier AI, the paper supplies a timely, structured benchmark that could inform emerging legislation treating these frameworks as accountability mechanisms. The work is strengthened by its explicit identification of a feasible improvement trajectory (54%) and by the systematic transfer of established risk-management principles to a new domain. The result would be more significant if the criteria were shown to be necessary rather than merely aspirational for a domain whose risks remain less amenable to quantitative historical validation than aviation or nuclear power.

major comments (2)

[Methods] Methods section: the evaluation process is under-specified with respect to inter-rater reliability, the exact weighting procedure applied to the 65 criteria, and the handling of borderline cases. Without these details the link between raw framework text and the reported aggregate percentages (e.g., 34% for Anthropic) cannot be fully assessed.
[Criteria development and Discussion] Criteria development and Discussion sections: the manuscript adapts criteria from Campos et al. (2025) plus aviation/nuclear principles and notes that low scores may be expected given the nascency of AI risk management, yet still infers that the frameworks are limited as accountability functions. It does not demonstrate that the absent elements (specific quantitative risk metrics, treatment thresholds) are required for accountability rather than premature for frontier AI, where risks lack the historical data available in the source domains.

minor comments (2)

[Abstract and Results] The abstract and results tables would benefit from a brief statement of how many frameworks were scored by how many raters and whether any inter-rater agreement statistic was computed.
[References] Ensure the full bibliographic details and status (preprint vs. published) of Campos et al. (2025) appear in the reference list.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the methodological transparency and the scope of our claims about accountability. We respond to each major comment below and indicate the revisions we will make.

Referee: [Methods] Methods section: the evaluation process is under-specified with respect to inter-rater reliability, the exact weighting procedure applied to the 65 criteria, and the handling of borderline cases. Without these details the link between raw framework text and the reported aggregate percentages (e.g., 34% for Anthropic) cannot be fully assessed.

Authors: We agree that the Methods section requires greater specificity to allow full assessment of the scoring process. In the revised manuscript we will add: (1) details on inter-rater reliability, noting that two authors independently evaluated each framework against the criteria, with disagreements resolved by consensus and an inter-rater agreement rate reported; (2) the exact weighting procedure, explaining that weights were assigned proportionally to the emphasis in Campos et al. (2025) and standard risk-management frameworks, with equal weighting applied within each of the four dimensions; and (3) handling of borderline cases, including explicit examples of how ambiguous or conditional language in the frameworks was coded (typically conservatively as not fully meeting the criterion). These additions will make the connection between source text and aggregate scores transparent.

revision: yes

Referee: [Criteria development and Discussion] Criteria development and Discussion sections: the manuscript adapts criteria from Campos et al. (2025) plus aviation/nuclear principles and notes that low scores may be expected given the nascency of AI risk management, yet still infers that the frameworks are limited as accountability functions. It does not demonstrate that the absent elements (specific quantitative risk metrics, treatment thresholds) are required for accountability rather than premature for frontier AI, where risks lack the historical data available in the source domains.

Authors: We appreciate the referee's distinction between aspirational and necessary elements. Our core claim is that accountability functions (predicting decisions, assessing response adequacy, and verifying compliance) logically require commitments that are specific enough to be evaluated, regardless of domain-specific data availability. We ground this in the risk-management literature rather than claiming empirical necessity from historical AI incidents. In the revised Discussion we will (a) more explicitly separate the logical requirements for accountability from domain-specific implementation details, (b) acknowledge that quantitative metrics may be premature and that clear qualitative thresholds could suffice initially, and (c) note that the observed vagueness prevents even basic accountability functions today. We will not overstate the criteria as universally required but will strengthen the argument that current frameworks fall short of the minimum specificity needed for the accountability role assigned by emerging legislation.

revision: partial
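
The rebuttal promises an inter-rater agreement rate without naming a statistic. One common option is to report raw percent agreement alongside Cohen's kappa, which corrects for the agreement expected by chance. The sketch below is illustrative only: the choice of kappa, the three grade labels, and the ratings themselves are assumptions, not details taken from the paper.

```python
from collections import Counter


def percent_agreement(rater_1, rater_2):
    """Fraction of items on which the two raters gave the same label."""
    if len(rater_1) != len(rater_2) or not rater_1:
        raise ValueError("rating lists must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rater_1, rater_2)) / len(rater_1)


def cohens_kappa(rater_1, rater_2):
    """Cohen's kappa for two raters assigning categorical labels."""
    p_observed = percent_agreement(rater_1, rater_2)
    n = len(rater_1)
    counts_1, counts_2 = Counter(rater_1), Counter(rater_2)
    # Chance agreement: product of each rater's marginal label frequencies.
    labels = set(counts_1) | set(counts_2)
    p_expected = sum(counts_1[k] * counts_2[k] for k in labels) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical ratings of eight criteria by two raters.
rater_a = ["met", "partial", "unmet", "unmet", "met", "unmet", "partial", "unmet"]
rater_b = ["met", "unmet", "unmet", "unmet", "met", "partial", "partial", "unmet"]

print(f"percent agreement: {percent_agreement(rater_a, rater_b):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(rater_a, rater_b):.2f}")
```

Kappa is lower than raw agreement whenever some agreement would be expected by chance, which matters when most criteria receive the same low grade, as the reported scores suggest they might.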

Circularity Check

0 steps flagged

No significant circularity: original scoring of 12 frameworks is independent of criteria source

The paper performs an empirical evaluation by applying 65 weighted criteria to 12 newly published frontier AI safety frameworks. The criteria are adapted from high-risk industry principles following Campos et al. (2025), with author overlap noted, but this is a standard methodological citation rather than a load-bearing reduction. No equations, fitted parameters presented as predictions, self-definitional loops, or uniqueness theorems are present. The central result—the specific scores, median of 18%, and comparison to a 54% peer-leading benchmark—is generated from direct analysis of the target documents and does not reduce to the prior work by construction. The paper explicitly acknowledges nascency effects, confirming the assessment is not tautological.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The evaluation rests on the transferability of risk-management principles from aviation and nuclear sectors to frontier AI, plus the specific weighting and selection of the 65 criteria. No new physical entities are postulated.

free parameters (1)

criteria weights
The 65 criteria are described as weighted, but the abstract gives no information on how the weights were derived or validated.
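
Because the weights act as a free parameter, one way to probe how much they drive the results is to rescore the frameworks under an alternative weighting and check whether the ranking changes. The sketch below assumes hypothetical frameworks, criteria, grades, and weights; it is a sensitivity check a reader could run, not an analysis the paper reports.

```python
def score(grades, weights):
    """Weight-normalised score in percent for one framework."""
    total = sum(weights.values())
    return 100.0 * sum(w * grades.get(c, 0.0) for c, w in weights.items()) / total


def ranking(all_grades, weights):
    """Framework names ordered from highest to lowest score (ties broken by name)."""
    return sorted(all_grades, key=lambda name: (-score(all_grades[name], weights), name))


# Hypothetical grades for two made-up frameworks on three made-up criteria.
all_grades = {
    "framework_x": {"c1": 1.0, "c2": 0.0, "c3": 0.0},
    "framework_y": {"c1": 0.0, "c2": 0.75, "c3": 0.5},
}
assumed_weights = {"c1": 3.0, "c2": 1.0, "c3": 1.0}   # hypothetical weighting
equal_weights = {c: 1.0 for c in assumed_weights}      # alternative for comparison

print(ranking(all_grades, assumed_weights))
print(ranking(all_grades, equal_weights))
```

In this toy case the order of the two frameworks flips between the two weightings, which illustrates why the derivation of the weights matters for interpreting the reported ranking.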

axioms (1)

domain assumption: Risk-management principles developed for aviation and nuclear power are applicable to frontier AI catastrophic-risk governance.
The paper states that its criteria adapt these principles following Campos et al. (2025).

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "We develop a 65-criteria assessment methodology grounded in established risk management principles from safety-critical industries... across four dimensions: risk identification, risk analysis and evaluation, risk treatment, and risk governance."

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "No company explicitly and quantitatively states the maximum risk level they will impose on society."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 1 internal anchor

[1] International. 2025.
[2] Issue Brief: Components of Frontier. 2024.
[3] Common Elements of. 2024.
[4] Buhl, Marie Davidsen and Bucknall, Ben and Masterson, Tammy. Emerging Practices in Frontier.
[5] Alaga, Jide and Schuett, Jonas and Anderljung, Markus. A grading rubric for.
[6] Campos, Sim. A frontier. arXiv preprint arXiv:2502.06656.
[7] Updating Our Preparedness Framework.
[8] Anthropic’s Responsible Scaling Policy.
[9] General-Purpose. 2025.
[10] Artificial Intelligence Risk Management Framework (. 2023.
[11] Our Approach to Frontier. 2025.
[12] Strengthening Our Frontier Safety Framework.
[13] Common Elements of Frontier. 2025.
[14] Bengio, Yoshua and Mindermann, S. International AI safety report. arXiv preprint arXiv:2501.17805.
[15] Bengio, Yoshua and Clare, Stephen and Prunkl, Carina and Andriushchenko, Maksym and Bucknall, Ben and Murray, Malcolm and Bommasani, Rishi and Casper, Stephen and Davidson, Tom and Douglas, Raymond and others. International.
[16] Nonlinear robust optimization for planning and control. 2025 IEEE 64th Conference on Decision and Control (CDC). 2025.
[17] Anderljung, Markus and Barnhart, Joslyn and Korinek, Anton and Leung, Jade and O'Keefe, Cullen and Whittlestone, Jess and Avin, Shahar and Brundage, Miles and Bullock, Justin and Cass-Beggs, Duncan and others. Frontier.
[18] Frontier. 2025.
[19] Effects of foggy conditions on drivers’ speed control behaviors at different risk levels. Safety Science. 2014.
[20] Van Der Weij, Teun and Hofst. Brown, and Francis Rhys Ward. arXiv preprint arXiv:2406.07358.
[21] Clymer, Joshua and Gabrieli, Nick and Krueger, David and Larsen, Thomas. Safety cases: How to justify the safety of advanced.
[22] Balesni, Mikita and Hobbhahn, Marius and Lindner, David and Meinke, Alexander and Korbak, Tomek and Clymer, Joshua and Shlegeris, Buck and Scheurer, J. Towards evaluations-based safety cases for. arXiv preprint arXiv:2411.03336.
[23] The General-Purpose. 2025.
[24] SB 53 (Transparency in Frontier Artificial Intelligence Act).
[25] Corporate social performance and organizational attractiveness to prospective employees. Academy of management journal. 1997.
[26] Sexual activity and contraceptive knowledge and use among in-school adolescents in Nigeria. International Family Planning Perspectives. 1997.
[27] C. B. Bhattacharya and Sankar Sen and Daniel Korschun. 2008.
[28] 2023.
[29] Bengio, Yoshua and Hinton, Geoffrey and Yao, Andrew and Song, Dawn and Abbeel, Pieter and Darrell, Trevor and Harari, Yuval Noah and Zhang, Ya-Qin and Xue, Lan and Shalev-Shwartz, Shai and others. Managing extreme. 2024.
[30] Schuett, Jonas and Dreksler, Noemi and Anderljung, Markus and McCaffary, David and Heim, Lennart and Bluemke, Emma and Garfinkel, Ben. Towards best practices in.
[31] Emerging Processes for Frontier. 2023.
[32] Ben Robinson and James Ginns. 2024.
[33] Atoosa Kasirzadeh. 2024.
[34] Towards Frontier Safety Policies Plus. arXiv preprint arXiv:2501.16500.
[35] Zach Stein-Perlman. 2025.
[36] 2025.
[37] Risk Management Ratings (Legacy).
[38] Bill Anderson-Samways and Institute for AI Policy and Strategy. 2026.
[39] 2026.
[40] Is OpenAI’s Preparedness Framework better than its competitors’ “Responsible Scaling Policies”? A Comparative Analysis.
[41] Scaling. 2024.
[42] Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned. arXiv preprint arXiv:2209.07858.
[43] Beatrice Nolan. 2025.
[44] Buck Shlegeris. 2026.
[45] Garrison Lovely. 2025.
[46] Koessler, Leonie and Schuett, Jonas. Risk assessment at.
[47] Risk Appetite and Tolerance.
[48] Carlsmith, Joseph. Is power-seeking.
[49] Kasirzadeh, Atoosa. Measurement challenges in.
[50] Bill Drexel and Caleb Withers. 2024.
[51] Superintelligence strategy: Expert version. arXiv preprint arXiv:2503.05628.
[52] Alaga, Jide and Chen, Michael. Marginal Risk Relative to What? Distinguishing Baselines in.
[53] Risk Taxonomy and Thresholds for Frontier. 2025.
[54] SB-53 Artificial Intelligence Models: Large Developers (Transparency in Frontier Artificial Intelligence Act).
[55] Responsible Scaling Policy v3.
[56] Naming and Shaming of Corporate Offenders.
[57] An. 2018.
[58] Senate Bill 53: Artificial Intelligence Models: Large Developers.