AI Researchers Must Help Lead Arms Control to Mitigate Military AI Risks

Jacob Benz; Ted Fujimoto

arxiv: 2606.11533 · v1 · pith:ZKS6CPYXnew · submitted 2026-06-10 · 💻 cs.CY · cs.AI · cs.ET · cs.LG


Pith reviewed 2026-06-27 08:25 UTC · model grok-4.3

classification 💻 cs.CY · cs.AI · cs.ET · cs.LG

keywords military AI · arms control · AI safety · nuclear deterrence · verification · diplomacy · AI risks · frontier models


The pith

AI researchers must take a leading role in advancing arms control research to minimize risk in military AI applications.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that AI researchers should expand their focus from long-term superintelligence concerns to the immediate risks of military AI systems. It points to growing investments by defense contractors and partnerships with AI firms as creating urgent needs for collaboration among military leaders, arms control experts, and researchers. Drawing on the history of nuclear deterrence, the authors claim that similar approaches can yield innovations in verification and diplomacy to reduce instability. They conclude that AI researchers are essential to leading the technical work that defines and addresses these risks, given the absence of reliable solutions so far.

Core claim

The paper claims that arms control has reduced past catastrophic risks, so lessons from nuclear deterrence can guide AI safety and security research toward innovations in verification and diplomacy, and that AI researchers must assist in leading the technical research that clearly defines and alleviates instability in military settings.

What carries the argument

The transfer of lessons from nuclear deterrence to guide AI safety research through innovations in verification and diplomacy, with AI researchers positioned to lead the technical efforts.

If this is right

Military AI deployments would face reduced instability through defined verification methods.
Diplomacy tools adapted from nuclear contexts would apply to regulating frontier AI in defense.
Collaboration among AI researchers, military leaders, and arms control experts would produce safer outcomes.
Near-term focus on current military AI applications would complement rather than replace long-term AI safety work.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same researcher-led approach could apply to regulating other dual-use technologies in security contexts.
AI labs might need to allocate resources for policy and verification research alongside capability development.
International agreements on AI could require new monitoring techniques that researchers help design.

Load-bearing premise

That lessons from nuclear deterrence can be applied to guide technical work on military AI risks and that AI researchers are the ones positioned to lead it.

What would settle it

A case where military AI systems integrate advanced models, proceed without AI researcher leadership in arms control, and produce no measurable increase in instability or risk.

Figures

Figures reproduced from arXiv: 2606.11533 by Jacob Benz, Ted Fujimoto.

**Figure 1.** In this visual by Goychayev et al. (2017), deterring malicious actions means the State ensures that any would-be attacker believes that the cost of an attack will outweigh the benefits. To do this, it must be demonstrated to an adversary that its attacks are unlikely to achieve their objectives, or that the consequences for an attack (successful or not) will be unacceptably high.
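
The deterrence logic described in the Figure 1 caption can be sketched as a simple expected-value comparison. This is an illustrative toy model only, not something taken from the paper or from Goychayev et al.: the function names `expected_attack_payoff` and `is_deterred`, their parameters, and the numbers used are all assumptions made for illustration.

```python
def expected_attack_payoff(p_success: float, benefit: float, cost: float) -> float:
    """Attacker's perceived expected net payoff (hypothetical toy model).

    p_success: attacker's believed probability that the attack achieves its objective
    benefit:   value the attacker expects to gain if the attack succeeds
    cost:      consequences the attacker expects to bear, successful or not
    """
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be between 0 and 1")
    return p_success * benefit - cost


def is_deterred(p_success: float, benefit: float, cost: float) -> bool:
    """True when the perceived cost outweighs the expected benefit."""
    return expected_attack_payoff(p_success, benefit, cost) < 0


# Baseline (assumed numbers): the attack looks worthwhile to the attacker.
print(is_deterred(p_success=0.8, benefit=10.0, cost=5.0))

# First route in the caption: make the attack unlikely to achieve its objective.
print(is_deterred(p_success=0.3, benefit=10.0, cost=5.0))

# Second route in the caption: make the consequences unacceptably high.
print(is_deterred(p_success=0.8, benefit=10.0, cost=12.0))
```

In this sketch the two routes named in the caption correspond to lowering `p_success` or raising `cost`; either can flip the sign of the attacker's perceived payoff. Real deterrence depends on perception and credibility, which a single inequality does not capture.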

Original abstract

The advancement of AI capabilities compels researchers and the public to be more aware of its potential worldwide impact. A pressing near-term concern is the regulation of military AI applications. Armament manufacturers and defense contractors are increasingly investing in AI capabilities and forging partnerships with AI companies, creating a burgeoning coalition that demands military leaders, arms control diplomacy experts, and AI researchers collaborate to ensure a safer future. While AI researchers often focus on the long-term implications of superintelligent AI, this approach may not adequately address the immediate challenges posed by AI in military applications. Success requires acknowledging and mitigating the emerging risks of frontier AI models that plan to be integrated into defense applications, like military AI systems. Arms control has reduced past catastrophic risks, so lessons learned from nuclear deterrence can guide AI safety and security research towards innovations in verification and diplomacy. AI researchers, however, must assist in leading the technical research that clearly defines and alleviates instability in military settings. Given these new responsibilities and the lack of sufficiently reliable solutions, we argue that AI researchers must take a leading role in advancing arms control research to minimize risk in military AI applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a policy call urging AI researchers to lead arms control on military AI by borrowing from nuclear lessons, but the analogy is asserted without tackling domain differences.

Colleague,

This paper pushes the idea that AI researchers have to take the lead on arms control to handle military AI risks. It notes defense contractors linking up with AI firms and argues that nuclear-era arms control offers a template for verification and diplomacy work.

It gets credit for shifting attention to near-term military uses instead of only long-term superintelligence. The point about needing collaboration across military, diplomacy, and technical experts is straightforward and reasonable.

The soft spot is the nuclear parallel. The text states that lessons from deterrence can guide AI work on verification and diplomacy, but it does not look at the mismatches: AI models lack physical signatures, run on code that copies easily, change fast, and often have dual-use civilian roots. Without addressing those, the claim that AI researchers are positioned to supply the missing technical fixes stays as an assertion rather than a worked-out case.

No data, no mechanisms, no new frameworks. The piece is self-contained opinion drawing on general knowledge of past arms control.

It is aimed at readers already inside AI governance and security studies who want more advocacy on this topic. Technical AI researchers or anyone looking for evidence or derivations will not get much out of it.

I would not send this to peer review. It reads as commentary rather than research that needs referee time.

Referee Report

1 major / 0 minor

Summary. The manuscript claims that AI researchers must take a leading role in advancing arms control research to minimize risks from military AI applications. It asserts that arms control has reduced past catastrophic risks and that lessons from nuclear deterrence can guide AI safety and security research toward innovations in verification and diplomacy, while noting that AI researchers' typical focus on long-term superintelligence may neglect near-term military integration challenges.

Significance. If the recommendation holds, the paper could help redirect attention within the AI community toward policy engagement on military applications, potentially fostering technical contributions to verification methods. The manuscript correctly flags the growing partnerships between AI firms and defense contractors as a development requiring cross-expertise collaboration.

major comments (1)

[Abstract] The assertion that 'lessons learned from nuclear deterrence can guide AI safety and security research towards innovations in verification and diplomacy' is load-bearing for the central recommendation that AI researchers must lead this work, yet the text provides no examination of transferability. Nuclear mechanisms rely on physical warhead counting and on-site inspections, while AI systems involve model opacity, dual-use codebases, rapid iteration, and absence of physical signatures; without addressing these differences the recommendation remains conditional on an untested parallel.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for identifying a key gap in the manuscript's central claim. The comment is well-taken: the abstract's assertion about lessons from nuclear deterrence is load-bearing yet lacks explicit discussion of transferability. We will revise the paper to address this directly.

Point-by-point responses

Referee: [Abstract] The assertion that 'lessons learned from nuclear deterrence can guide AI safety and security research towards innovations in verification and diplomacy' is load-bearing for the central recommendation that AI researchers must lead this work, yet the text provides no examination of transferability. Nuclear mechanisms rely on physical warhead counting and on-site inspections, while AI systems involve model opacity, dual-use codebases, rapid iteration, and absence of physical signatures; without addressing these differences the recommendation remains conditional on an untested parallel.

Authors: We agree that the manuscript does not examine transferability and that this weakens the recommendation. The paper is a short position piece focused on the need for AI researchers to engage in arms control rather than a comparative analysis of regimes. In revision we will add a dedicated paragraph (likely in the introduction or a new subsection) that explicitly contrasts the two domains (physical counting and inspections versus opacity, dual-use code, and rapid iteration) and then articulates which high-level lessons (e.g., the value of verifiable limits for crisis stability, the role of technical experts in designing monitoring regimes, and the importance of diplomatic channels) can still inform AI-specific work such as model auditing protocols, hardware attestation, or watermarking schemes. This addition will make the claim conditional on the parallels we identify rather than an unexamined analogy.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity: policy argument draws on external historical knowledge

full rationale

The paper is a policy advocacy piece whose central claim—that AI researchers must lead arms control research—rests on the premise that nuclear deterrence lessons can inform AI verification and diplomacy. No equations, fitted parameters, self-citations, or derivations appear in the provided text. The argument treats historical arms control outcomes as independent external input rather than reducing any result to its own premises by construction, satisfying the criteria for a self-contained non-circular recommendation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No technical content, derivations, or empirical components; the paper is a normative position statement without free parameters, axioms, or invented entities.



Reference graph

Works this paper leans on

137 extracted references · 26 canonical work pages · 12 internal anchors

[1] Riley Simmons-Edler, Ryan Paul Badman, Shayne Longpre, and Kanaka Rajan. Position: [title truncated in source]. 2024.
[2] I. Jenkins, S. Abbott, M. Armbruster, L. Brandt, K. Conklin, D. Davies, D. N. Etim, A. R. Gillens, J. J. Graham, C. Green, S. Herzog, N. Powell, W. Rumbaugh, D. Salisbury, A. Sanders-Zakre, and H. Toivanen. Project on Nuclear Issues: A Collection of Papers from the 2017 Conference Series a... 2017.
[3] Under the umbrella: Nuclear crises, extended deterrence, and public opinion. Journal of Conflict Resolution, 2022.
[4] Understanding deterrence. 2018.
[5] Anup Shah. June 2013.
[6] Nuclear verification: what it is, how it works, the assurances it can provide. Technical workshop on safeguards, verification technologies, and other related experience.
[7] The origins of strategic stability: the United States and the threat of surprise attack. In Strategic stability: contending interpretations, 2013.
[8] Evaluating and understanding scheming propensity in LLM agents. arXiv preprint arXiv:2603.01608.
[9] Lessons from a chimp: AI "scheming" and the quest for ape language. arXiv preprint arXiv:2507.03409.
[10] Kit Fraser-Taliente, Subhash Kantamneni, Euan Ong, Dan Mossing, Christina Lu, Paul C. Bogdan, Emmanuel Ameisen, James Chen, Dzmitry Kishylau, Adam Pearce, Julius Tarng, Alex Wu, Jeff Wu, Yang Zhang, Daniel M. Ziegler, Evan Hubinger, Joshua Batson, Jack Lindsey, Samuel Zimmerman, and Samuel Marks. ...
[11] Erin D. Dumbacher. February 2026.
[12] Michael Albertson. March 2026.
[13] Helen White, Jennifer E. Tanner, Keir Allen, Jacob M. Benz, Sarah McOmish, and Kevin L. Simmons. 2012.
[14] Remote Monitoring Systems/Remote Data Transmission for International Nuclear Safeguards. 2022.
[15] Vincent Fournier and IAEA Office of Public Information and Communication. 2016.
[16] Tamper-resistant safeguards for open-weight LLMs. International Conference on Learning Representations.
[17] Riley Simmons-Edler, Jean Dong, Paul Lushenko, Kanaka Rajan, and Ryan Badman. Military AI Needs Technically-Informed Regulation to Safeguard AI Research and its Applications.
[18] New START Treaty. May 2026.
[19] Vibhu Mishra. 2026.
[20] New START at a Glance. April 2026.
[21] CJADC2 Initiative.
[22] Defense Command and Control: Further Progress Hinges on Establishing a Comprehensive Framework. 2025.
[23] Multi-Domain Operations.
[24] Solving the Hidden Challenges of JADC2.
[25] Joint All-Domain Command and Control (JADC2) Capabilities.
[26] Essential Guide to JADC2.
[27] Summary of the Joint–All Domain Command and Control (JADC2) Strategy. 2022.
[28] Paul Dean, Chris Meserole, and Helen Toner. Risk and Regulation of Artificial Intelligence in Nuclear Command.
[29] A survey on bias and fairness in machine learning. ACM Computing Surveys (CSUR), 2021.
[30] Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback. arXiv preprint arXiv:2307.15217.
[31] AI Alignment: A Comprehensive Survey. arXiv preprint arXiv:2310.19852.
[32] Dataset shift in machine learning. 2022.
[33] Revisiting the 'stability-instability paradox' in AI-enabled warfare: A modern-day Promethean tragedy under the nuclear shadow? Review of International Studies, 2024.
[34] Proposal for a Regulation of the European Parliament and of the Council laying down harmonised rules on Artificial Intelligence (Artificial Intelligence Act) and amending certain Union legislative acts. 2021.
[35] Executive Order (E.O.) 14110 on Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. 2023.
[36] China's AI Regulations and How They Get Made. 2023.
[37] Measures for the Management of Generative Artificial Intelligence Services (Translated). 2023.
[38] International institutions for advanced AI. arXiv preprint arXiv:2307.04699.
[39] Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, and Jacob Steinhardt. Aligning [title truncated in source]. 2021.
[40] A review on fairness in machine learning. ACM Computing Surveys (CSUR), 2022.
[41] Adversarial machine learning attacks and defense methods in the cyber security domain. ACM Computing Surveys (CSUR), 2021.
[42] GPT-4 Technical Report. arXiv preprint arXiv:2303.08774.
[43] Gemini: A Family of Highly Capable Multimodal Models. arXiv preprint arXiv:2312.11805.
[44] Anthropic. 2024.
[45] Kylie Robison. 2024.
[46] Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems.
[47] A survey of reinforcement learning from human feedback. arXiv preprint arXiv:2312.14925.
[48] Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback. arXiv preprint arXiv:2204.05862.
[49] Towards Solving Fuzzy Tasks with Human Feedback: A Retrospective of the MineRL BASALT 2022 Competition. NeurIPS 2022 Competition Track, 2023.
[50] RLHF-V: Towards trustworthy MLLMs via behavior alignment from fine-grained correctional human feedback. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[51] Paul Christiano. 2023.
[52] Supervising strong learners by amplifying weak experts. arXiv preprint arXiv:1810.08575.
[53] AI safety via debate. arXiv preprint arXiv:1805.00899.
[54] Cooperative inverse reinforcement learning. Advances in Neural Information Processing Systems.
[55] Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision. Forty-first International Conference on Machine Learning.
[56] Superintelligence strategy: Expert version. arXiv preprint arXiv:2503.05628.
[57] Scalable agent alignment via reward modeling: a research direction. arXiv preprint arXiv:1811.07871.
[58] Kwan Yee Ng, Jason Zhou, Ben Murphy, Rogier Creemers, and Hunter Dorwart. 2023.
[59] Anduril Partners with OpenAI to Advance U.S. Artificial Intelligence Leadership and Protect U.S. and Allied Forces. 2024.
[60] What is Arms Control? 2023.
[61] Innovation-proof global governance for military artificial intelligence?: How I learned to stop worrying, and love the bot. Journal of International Humanitarian Legal Studies, 2019.
[62] How viable is international arms control for military artificial intelligence? Three lessons from nuclear weapons. Contemporary Security Policy, 2019.
[63] AI weapons: Russia's war in Ukraine shows why the world must enact a ban. Nature, 2023.
[64] Memorandum on Advancing the United States Leadership in Artificial Intelligence, Harnessing Artificial Intelligence to Fulfill National Security Objectives, and Fostering the Safety and Security. 2024.
[65] The emergence of cooperation: national epistemic communities and the international evolution of the idea of nuclear arms control. International Organization, 1992.
[66] Position: Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback. 2024.
[67] Rethinking epistemic communities twenty years later. Review of International Studies, 2013.
[68] Large language models surpass human experts in predicting neuroscience results. Nature Human Behaviour, 2024.
[69] Exploring collaboration mechanisms for LLM agents: A social psychology view. arXiv preprint arXiv:2310.02124.
[70] Human-AI collaboration in cooperative games: A study of playing Codenames with an LLM assistant. Proceedings of the ACM on Human-Computer Interaction, 2024.
[71] False alarm, nuclear danger. IEEE Spectrum, 2000.
[72] Inadvertent escalation in the age of intelligence machines: A new model for nuclear risk in the digital age. European Journal of International Security, 2022.
[73] Stealing part of a production language model. 2024.
[74] TextAttack: A framework for adversarial attacks, data augmentation, and adversarial training in NLP. arXiv preprint arXiv:2005.05909.
[75] Gradient-based adversarial attacks against text transformers. arXiv preprint arXiv:2104.13733.
[76] Jailbroken: How does LLM safety training fail? Advances in Neural Information Processing Systems.
[77] A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. arXiv preprint arXiv:2311.05232.
[78] How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries. arXiv preprint arXiv:2402.15302.
[79] Collective intelligence for deep learning: A survey of recent developments. Collective Intelligence, 2022.
[80] Cyber Deterrence and Stability. 2017.

Showing first 80 references.


work page arXiv

[48] [48]

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

Training a helpful and harmless assistant with reinforcement learning from human feedback , author=. arXiv preprint arXiv:2204.05862 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[49] [49]

NeurIPS 2022 Competition Track , pages=

Towards Solving Fuzzy Tasks with Human Feedback: A Retrospective of the MineRL BASALT 2022 Competition , author=. NeurIPS 2022 Competition Track , pages=. 2023 , organization=

2022

[50] [50]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

Rlhf-v: Towards trustworthy mllms via behavior alignment from fine-grained correctional human feedback , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

[51] [51]

2023 , url =

Paul Christiano , title =. 2023 , url =

2023

[52] [52]

Supervising strong learners by amplifying weak experts

Supervising strong learners by amplifying weak experts , author=. arXiv preprint arXiv:1810.08575 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[53] [53]

AI safety via debate

AI safety via debate , author=. arXiv preprint arXiv:1805.00899 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[54] [54]

Advances in neural information processing systems , volume=

Cooperative inverse reinforcement learning , author=. Advances in neural information processing systems , volume=

[55] [55]

Forty-first International Conference on Machine Learning , year=

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision , author=. Forty-first International Conference on Machine Learning , year=

[56] [56]

arXiv preprint arXiv:2503.05628 , year=

Superintelligence strategy: Expert version , author=. arXiv preprint arXiv:2503.05628 , year=

work page arXiv

[57] [57]

Scalable agent alignment via reward modeling: a research direction

Scalable agent alignment via reward modeling: a research direction , author=. arXiv preprint arXiv:1811.07871 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[58] [58]

2023 , journal =

Kwan Yee Ng and Jason Zhou and Ben Murphy and Rogier Creemers and Hunter Dorwart , title =. 2023 , journal =

2023

[59] [59]

Artificial Intelligence Leadership and Protect U.S

Anduril Partners with OpenAI to Advance U.S. Artificial Intelligence Leadership and Protect U.S. and Allied Forces , author =. 2024 , url =

2024

[60] [60]

2023 , url =

What is Arms Control? , author =. 2023 , url =

2023

[61] [61]

Journal of International Humanitarian Legal Studies , volume=

Innovation-proof global governance for military artificial intelligence?: How I learned to stop worrying, and love the bot , author=. Journal of International Humanitarian Legal Studies , volume=. 2019 , publisher=

2019

[62] [62]

Contemporary Security Policy , volume=

How viable is international arms control for military artificial intelligence? Three lessons from nuclear weapons , author=. Contemporary Security Policy , volume=. 2019 , publisher=

2019

[63] [63]

Nature , volume=

AI weapons: Russia’s war in Ukraine shows why the world must enact a ban , author=. Nature , volume=. 2023 , publisher=

2023

[64] [64]

2024 , note =

Memorandum on Advancing the United States Leadership in Artificial Intelligence, Harnessing Artificial Intelligence to Fulfill National Security Objectives, and Fostering the Safety and Security , howpublished =. 2024 , note =

2024

[65] [65]

International organization , volume=

The emergence of cooperation: national epistemic communities and the international evolution of the idea of nuclear arms control , author=. International organization , volume=. 1992 , publisher=

1992

[66] [66]

2024 , booktitle=

Position: Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback , author=. 2024 , booktitle=

2024

[67] [67]

Review of international studies , volume=

Rethinking epistemic communities twenty years later , author=. Review of international studies , volume=. 2013 , publisher=

2013

[68] [68]

Nature human behaviour , pages=

Large language models surpass human experts in predicting neuroscience results , author=. Nature human behaviour , pages=. 2024 , publisher=

2024

[69] [69]

Exploring collaboration mechanisms for llm agents: A social psychology view,

Exploring collaboration mechanisms for llm agents: A social psychology view , author=. arXiv preprint arXiv:2310.02124 , year=

work page arXiv

[70] [70]

Proceedings of the ACM on Human-Computer Interaction , volume=

Human-ai collaboration in cooperative games: A study of playing codenames with an llm assistant , author=. Proceedings of the ACM on Human-Computer Interaction , volume=. 2024 , publisher=

2024

[71] [71]

IEEE Spectrum , volume=

False alarm, nuclear danger , author=. IEEE Spectrum , volume=. 2000 , publisher=

2000

[72] [72]

European Journal of International Security , volume=

Inadvertent escalation in the age of intelligence machines: A new model for nuclear risk in the digital age , author=. European Journal of International Security , volume=. 2022 , publisher=

2022

[73] [73]

2024 , booktitle=

Stealing part of a production language model , author=. 2024 , booktitle=

2024

[74] [74]

arXiv preprint arXiv:2005.05909 , year=

Textattack: A framework for adversarial attacks, data augmentation, and adversarial training in nlp , author=. arXiv preprint arXiv:2005.05909 , year=

work page arXiv 2005

[75] [75]

arXiv preprint arXiv:2104.13733 , year=

Gradient-based adversarial attacks against text transformers , author=. arXiv preprint arXiv:2104.13733 , year=

work page arXiv

[76] [76]

Advances in Neural Information Processing Systems , volume=

Jailbroken: How does llm safety training fail? , author=. Advances in Neural Information Processing Systems , volume=

[77] [77]

A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions , author=. arXiv preprint arXiv:2311.05232 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[78] [78]

arXiv preprint arXiv:2402.15302 , year=

How (un) ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries , author=. arXiv preprint arXiv:2402.15302 , year=

work page arXiv

[79] [79]

Collective Intelligence , volume=

Collective intelligence for deep learning: A survey of recent developments , author=. Collective Intelligence , volume=. 2022 , publisher=

2022

[80] [80]

2017 , institution=

Cyber Deterrence and Stability , author=. 2017 , institution=

2017