Managing extreme AI risks amid rapid progress

Anca Dragan; Andrew Yao; Ashwin Acharya; Atılım Güneş Baydin; Daniel Kahneman; David Krueger; Dawn Song; Frank Hutter; Geoffrey Hinton; Gillian Hadfield

arxiv: 2310.17688 · v3 · pith:DOMK3IPQnew · submitted 2023-10-26 · 💻 cs.CY · cs.AI · cs.CL · cs.LG


Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Trevor Darrell, Yuval Noah Harari, Ya-Qin Zhang


Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atılım Güneş Baydin, Sheila McIlraith, Qiqi Gao, Ashwin Acharya, David Krueger, Anca Dragan, Philip Torr, Stuart Russell, Daniel Kahneman, Jan Brauner, Sören Mindermann


classification 💻 cs.CY · cs.AI · cs.CL · cs.LG

keywords risks · systems · extreme · autonomous · consensus · governance · lack · mechanisms



Artificial Intelligence (AI) is progressing rapidly, and companies are shifting their focus to developing generalist AI systems that can autonomously act and pursue goals. Increases in capabilities and autonomy may soon massively amplify AI's impact, with risks that include large-scale social harms, malicious uses, and an irreversible loss of human control over autonomous AI systems. Although researchers have warned of extreme risks from AI, there is a lack of consensus about how exactly such risks arise, and how to manage them. Society's response, despite promising first steps, is incommensurate with the possibility of rapid, transformative progress that is expected by many experts. AI safety research is lagging. Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness, and barely address autonomous systems. In this short consensus paper, we describe extreme risks from upcoming, advanced AI systems. Drawing on lessons learned from other safety-critical technologies, we then outline a comprehensive plan combining technical research and development with proactive, adaptive governance mechanisms for a more commensurate preparation.



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Red-Bandit: Test-Time Adaptation for LLM Red-Teaming via Bandit-Guided LoRA Experts
cs.CL · 2025-10 · unverdicted · novelty 6.0

Red-Bandit adapts online to LLM failure modes by dynamically selecting among RL-trained LoRA attack-style experts via a bandit policy, reporting SOTA ASR@10 on AdvBench with lower-perplexity prompts.