Buck Shlegeris

Identifiers

name variant Buck Shlegeris 0.60 · backfill

Papers (9)

Auditing Sabotage Bench: A Benchmark for Detecting and Fixing Research Sabotage in ML Codebases cs.AI · 2026 · author #3
LinuxArena: A Control Setting for AI Agents in Live Production Software Environments cs.CR · 2026 · author #33
Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety cs.AI · 2025 · author #34
Alignment faking in large language models cs.AI · 2024 · author #18
Games for AI Control: Models of Safety Evaluations of AI Deployment Protocols cs.AI · 2024 · author #3
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models cs.AI · 2024 · author #11
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training cs.CR · 2024 · author #37
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small cs.LG · 2022 · author #4
Supervising strong learners by amplifying weak experts cs.LG · 2018 · author #2

Mentions

2507.11473 #34 · arxiv_oai · confidence 0.70 Buck Shlegeris
2406.10162 #11 · arxiv_oai · confidence 0.70 Buck Shlegeris

Frequent Coauthors

Ethan Perez 4 shared papers
Evan Hubinger 4 shared papers
Carson Denison 3 shared papers
David Duvenaud 3 shared papers
Jared Kaplan 3 shared papers
Monte MacDiarmid 3 shared papers
Ryan Greenblatt 3 shared papers
Samuel R. Bowman 3 shared papers
Aryan Bhatt 2 shared papers
Fabien Roger 2 shared papers
Fazl Barez 2 shared papers
Julian Michael 2 shared papers
Nicholas Schiefer 2 shared papers
Paul Christiano 2 shared papers
Shauna Kravec 2 shared papers
Sören Mindermann 2 shared papers
Adam Hanson 1 shared paper
Adam Jermyn 1 shared paper
Akbir Khan 1 shared paper
Alan Cooney 1 shared paper