Buck Shlegeris
Identifiers
- name variant Buck Shlegeris 0.60 · backfill
Papers (9)
- Auditing Sabotage Bench: A Benchmark for Detecting and Fixing Research Sabotage in ML Codebases cs.AI · 2026 · author #3
- LinuxArena: A Control Setting for AI Agents in Live Production Software Environments cs.CR · 2026 · author #33
- Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety cs.AI · 2025 · author #34
- Alignment faking in large language models cs.AI · 2024 · author #18
- Games for AI Control: Models of Safety Evaluations of AI Deployment Protocols cs.AI · 2024 · author #3
- Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models cs.AI · 2024 · author #11
- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training cs.CR · 2024 · author #37
- Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small cs.LG · 2022 · author #4
- Supervising strong learners by amplifying weak experts cs.LG · 2018 · author #2
Mentions
- 2507.11473 #34 · arxiv_oai · confidence 0.70 Buck Shlegeris
- 2406.10162 #11 · arxiv_oai · confidence 0.70 Buck Shlegeris
Frequent Coauthors
- Ethan Perez 4 shared papers
- Evan Hubinger 4 shared papers
- Carson Denison 3 shared papers
- David Duvenaud 3 shared papers
- Jared Kaplan 3 shared papers
- Monte MacDiarmid 3 shared papers
- Ryan Greenblatt 3 shared papers
- Samuel R. Bowman 3 shared papers
- Aryan Bhatt 2 shared papers
- Fabien Roger 2 shared papers
- Fazl Barez 2 shared papers
- Julian Michael 2 shared papers
- Nicholas Schiefer 2 shared papers
- Paul Christiano 2 shared papers
- Shauna Kravec 2 shared papers
- Sören Mindermann 2 shared papers
- Adam Hanson 1 shared paper
- Adam Jermyn 1 shared paper
- Akbir Khan 1 shared paper
- Alan Cooney 1 shared paper