Evan Hubinger
Identifiers
- name variant: Evan Hubinger · 0.60 · backfill
Papers (8)
- Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety · cs.AI · 2025 · author #17
- Alignment faking in large language models · cs.AI · 2024 · author #20
- Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models · cs.AI · 2024 · author #14
- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training · cs.CR · 2024 · author #1
- Steering Llama 2 via Contrastive Activation Addition · cs.CL · 2023 · author #5
- Measuring Faithfulness in Chain-of-Thought Reasoning · cs.AI · 2023 · author #9
- Discovering Language Model Behaviors with Model-Written Evaluations · cs.CL · 2022 · author #61
- Risks from Learned Optimization in Advanced Machine Learning Systems · cs.AI · 2019 · author #1
Mentions
- 2507.11473 #17 · arxiv_oai · confidence 0.70 · Evan Hubinger
- 2406.10162 #14 · arxiv_oai · confidence 0.70 · Evan Hubinger
Frequent Coauthors
- Ethan Perez 6 shared papers
- Jared Kaplan 5 shared papers
- Samuel R. Bowman 5 shared papers
- Buck Shlegeris 4 shared papers
- Carson Denison 4 shared papers
- Nicholas Schiefer 4 shared papers
- David Duvenaud 3 shared papers
- Monte MacDiarmid 3 shared papers
- Ryan Greenblatt 3 shared papers
- Shauna Kravec 3 shared papers
- Tamera Lanham 3 shared papers
- Amanda Askell 2 shared papers
- Anna Chen 2 shared papers
- Ansh Radhakrishnan 2 shared papers
- Danny Hernandez 2 shared papers
- Deep Ganguli 2 shared papers
- Dustin Li 2 shared papers
- Fabien Roger 2 shared papers
- Fazl Barez 2 shared papers
- Jack Clark 2 shared papers