Carson Denison
Identifiers
- name variant Carson Denison 0.60 · backfill
Papers (5)
- Reasoning Models Don't Always Say What They Think cs.CL · 2025 · author #5
- Alignment faking in large language models cs.AI · 2024 · author #2
- Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models cs.AI · 2024 · author #1
- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training cs.CR · 2024 · author #2
- Measuring Faithfulness in Chain-of-Thought Reasoning cs.AI · 2023 · author #5
Mentions
- 2406.10162 #1 · arxiv_oai · confidence 0.70 Carson Denison
Frequent Coauthors
- Ethan Perez 5 shared papers
- Jared Kaplan 5 shared papers
- Samuel R. Bowman 5 shared papers
- Evan Hubinger 4 shared papers
- Ansh Radhakrishnan 3 shared papers
- Buck Shlegeris 3 shared papers
- David Duvenaud 3 shared papers
- Monte MacDiarmid 3 shared papers
- Nicholas Schiefer 3 shared papers
- Fabien Roger 2 shared papers
- Fazl Barez 2 shared papers
- Jan Brauner 2 shared papers
- Jonathan Uesato 2 shared papers
- Newton Cheng 2 shared papers
- Ryan Greenblatt 2 shared papers
- Shauna Kravec 2 shared papers
- Sören Mindermann 2 shared papers
- Tamera Lanham 2 shared papers
- Adam Jermyn 1 shared paper
- Akbir Khan 1 shared paper