Monte MacDiarmid
Identifiers
No identifiers captured yet.
Papers (3)
- Alignment faking in large language models · cs.AI · 2024 · author #5
- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training · cs.CR · 2024 · author #6
- Steering Language Models With Activation Engineering · cs.CL · 2023 · author #7
Mentions
No mention provenance yet.
Frequent Coauthors
- Buck Shlegeris 2 shared papers
- Carson Denison 2 shared papers
- David Duvenaud 2 shared papers
- Ethan Perez 2 shared papers
- Evan Hubinger 2 shared papers
- Jared Kaplan 2 shared papers
- Ryan Greenblatt 2 shared papers
- Samuel R. Bowman 2 shared papers
- Sören Mindermann 2 shared papers
- Adam Jermyn 1 shared paper
- Akbir Khan 1 shared paper
- Alexander Matt Turner 1 shared paper
- Amanda Askell 1 shared paper
- Ansh Radhakrishnan 1 shared paper
- Benjamin Wright 1 shared paper
- Cem Anil 1 shared paper
- Daniel M. Ziegler 1 shared paper
- David Udell 1 shared paper
- Deep Ganguli 1 shared paper
- Fabien Roger 1 shared paper