pith. machine review for the scientific record.

Monte MacDiarmid

Identifiers

No identifiers captured yet.

Papers (3)

  1. Alignment faking in large language models · cs.AI · 2024 · author #5
  2. Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training · cs.CR · 2024 · author #6
  3. Steering Language Models With Activation Engineering · cs.CL · 2023 · author #7

Mentions

No mention provenance yet.

Frequent Coauthors