Michael Sellitto

Identifiers

No identifiers captured yet.

Papers (3)

  1. Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training · cs.CR · 2024 · author #21
  2. Discovering Language Model Behaviors with Model-Written Evaluations · cs.CL · 2022 · author #35
  3. Constitutional AI: Harmlessness from AI Feedback · cs.CL · 2022 · author #27

Mentions

No mention provenance yet.

Frequent Coauthors