Oscar Obeso
Identifiers
No identifiers captured yet.
Papers (1)
- Refusal in Language Models Is Mediated by a Single Direction · cs.LG · 2024 · author #2
Mentions
No mention provenance yet.
Frequent Coauthors
- Aaquib Syed 1 shared paper
- Andy Arditi 1 shared paper
- Daniel Paleka 1 shared paper
- Neel Nanda 1 shared paper
- Nina Panickssery 1 shared paper
- Wes Gurnee 1 shared paper