Dawn Drain

Identifiers

  • name variant: Dawn Drain · 0.60 · backfill

Papers (12)

  1. Discovering Language Model Behaviors with Model-Written Evaluations · cs.CL · 2022 · author #21
  2. Constitutional AI: Harmlessness from AI Feedback · cs.CL · 2022 · author #15
  3. Measuring Progress on Scalable Oversight for Large Language Models · cs.HC · 2022 · author #17
  4. In-context Learning and Induction Heads · cs.LG · 2022 · author #12
  5. Toy Models of Superposition · cs.LG · 2022 · author #9
  6. Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned · cs.CL · 2022 · author #16
  7. Language Models (Mostly) Know What They Know · cs.CL · 2022 · author #5
  8. Scaling Laws and Interpretability of Learning from Repeated Data · cs.LG · 2022 · author #5
  9. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback · cs.CL · 2022 · author #7
  10. A General Language Assistant as a Laboratory for Alignment · cs.CL · 2021 · author #4
  11. CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation · cs.SE · 2021 · author #8
  12. GraphCodeBERT: Pre-training Code Representations with Data Flow · cs.SE · 2020 · author #14

Mentions

  • 2205.10487 · #5 · arxiv_oai · confidence 0.70 · Dawn Drain
  • 2211.03540 · #17 · arxiv_oai · confidence 0.70 · Dawn Drain

Frequent Coauthors