Nicholas Schiefer

Identifiers

  • name variant Nicholas Schiefer 0.60 · backfill

Papers (12)

  1. Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models · cs.AI · 2024 · author #7
  2. Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training · cs.CR · 2024 · author #38
  3. Towards Understanding Sycophancy in Language Models · cs.CL · 2023 · author #16
  4. Measuring Faithfulness in Chain-of-Thought Reasoning · cs.AI · 2023 · author #15
  5. Towards Measuring the Representation of Subjective Global Opinions in Language Models · cs.CL · 2023 · author #4
  6. Discovering Language Model Behaviors with Model-Written Evaluations · cs.CL · 2022 · author #62
  7. Constitutional AI: Harmlessness from AI Feedback · cs.CL · 2022 · author #29
  8. Measuring Progress on Scalable Oversight for Large Language Models · cs.HC · 2022 · author #28
  9. Toy Models of Superposition · cs.LG · 2022 · author #4
  10. Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned · cs.CL · 2022 · author #9
  11. Language Models (Mostly) Know What They Know · cs.CL · 2022 · author #7
  12. FoundationDB Record Layer: A Multi-Tenant Structured Datastore · cs.DB · 2019 · author #12

Mentions

  • 2211.03540 #28 · arxiv_oai · confidence 0.70 · Nicholas Schiefer
  • 2406.10162 #7 · arxiv_oai · confidence 0.70 · Nicholas Schiefer
  • 2306.16388 #4 · arxiv_oai · confidence 0.70 · Nicholas Schiefer

Frequent Coauthors