Nicholas Schiefer
Identifiers
- name variant · Nicholas Schiefer · 0.60 · backfill
Papers (12)
- Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models · cs.AI · 2024 · author #7
- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training · cs.CR · 2024 · author #38
- Towards Understanding Sycophancy in Language Models · cs.CL · 2023 · author #16
- Measuring Faithfulness in Chain-of-Thought Reasoning · cs.AI · 2023 · author #15
- Towards Measuring the Representation of Subjective Global Opinions in Language Models · cs.CL · 2023 · author #4
- Discovering Language Model Behaviors with Model-Written Evaluations · cs.CL · 2022 · author #62
- Constitutional AI: Harmlessness from AI Feedback · cs.CL · 2022 · author #29
- Measuring Progress on Scalable Oversight for Large Language Models · cs.HC · 2022 · author #28
- Toy Models of Superposition · cs.LG · 2022 · author #4
- Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned · cs.CL · 2022 · author #9
- Language Models (Mostly) Know What They Know · cs.CL · 2022 · author #7
- FoundationDB Record Layer: A Multi-Tenant Structured Datastore · cs.DB · 2019 · author #12
Mentions
- 2211.03540 #28 · arxiv_oai · confidence 0.70 · Nicholas Schiefer
- 2406.10162 #7 · arxiv_oai · confidence 0.70 · Nicholas Schiefer
- 2306.16388 #4 · arxiv_oai · confidence 0.70 · Nicholas Schiefer
Frequent Coauthors
- Jared Kaplan 10 shared papers
- Ethan Perez 9 shared papers
- Sam McCandlish 9 shared papers
- Shauna Kravec 9 shared papers
- Zac Hatfield-Dodds 9 shared papers
- Amanda Askell 8 shared papers
- Kamal Ndousse 7 shared papers
- Nicholas Joseph 7 shared papers
- Samuel R. Bowman 7 shared papers
- Tristan Hume 7 shared papers
- Anna Chen 6 shared papers
- Danny Hernandez 6 shared papers
- Dario Amodei 6 shared papers
- Dawn Drain 6 shared papers
- Deep Ganguli 6 shared papers
- Jackson Kernion 6 shared papers
- Liane Lovitt 6 shared papers
- Nelson Elhage 6 shared papers
- Nova DasSarma 6 shared papers
- Tom Henighan 6 shared papers