Amanda Askell — Pith Author Registry

Identifiers

name variant · Amanda Askell · 0.60 · backfill

Papers (18)

Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming · cs.CL · 2025 · author #11
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training · cs.CR · 2024 · author #12
Towards Understanding Sycophancy in Language Models · cs.CL · 2023 · author #5
Towards Measuring the Representation of Subjective Global Opinions in Language Models · cs.CL · 2023 · author #5
Discovering Language Model Behaviors with Model-Written Evaluations · cs.CL · 2022 · author #57
Constitutional AI: Harmlessness from AI Feedback · cs.CL · 2022 · author #4
Measuring Progress on Scalable Oversight for Large Language Models · cs.HC · 2022 · author #8
In-context Learning and Induction Heads · cs.LG · 2022 · author #8
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned · cs.CL · 2022 · author #4
Language Models (Mostly) Know What They Know · cs.CL · 2022 · author #3
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models · cs.CL · 2022 · author #24
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback · cs.CL · 2022 · author #4
Training language models to follow instructions with human feedback · cs.CL · 2022 · author #16
A General Language Assistant as a Laboratory for Alignment · cs.CL · 2021 · author #1
Learning Transferable Visual Models From Natural Language Supervision · cs.CV · 2021 · author #8
Language Models are Few-Shot Learners · cs.CL · 2020 · author #10
Release Strategies and the Social Impacts of Language Models · cs.CL · 2019 · author #4
The Role of Cooperation in Responsible AI Development · cs.CY · 2019 · author #1

Mentions

2501.18837 · #11 · arxiv_oai · confidence 0.70 · Amanda Askell
2211.03540 · #8 · arxiv_oai · confidence 0.70 · Amanda Askell
1908.09203 · #4 · arxiv_oai · confidence 0.70 · Amanda Askell
2306.16388 · #5 · arxiv_oai · confidence 0.70 · Amanda Askell

Frequent Coauthors

Jared Kaplan 13 shared papers
Jack Clark 11 shared papers
Sam McCandlish 11 shared papers
Deep Ganguli 10 shared papers
Kamal Ndousse 10 shared papers
Yuntao Bai 10 shared papers
Zac Hatfield-Dodds 10 shared papers
Danny Hernandez 9 shared papers
Dario Amodei 9 shared papers
Jackson Kernion 9 shared papers
Nicholas Joseph 9 shared papers
Nova DasSarma 9 shared papers
Tom Henighan 9 shared papers
Andy Jones 8 shared papers
Anna Chen 8 shared papers
Ben Mann 8 shared papers
Catherine Olsson 8 shared papers
Dawn Drain 8 shared papers
Ethan Perez 8 shared papers
Liane Lovitt 8 shared papers