Steven Basart
Identifiers
- name variant Steven Basart 0.60 · backfill
Papers (7)
- The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning cs.LG · 2024 · author #46
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal cs.LG · 2024 · author #9
- Representation Engineering: A Top-Down Approach to AI Transparency cs.LG · 2023 · author #16
- Measuring Coding Challenge Competence With APPS cs.SE · 2021 · author #2
- Measuring Mathematical Problem Solving With the MATH Dataset cs.LG · 2021 · author #5
- Measuring Massive Multitask Language Understanding cs.CY · 2020 · author #3
- Aligning AI With Shared Human Values cs.CY · 2020 · author #3
Mentions
- 2008.02275 #3 · arxiv_oai · confidence 0.70 Steven Basart
- 2403.03218 #46 · arxiv_oai · confidence 0.70 Steven Basart
Frequent Coauthors
- Dan Hendrycks 7 shared papers
- Dawn Song 5 shared papers
- Mantas Mazeika 5 shared papers
- Andy Zou 4 shared papers
- Collin Burns 4 shared papers
- Jacob Steinhardt 4 shared papers
- Long Phan 3 shared papers
- Nathaniel Li 3 shared papers
- Zifan Wang 3 shared papers
- Akul Arora 2 shared papers
- Alexander Pan 2 shared papers
- Ann-Kathrin Dombrowski 2 shared papers
- Saurav Kadavath 2 shared papers
- Shashwat Goel 2 shared papers
- Xuwang Yin 2 shared papers
- Adam A. Hunt 1 shared paper
- Adam Khoja 1 shared paper
- Alexandr Wang 1 shared paper
- Alex Levinson 1 shared paper
- Alex Mallen 1 shared paper