Steven Basart

Identifiers

  • name variant Steven Basart 0.60 · backfill

Papers (7)

  1. The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning · cs.LG · 2024 · author #46
  2. HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal · cs.LG · 2024 · author #9
  3. Representation Engineering: A Top-Down Approach to AI Transparency · cs.LG · 2023 · author #16
  4. Measuring Coding Challenge Competence With APPS · cs.SE · 2021 · author #2
  5. Measuring Mathematical Problem Solving With the MATH Dataset · cs.LG · 2021 · author #5
  6. Measuring Massive Multitask Language Understanding · cs.CY · 2020 · author #3
  7. Aligning AI With Shared Human Values · cs.CY · 2020 · author #3

Mentions

  • 2008.02275 #3 · arxiv_oai · confidence 0.70 · Steven Basart
  • 2403.03218 #46 · arxiv_oai · confidence 0.70 · Steven Basart

Frequent Coauthors