Nathaniel Li

Identifiers

  • name variant Nathaniel Li 0.60 · backfill

Papers (5)

  1. Code World Model Preparedness Report · cs.SE · 2026 · author #6
  2. Humanity's Last Exam · cs.LG · 2025 · author #4
  3. The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning · cs.LG · 2024 · author #1
  4. HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal · cs.LG · 2024 · author #8
  5. Representation Engineering: A Top-Down Approach to AI Transparency · cs.LG · 2023 · author #12

Mentions

  • 2403.03218 #1 · arxiv_oai · confidence 0.70 Nathaniel Li

Frequent Coauthors