pith.

Alexander Pan

Identifiers

  • name variant Alexander Pan 0.60 · backfill

Papers (4)

  1. Reducing Political Manipulation with Consistency Training cs.CL · 2026 · author #3
  2. The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning cs.LG · 2024 · author #2
  3. Representation Engineering: A Top-Down Approach to AI Transparency cs.LG · 2023 · author #7
  4. The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models cs.LG · 2022 · author #1

Mentions

  • 2605.22771 #3 · arxiv_oai · confidence 0.70 Alexander Pan
  • 2201.03544 #1 · arxiv_oai · confidence 0.70 Alexander Pan
  • 2403.03218 #2 · arxiv_oai · confidence 0.70 Alexander Pan

Frequent Coauthors