Evan Hubinger

Identifiers

  • name variant Evan Hubinger 0.60 · backfill

Papers (8)

  1. Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety cs.AI · 2025 · author #17
  2. Alignment faking in large language models cs.AI · 2024 · author #20
  3. Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models cs.AI · 2024 · author #14
  4. Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training cs.CR · 2024 · author #1
  5. Steering Llama 2 via Contrastive Activation Addition cs.CL · 2023 · author #5
  6. Measuring Faithfulness in Chain-of-Thought Reasoning cs.AI · 2023 · author #9
  7. Discovering Language Model Behaviors with Model-Written Evaluations cs.CL · 2022 · author #61
  8. Risks from Learned Optimization in Advanced Machine Learning Systems cs.AI · 2019 · author #1

Mentions

  • 2507.11473 #17 · arxiv_oai · confidence 0.70 Evan Hubinger
  • 2406.10162 #14 · arxiv_oai · confidence 0.70 Evan Hubinger

Frequent Coauthors