Carson Denison

Identifiers

  • name variant Carson Denison 0.60 · backfill

Papers (5)

  1. Reasoning Models Don't Always Say What They Think · cs.CL · 2025 · author #5
  2. Alignment faking in large language models · cs.AI · 2024 · author #2
  3. Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models · cs.AI · 2024 · author #1
  4. Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training · cs.CR · 2024 · author #2
  5. Measuring Faithfulness in Chain-of-Thought Reasoning · cs.AI · 2023 · author #5

Mentions

  • 2406.10162 #1 · arxiv_oai · confidence 0.70 Carson Denison

Frequent Coauthors