Magnus Jørgenvåg

Identifiers

  • name variant Magnus J{\o}rgenv{\aa}g 0.60 · backfill

Papers (2)

  1. Reinforcement Learning Amplifies Emergent Misalignment from Harmless Rewards · cs.CL · 2026 · author #1
  2. In-Training Defenses against Emergent Misalignment in Language Models · cs.LG · 2025 · author #2

Mentions

  • 2508.06249 #2 · arxiv_oai · confidence 0.70 Magnus J{\o}rgenv{\aa}g
  • 2605.31328 #1 · arxiv_oai · confidence 0.70 Magnus J{\o}rgenv{\aa}g

Frequent Coauthors