pith.
Dongyoon Hahm
Identifiers
name variant
Dongyoon Hahm
0.60 · backfill
Papers (1)
Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases
cs.AI · 2026 · author #1
Mentions
2605.27355
#1 · arxiv_oai · confidence 0.70
Dongyoon Hahm
Frequent Coauthors
Dylan Hadfield-Menell
1 shared paper
Kimin Lee
1 shared paper