arxiv: 2604.26360 · 2 revisions
Uncertainty-Aware Reward Discounting for Mitigating Reward Hacking