pith:EB4UBYJK
Uncertainty-Aware Reward Discounting for Mitigating Reward Hacking
A dual-source uncertainty framework using ensemble disagreement and preference variability reduces reward hacking by 93.7 percent in RL.
arxiv:2604.26360 v2 · 2026-04-29 · cs.LG · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{EB4UBYJKQKXEBO46QRJZBLQZZS}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Empirical results across multiple discrete grid configurations (6x6, 8x8, 10x10) and high-dimensional continuous control environments (Hopper-v4, Walker2d-v4) demonstrate that our approach yields more stable training dynamics and reduces exploitative behaviors under reward ambiguity, achieving a 93.7% reduction in reward-hacking behavior as measured by trap visitation frequency.
The approach assumes that ensemble disagreement reliably captures the epistemic uncertainty relevant to reward hacking, and that variability in reward annotations accurately reflects true preference uncertainty, so that the Reliability Filter can balance exploitation and caution without discarding useful actions.
Uncertainty-aware RL framework using ensemble disagreement and annotation variability reduces reward-hacking trap visits by 93.7% across grid and continuous control tasks while remaining robust to 30% label noise.
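The claims above name two uncertainty sources, ensemble disagreement and annotation variability, but this record does not give the paper's formula. The following is only a minimal sketch of how such a discount could be combined, not the authors' method. The function name `discounted_reward`, the multiplicative exponential form, the weight `lam`, and the assumption of non-negative rewards are all hypothetical choices made for illustration.

```python
import math
import statistics


def discounted_reward(ensemble_preds, annotation_scores, lam=1.0):
    """Shrink a reward estimate as uncertainty grows (illustrative only).

    ensemble_preds:    reward predictions from each ensemble member
    annotation_scores: repeated human annotations for the same transition
    lam:               hypothetical weight on the total uncertainty
    Assumes non-negative rewards, so shrinking toward zero is conservative.
    """
    mean_reward = statistics.fmean(ensemble_preds)
    # Source 1: disagreement between ensemble members (epistemic proxy).
    ensemble_std = statistics.pstdev(ensemble_preds)
    # Source 2: variability between annotators (preference proxy).
    annotation_std = statistics.pstdev(annotation_scores)
    # Assumed form: exponential discount in the summed uncertainty.
    return mean_reward * math.exp(-lam * (ensemble_std + annotation_std))


# Full agreement leaves the reward untouched; disagreement shrinks it.
confident = discounted_reward([1.0, 1.0, 1.0], [1.0, 1.0, 1.0])
uncertain = discounted_reward([0.0, 2.0, 1.0], [1.0, 1.0, 1.0])
print(confident, uncertain)
```

Under this assumed form, a state whose predicted reward is high only because one ensemble member is over-optimistic earns less than a state on which all members agree, which is the intuition behind discouraging visits to ambiguous "trap" states.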
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-06-29T01:15:04.950810Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
207940e12a82ae40bb9e845390ae19ccb98fcd7b4f4f44227f4f6edadb820e8d
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/EB4UBYJKQKXEBO46QRJZBLQZZS \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 207940e12a82ae40bb9e845390ae19ccb98fcd7b4f4f44227f4f6edadb820e8d
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "0e3c4dcbd297b3a42b6f45a3645e29c769b88a9064c3d95c1352c9de41f598aa",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-04-29T07:14:01Z",
    "title_canon_sha256": "5fd2a2d31771e80dc1926b6fa7751c563d26d1025212e995ac87f0ce2b734de5"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2604.26360",
    "kind": "arxiv",
    "version": 2
  }
}
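The verification command above canonicalizes the record (sorted keys, compact separators, UTF-8) and hashes it with SHA-256. As a sketch, the same check can be run offline in Python against the record shown on this page, with no network access. The helper name `canonical_sha256` is hypothetical, and whether the digest matches depends on the record being copied exactly.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "0e3c4dcbd297b3a42b6f45a3645e29c769b88a9064c3d95c1352c9de41f598aa",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-04-29T07:14:01Z",
        "title_canon_sha256": "5fd2a2d31771e80dc1926b6fa7751c563d26d1025212e995ac87f0ce2b734de5",
    },
    "schema_version": "1.0",
    "source": {"id": "2604.26360", "kind": "arxiv", "version": 2},
}

# Canonical hash listed on this page.
EXPECTED = "207940e12a82ae40bb9e845390ae19ccb98fcd7b4f4f44227f4f6edadb820e8d"

digest = canonical_sha256(record)
print("computed:", digest)
print("matches listed hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the digest does not depend on key order or on the whitespace used to display the JSON, only on the keys and values themselves.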