Pith Number

pith:EB4UBYJK

pith:2026:EB4UBYJKQKXEBO46QRJZBLQZZS
not attested · not anchored · not stored · refs pending

Uncertainty-Aware Reward Discounting for Mitigating Reward Hacking

Disha Singha

A dual-source uncertainty framework using ensemble disagreement and preference variability reduces reward hacking by 93.7 percent in RL.

arxiv:2604.26360 v2 · 2026-04-29 · cs.LG · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{EB4UBYJKQKXEBO46QRJZBLQZZS}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Empirical results across multiple discrete grid configurations (6x6, 8x8, 10x10) and high-dimensional continuous control environments (Hopper-v4, Walker2d-v4) demonstrate that our approach yields more stable training dynamics and reduces exploitative behaviors under reward ambiguity, achieving a 93.7% reduction in reward-hacking behavior as measured by trap visitation frequency.

C2 · weakest assumption

The approach assumes that ensemble disagreement reliably captures the epistemic uncertainty relevant to reward hacking, and that variability in reward annotations accurately reflects true preference uncertainty. Only then can the Reliability Filter balance exploitation and caution without discarding useful actions.

C3 · one-line summary

Uncertainty-aware RL framework using ensemble disagreement and annotation variability reduces reward-hacking trap visits by 93.7% across grid and continuous control tasks while remaining robust to 30% label noise.

Cited by

1 paper in Pith

Receipt and verification
First computed: 2026-06-29T01:15:04.950810Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

207940e12a82ae40bb9e845390ae19ccb98fcd7b4f4f44227f4f6edadb820e8d

Aliases

arxiv: 2604.26360 · arxiv_version: 2604.26360v2 · doi: 10.48550/arxiv.2604.26360 · pith_short_12: EB4UBYJKQKXE · pith_short_16: EB4UBYJKQKXEBO46 · pith_short_8: EB4UBYJK
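
The identifiers on this page appear to be related: the 26-character Pith Number matches the start of the Base32 (RFC 4648) encoding of the canonical hash, and the short aliases are prefixes of it. The page does not state this derivation, so the sketch below treats it as an observation about this one record, not a documented rule. The function names `pith_from_hash` and `short_aliases` are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "207940e12a82ae40bb9e845390ae19ccb98fcd7b4f4f44227f4f6edadb820e8d"
PITH_NUMBER = "EB4UBYJKQKXEBO46QRJZBLQZZS"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Assumed derivation: Base32-encode the digest bytes and truncate."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str, lengths=(8, 12, 16)) -> dict:
    """Build the pith_short_N aliases as plain prefixes of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


derived = pith_from_hash(CANONICAL_HASH)
aliases = short_aliases(derived)

print(derived)
print("matches Pith Number on this page:", derived == PITH_NUMBER)
print(aliases)
```

If the comparison prints `False` for another record, the assumed derivation does not hold in general and the alias values listed on the page should be taken as authoritative.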
Agent API
Verify this Pith Number yourself
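# Fetch the record as JSON-LD, extract the canonical record, re-serialize it
# with sorted keys and compact separators, then print its SHA-256 digest.
curl -sH 'Accept: application/ld+json' https://pith.science/pith/EB4UBYJKQKXEBO46QRJZBLQZZS \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# The printed digest should equal the canonical hash:
# expect: 207940e12a82ae40bb9e845390ae19ccb98fcd7b4f4f44227f4f6edadb820e8d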
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "0e3c4dcbd297b3a42b6f45a3645e29c769b88a9064c3d95c1352c9de41f598aa",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-04-29T07:14:01Z",
    "title_canon_sha256": "5fd2a2d31771e80dc1926b6fa7751c563d26d1025212e995ac87f0ce2b734de5"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2604.26360",
    "kind": "arxiv",
    "version": 2
  }
}
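
The same check can be run offline against the canonical record shown above. The following is a minimal sketch that reuses the serialization settings from the shell command (sorted keys, compact separators, `ensure_ascii=False`, UTF-8). The helper name `canonical_sha256` is hypothetical, and the sketch assumes the JSON printed on this page is exactly the record the service hashes.

```python
import hashlib
import json

# Canonical record copied from this page.
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "0e3c4dcbd297b3a42b6f45a3645e29c769b88a9064c3d95c1352c9de41f598aa",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-04-29T07:14:01Z",
        "title_canon_sha256": "5fd2a2d31771e80dc1926b6fa7751c563d26d1025212e995ac87f0ce2b734de5",
    },
    "schema_version": "1.0",
    "source": {"id": "2604.26360", "kind": "arxiv", "version": 2},
}

# Canonical hash published on this page.
EXPECTED_HASH = "207940e12a82ae40bb9e845390ae19ccb98fcd7b4f4f44227f4f6edadb820e8d"


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and no whitespace, then hash the UTF-8 bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


digest = canonical_sha256(CANONICAL_RECORD)
print(digest)
print("matches published hash:", digest == EXPECTED_HASH)
```

Because the keys are sorted before hashing, the digest does not depend on the order in which a mirror stores or transmits the fields, which is what lets independent parties recompute it.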