pith:PIJPU4DG
Temper and Tilt Lead to SLOP: Reward Hacking Mitigation with Inference-Time Alignment
Adjusting the reference model's temperature generalizes inference-time alignment to ensembles of reward models, yielding a sharpened logarithmic opinion pool (SLOP) whose weights can be calibrated to reduce reward hacking.
arxiv:2605.13537 v1 · 2026-05-13 · cs.LG · cs.AI · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{PIJPU4DGAG6HKFENONK3GHUOQG}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
We propose an algorithm for calibrating SLOP weight parameters and experimentally demonstrate that it improves robustness while preserving alignment performance.
Assumption: the proposed calibration algorithm for SLOP weights generalizes beyond the specific experimental setups, and the temperature adjustment reliably extends the theoretical approximations to ensembles without introducing new instabilities.
Temperature adjustment on the reference model generalizes inference-time alignment to SLOP ensembles of reward models, with a calibration algorithm that improves robustness to reward hacking while preserving alignment performance.
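As a rough illustration of the idea in the claims above, the following sketch tempers a reference distribution over a small finite candidate set and tilts it by a weighted sum of reward scores, which is the generic log-linear (logarithmic opinion pool) form. This is a textbook-style sketch, not the paper's definition: the function name `temper_and_tilt`, the parameter names, and the toy numbers are all assumptions made for illustration, and the record above does not specify the exact SLOP formula or the calibration procedure.

```python
import math

def temper_and_tilt(ref_logprobs, rewards, weights, temperature=1.0):
    """Hypothetical helper: combine a tempered reference with weighted rewards.

    For each candidate y, the unnormalized log-score is
        ref_logprobs[y] / temperature + sum_i weights[i] * rewards[i][y]
    and the result is normalized over the finite candidate set.
    """
    scores = []
    for y, logp in enumerate(ref_logprobs):
        tilt = sum(w * r[y] for w, r in zip(weights, rewards))
        scores.append(logp / temperature + tilt)
    # Log-sum-exp normalization for numerical stability.
    peak = max(scores)
    log_z = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return [math.exp(s - log_z) for s in scores]

# Toy example (made-up numbers): three candidates, two reward models.
ref_logprobs = [math.log(0.5), math.log(0.3), math.log(0.2)]
rewards = [
    [0.0, 1.0, 2.0],  # reward model 1
    [0.0, 1.0, 0.0],  # reward model 2
]
weights = [0.5, 0.5]

probs = temper_and_tilt(ref_logprobs, rewards, weights, temperature=0.8)
print([round(p, 4) for p in probs])
```

In this form, setting all weights to zero with `temperature=1.0` recovers the reference distribution, and lowering the temperature sharpens it. How the weights would be calibrated against reward hacking is left to the paper.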
Receipt and verification
| First computed | 2026-05-18T02:44:24.102420Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
7a12fa706601bc75148d7355b31e8e81bbdea3e86a96a251d0c1c095d063e000
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/PIJPU4DGAG6HKFENONK3GHUOQG \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 7a12fa706601bc75148d7355b31e8e81bbdea3e86a96a251d0c1c095d063e000
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "2481e8d665fc0f3b97f22844448dad17070e6ff5a7fbcff3521f9ab6946ed229",
"cross_cats_sorted": [
"cs.AI",
"cs.CL"
],
"license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
"primary_cat": "cs.LG",
"submitted_at": "2026-05-13T13:47:06Z",
"title_canon_sha256": "15301f46178a0c9abbd2cf925adeec0b22941843232ebea12d1371a06a6438c2"
},
"schema_version": "1.0",
"source": {
"id": "2605.13537",
"kind": "arxiv",
"version": 1
}
}
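The record above can also be hashed offline, without `curl` or `jq`. The sketch below mirrors the serialization used in the shell one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding) and compares the digest with the published canonical hash. The helper name `canonical_sha256` is an assumption for illustration, and the sketch assumes the JSON shown on this page is exactly the `canonical_record` returned by the API.

```python
import hashlib
import json

def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "2481e8d665fc0f3b97f22844448dad17070e6ff5a7fbcff3521f9ab6946ed229",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-13T13:47:06Z",
        "title_canon_sha256": "15301f46178a0c9abbd2cf925adeec0b22941843232ebea12d1371a06a6438c2",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.13537", "kind": "arxiv", "version": 1},
}

# Published canonical hash from this page.
EXPECTED = "7a12fa706601bc75148d7355b31e8e81bbdea3e86a96a251d0c1c095d063e000"

digest = canonical_sha256(record)
print(digest)
print("match" if digest == EXPECTED else "mismatch")
```

Because the keys are sorted before hashing, the order in which they appear in the dictionary does not affect the digest. This check covers only the hash; verifying the Ed25519 signature would additionally require the published public key.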