pith:M3OIQ5MS
Position: The Hidden Costs and Measurement Gaps of Reinforcement Learning with Verifiable Rewards
Many reported RLVR gains on math and code tasks shrink or vanish once budgets, prompts, and contamination are controlled.
arxiv:2509.21882 v3 · 2025-09-26 · cs.LG · cs.AI
Add to your LaTeX paper
```latex
\usepackage{pith}
\pithnumber{M3OIQ5MS6SYSDRRIORQJRRTGHL}
```
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Several widely cited performance gaps between RLVR-trained models and their baselines shrink substantially or disappear once budgets, prompts, and dataset versions are matched, and once contaminated sets are treated as memorization probes rather than as evidence of reasoning.
Assumption: the budget-matched reproductions and partial-prompt contamination probes are representative of the headline results in the broader RLVR literature, and the three listed confounds are the dominant sources of overstated gains.
The paper identifies confounds in RLVR evaluations that inflate apparent gains and proposes a minimum standard for budget-matched, contamination-aware assessment with calibration tracking.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-27T01:04:51.234020Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
66dc887592f4b121c628746098c6663af6225570419face1f7fef42442a90a32
Verify this Pith Number yourself
```shell
curl -sH 'Accept: application/ld+json' https://pith.science/pith/M3OIQ5MS6SYSDRRIORQJRRTGHL \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 66dc887592f4b121c628746098c6663af6225570419face1f7fef42442a90a32
```
Canonical record JSON
```json
{
  "metadata": {
    "abstract_canon_sha256": "9ec84cad11b0802376973a89f9ce57bb8bc16d2e2018971b8964bbd670ed54fb",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-09-26T05:06:25Z",
    "title_canon_sha256": "6f36e13162ef68d97956f49b23689294f4f70281f9e34ddbb08ddb6a4023b3cc"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2509.21882",
    "kind": "arxiv",
    "version": 3
  }
}
```
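The hashing step of the verification command can also be run offline. The following is a minimal sketch, assuming the JSON shown above is the complete `canonical_record` returned by the API; if the live record carries extra or different fields, the digest will differ. It mirrors the command's canonicalization (keys sorted at every nesting level, no whitespace around separators, non-ASCII characters kept as UTF-8) before taking SHA-256. The names `CANONICAL_RECORD` and `canonical_hash` are hypothetical and not part of any Pith tooling.

```python
import hashlib
import json

# Canonical record as displayed on this page (assumed identical to the API's
# `canonical_record`; this is an assumption, not verified here).
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "9ec84cad11b0802376973a89f9ce57bb8bc16d2e2018971b8964bbd670ed54fb",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2025-09-26T05:06:25Z",
        "title_canon_sha256": "6f36e13162ef68d97956f49b23689294f4f70281f9e34ddbb08ddb6a4023b3cc",
    },
    "schema_version": "1.0",
    "source": {"id": "2509.21882", "kind": "arxiv", "version": 3},
}

# Canonical hash listed on this page.
EXPECTED = "66dc887592f4b121c628746098c6663af6225570419face1f7fef42442a90a32"


def canonical_hash(record: dict) -> str:
    """Hypothetical helper: same canonicalization as the shell pipeline."""
    # sort_keys orders keys recursively; separators drop all optional whitespace;
    # ensure_ascii=False keeps non-ASCII characters as raw UTF-8 instead of \u escapes.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    digest = canonical_hash(CANONICAL_RECORD)
    print(digest)
    print("match" if digest == EXPECTED else "mismatch")
```

Because keys are sorted before hashing, the digest does not depend on the order in which fields appear in the served JSON; any change to a value, however, yields a different digest. A mismatch against the listed hash may therefore indicate that the displayed record differs from the live one rather than a fault in the canonicalization rule.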