Pith Number

pith:M3OIQ5MS

pith:2025:M3OIQ5MS6SYSDRRIORQJRRTGHL
not attested · not anchored · not stored · refs pending

Position: The Hidden Costs and Measurement Gaps of Reinforcement Learning with Verifiable Rewards

Aaron Tu, Amin Saberi, Bing Hu, Fang Wu, Ge Liu, Hanqun Cao, Heli Qi, Huaxiu Yao, Jure Leskovec, Li Erran Li, Nan Liu, Naoto Yokoya, Peng Xia, Qingcheng Zeng, Rui Yang, Shayan Talaei, Weihao Xuan, Wenqi Shi, Xiangru Tang, Xu Huang, Yejin Choi, Yijia Xiao, Yinxi Li, Yuchen Zhuang

Many reported RLVR gains on math and code tasks shrink or vanish once budgets, prompts, and contamination are controlled.

arxiv:2509.21882 v3 · 2025-09-26 · cs.LG · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{M3OIQ5MS6SYSDRRIORQJRRTGHL}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Several widely cited gaps shrink substantially or disappear once budgets, prompts, and dataset versions are matched, and contaminated sets are treated as memorization probes rather than evidence of reasoning.

C2 · weakest assumption

The paper assumes that its budget-matched reproductions and partial-prompt contamination probes are representative of the headline results in the broader RLVR literature, and that the three listed confounds are the dominant sources of overstated gains.

C3 · one line summary

The paper identifies confounds in RLVR evaluations that inflate apparent gains and proposes a minimum standard for budget-matched, contamination-aware assessment with calibration tracking.

Cited by

1 paper in Pith

Receipt and verification
First computed 2026-05-27T01:04:51.234020Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

66dc887592f4b121c628746098c6663af6225570419face1f7fef42442a90a32

Aliases

arxiv: 2509.21882 · arxiv_version: 2509.21882v3 · doi: 10.48550/arxiv.2509.21882 · pith_short_12: M3OIQ5MS6SYS · pith_short_16: M3OIQ5MS6SYSDRRI · pith_short_8: M3OIQ5MS
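The Pith aliases on this record look consistent with an RFC 4648 base32 encoding of the canonical hash bytes, truncated to 26 characters for the full Pith Number and to 8, 12, or 16 characters for the short forms. That relationship is inferred from the values shown on this page, not taken from a published specification, so the following is only a sketch under that assumption; `pith_number_from_hash` is a hypothetical helper name.

```python
import base64

# Canonical hash as listed on this record.
CANONICAL_HASH = "66dc887592f4b121c628746098c6663af6225570419face1f7fef42442a90a32"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumption: the identifier is a truncated RFC 4648 base32 rendering of the
    canonical SHA-256 hash. This agrees with the aliases on this record but is
    not confirmed by any specification.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


if __name__ == "__main__":
    print(pith_number_from_hash(CANONICAL_HASH))      # full identifier
    print(pith_number_from_hash(CANONICAL_HASH, 8))   # shortest alias
```

Under this assumption, the 26-character result can be compared against the Pith Number at the top of the record, and the shorter results against the `pith_short_*` aliases.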
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/M3OIQ5MS6SYSDRRIORQJRRTGHL \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 66dc887592f4b121c628746098c6663af6225570419face1f7fef42442a90a32
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "9ec84cad11b0802376973a89f9ce57bb8bc16d2e2018971b8964bbd670ed54fb",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-09-26T05:06:25Z",
    "title_canon_sha256": "6f36e13162ef68d97956f49b23689294f4f70281f9e34ddbb08ddb6a4023b3cc"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2509.21882",
    "kind": "arxiv",
    "version": 3
  }
}
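The canonicalization step of the shell pipeline in the verification section can also be written as a standalone Python function, which is handy for checking a locally stored copy of the record without a network call. The sketch below reuses the same `json.dumps` settings as that pipeline (sorted keys, compact separators, non-ASCII characters preserved). `CANONICAL_RECORD` is a hand-copied version of the JSON above and `canonical_sha256` is a hypothetical helper name; whether the digest matches the published canonical hash depends on the copy being exact.

```python
import hashlib
import json

# Hand-copied from the canonical record JSON on this page (an assumption:
# any transcription difference would change the digest).
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "9ec84cad11b0802376973a89f9ce57bb8bc16d2e2018971b8964bbd670ed54fb",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2025-09-26T05:06:25Z",
        "title_canon_sha256": "6f36e13162ef68d97956f49b23689294f4f70281f9e34ddbb08ddb6a4023b3cc",
    },
    "schema_version": "1.0",
    "source": {"id": "2509.21882", "kind": "arxiv", "version": 3},
}


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash as UTF-8."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    # Compare this value against the canonical hash listed on the record.
    print(canonical_sha256(CANONICAL_RECORD))
```

Sorting keys and fixing the separators makes the serialization independent of key order and whitespace in the stored copy, which is what lets any mirror recompute the same digest from the same record.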