pith:IPGNRXOG
The Lessons of Developing Process Reward Models in Mathematical Reasoning
Consensus filtering across annotation methods yields stronger process reward models for mathematical reasoning by correcting biases in standard evaluations.
arxiv:2501.07301 v2 · 2025-01-13 · cs.CL · cs.AI · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{IPGNRXOGQTIZKJIQVGLRWUR6R4}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
we significantly improve both model performance and data efficiency in the BoN evaluation and the step-wise error identification task. Finally, we release a new state-of-the-art PRM that outperforms existing open-source alternatives
That the observed biases in Best-of-N evaluation and the superiority of consensus filtering generalize beyond the specific models, datasets, and tasks tested in the experiments.
Monte Carlo data synthesis for PRMs underperforms LLM-judge and human methods, Best-of-N evaluations suffer from process-outcome misalignment and score inflation, and consensus filtering yields better PRMs with higher data efficiency.
Receipt and verification
| First computed | 2026-05-17T23:38:47.712978Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
43ccd8ddc684d1952510a9971b523e8f08a353fb05094a66dfef2a526f46bfb7
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/IPGNRXOGQTIZKJIQVGLRWUR6R4 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 43ccd8ddc684d1952510a9971b523e8f08a353fb05094a66dfef2a526f46bfb7
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "c347e167b1e6e525c0aa0967effdee336a462359514005faddfc12323f8ee860",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-01-13T13:10:16Z",
    "title_canon_sha256": "d0d836b11be729d0489a5905659f20cb2d80a8e72807e76529642c710c26f9f0"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2501.07301",
    "kind": "arxiv",
    "version": 2
  }
}
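The verification command above can also be run offline against the record as printed on this page. The following is a minimal sketch, not an official client: it assumes the canonical serialization is exactly what the one-liner uses (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes). The function name `canonical_sha256` and the constant `EXPECTED` are illustrative names, not part of any Pith API.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record with the serialization used by the one-liner above:
    sorted keys, compact separators, non-ASCII left unescaped, UTF-8 bytes."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "c347e167b1e6e525c0aa0967effdee336a462359514005faddfc12323f8ee860",
        "cross_cats_sorted": ["cs.AI", "cs.LG"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2025-01-13T13:10:16Z",
        "title_canon_sha256": "d0d836b11be729d0489a5905659f20cb2d80a8e72807e76529642c710c26f9f0",
    },
    "schema_version": "1.0",
    "source": {"id": "2501.07301", "kind": "arxiv", "version": 2},
}

# Canonical hash as published on this page.
EXPECTED = "43ccd8ddc684d1952510a9971b523e8f08a353fb05094a66dfef2a526f46bfb7"

digest = canonical_sha256(record)
print(digest)
print("matches published hash:", digest == EXPECTED)
```

Because the keys are sorted before hashing, the indentation and key order of the JSON as displayed do not affect the digest; only the keys and values do. A mismatch would suggest either a transcription error in the record or a serialization rule that differs from the one assumed here.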