Pith Number

pith:IPGNRXOG

pith:2025:IPGNRXOGQTIZKJIQVGLRWUR6R4
not attested · not anchored · not stored · refs resolved

The Lessons of Developing Process Reward Models in Mathematical Reasoning

Beichen Zhang, Bowen Yu, Chujie Zheng, Dayiheng Liu, Jingren Zhou, Junyang Lin, Runji Lin, Yangzhen Wu, Zhenru Zhang

Consensus filtering across annotation methods yields stronger process reward models for mathematical reasoning by correcting biases in standard evaluations.

arxiv:2501.07301 v2 · 2025-01-13 · cs.CL · cs.AI · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{IPGNRXOGQTIZKJIQVGLRWUR6R4}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

we significantly improve both model performance and data efficiency in the BoN evaluation and the step-wise error identification task. Finally, we release a new state-of-the-art PRM that outperforms existing open-source alternatives

C2 · weakest assumption

That the observed biases in Best-of-N evaluation and the superiority of consensus filtering generalize beyond the specific models, datasets, and tasks tested in the experiments.

C3 · one-line summary

Monte Carlo data synthesis for PRMs underperforms LLM-judge and human methods, Best-of-N evaluations suffer from process-outcome misalignment and score inflation, and consensus filtering yields better PRMs with higher data efficiency.

References

19 extracted · 19 resolved · 8 Pith anchors

[1] AlphaMath almost zero: Process supervision without process
[2] The Llama 3 Herd of Models · arXiv:2407.21783
[3] LLM critics help catch bugs in mathematics: Towards a better mathematical verifier with natural language feedback
[4] Measuring Mathematical Problem Solving With the MATH Dataset · arXiv:2103.03874
[5] Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, Yuhuai Wu, Behnam Neyshabur, Guy Gur-Ari, and Vedant Misra 2022

Formal links

1 machine-checked theorem link

Cited by

26 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:47.712978Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

43ccd8ddc684d1952510a9971b523e8f08a353fb05094a66dfef2a526f46bfb7

Aliases

arxiv: 2501.07301 · arxiv_version: 2501.07301v2 · doi: 10.48550/arxiv.2501.07301 · pith_short_12: IPGNRXOGQTIZ · pith_short_16: IPGNRXOGQTIZKJIQ · pith_short_8: IPGNRXOG
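The identifier and its short aliases appear to be derived from the canonical hash. With the values on this page, Base32-encoding the first 16 bytes of the SHA-256 digest (RFC 4648 alphabet, padding stripped) reproduces the full Pith Number, and each short form is a prefix of it. The Python sketch below checks that relationship. The rule is inferred from the values shown here, not taken from a published specification, and the function names are hypothetical.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "43ccd8ddc684d1952510a9971b523e8f08a353fb05094a66dfef2a526f46bfb7"


def pith_number_from_hash(hex_digest: str) -> str:
    """Base32-encode the first 16 bytes of the digest and drop '=' padding.

    Assumption: this rule is inferred from the values on this page,
    not from a published Pith specification.
    """
    first_16_bytes = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(first_16_bytes).decode("ascii").rstrip("=")


def short_aliases(pith_number: str, lengths=(8, 12, 16)) -> dict:
    """Return prefix aliases keyed like the page's pith_short_N fields."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


if __name__ == "__main__":
    number = pith_number_from_hash(CANONICAL_HASH)
    print(number)  # IPGNRXOGQTIZKJIQVGLRWUR6R4
    print(short_aliases(number))
```

If the inferred rule holds in general, any mirror holding the canonical hash could recompute the identifier and its aliases locally without a lookup.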
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/IPGNRXOGQTIZKJIQVGLRWUR6R4 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 43ccd8ddc684d1952510a9971b523e8f08a353fb05094a66dfef2a526f46bfb7
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "c347e167b1e6e525c0aa0967effdee336a462359514005faddfc12323f8ee860",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-01-13T13:10:16Z",
    "title_canon_sha256": "d0d836b11be729d0489a5905659f20cb2d80a8e72807e76529642c710c26f9f0"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2501.07301",
    "kind": "arxiv",
    "version": 2
  }
}
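The shell one-liner under "Verify this Pith Number yourself" can also be written as a small Python function. The sketch below mirrors its serialization settings (sorted keys, compact separators, UTF-8 without ASCII escaping) and hashes the result with SHA-256. `canonical_sha256` is a hypothetical helper name, and the sketch assumes the dictionary passed in is exactly the `canonical_record` object returned by the API.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize a record the way the shell one-liner does, then hash it."""
    canonical_bytes = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the digest
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(canonical_bytes).hexdigest()


if __name__ == "__main__":
    # Two orderings of the same (partial, illustrative) record hash identically.
    a = {"schema_version": "1.0",
         "source": {"id": "2501.07301", "kind": "arxiv", "version": 2}}
    b = {"source": {"version": 2, "kind": "arxiv", "id": "2501.07301"},
         "schema_version": "1.0"}
    print(canonical_sha256(a) == canonical_sha256(b))  # True
```

To check this record, load the full canonical record JSON with `json.loads`, pass it to `canonical_sha256`, and compare the result with the canonical hash listed on the page.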