Pith Number

pith:IPGNRXOG

pith:2025:IPGNRXOGQTIZKJIQVGLRWUR6R4

not attested · not anchored · not stored · refs resolved

The Lessons of Developing Process Reward Models in Mathematical Reasoning

Beichen Zhang, Bowen Yu, Chujie Zheng, Dayiheng Liu, Jingren Zhou, Junyang Lin, Runji Lin, Yangzhen Wu, Zhenru Zhang

Consensus filtering across annotation methods yields stronger process reward models for mathematical reasoning by correcting biases in standard evaluations.

arxiv:2501.07301 v2 · 2025-01-13 · cs.CL · cs.AI · cs.LG

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{IPGNRXOGQTIZKJIQVGLRWUR6R4}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
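The page does not specify the merge algorithm, so the following is only a minimal sketch of what "deterministic merge" can mean in general: fold a set of events in a canonical order so that every mirror reaches the same state regardless of arrival order. The event fields (`id`, `ts`, `field`, `value`), the sample events, and the last-write-wins rule are all assumptions made for illustration, not Pith's actual event schema.

```python
def merge_events(events):
    """Fold events into a state dict in a canonical, arrival-independent order.

    Assumed (hypothetical) event shape:
        {"id": str, "ts": ISO-8601 str, "field": str, "value": any}
    Events sharing an id are assumed to be identical copies.
    """
    # Drop duplicate deliveries of the same event.
    unique = {event["id"]: event for event in events}
    state = {}
    # Sort on (timestamp, id): the id breaks timestamp ties, so the
    # order is total and does not depend on how events arrived.
    for event in sorted(unique.values(), key=lambda e: (e["ts"], e["id"])):
        state[event["field"]] = event["value"]  # last write wins
    return state


# Made-up events, purely illustrative.
events = [
    {"id": "e1", "ts": "2026-05-17T23:38:47Z", "field": "citations", "value": "open"},
    {"id": "e2", "ts": "2026-05-18T00:00:00Z", "field": "citations", "value": "closed"},
    {"id": "e3", "ts": "2026-05-18T00:00:00Z", "field": "replications", "value": "open"},
]
state_a = merge_events(events)
state_b = merge_events(list(reversed(events)) + [events[0]])  # reordered, with a duplicate
```

Both calls produce the same state because ordering is decided by the events' own content rather than by delivery order.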

Claims

C1 strongest claim

we significantly improve both model performance and data efficiency in the BoN evaluation and the step-wise error identification task. Finally, we release a new state-of-the-art PRM that outperforms existing open-source alternatives

C2 weakest assumption

That the observed biases in Best-of-N evaluation and the superiority of consensus filtering generalize beyond the specific models, datasets, and tasks tested in the experiments.

C3 one line summary

Monte Carlo data synthesis for PRMs underperforms LLM-judge and human methods, Best-of-N evaluations suffer from process-outcome misalignment and score inflation, and consensus filtering yields better PRMs with higher data efficiency.

References

19 extracted · 19 resolved · 8 Pith anchors

[1] Alphamath almost zero: Process supervision without process

[2] The Llama 3 Herd of Models · arXiv:2407.21783

[3] Llm critics help catch bugs in mathematics: Towards a better mathematical verifier with natural language feedback

[4] Measuring Mathematical Problem Solving With the MATH Dataset · arXiv:2103.03874

[5] Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, Yuhuai Wu, Behnam Neyshabur, Guy Gur-Ari, and Vedant Misra. 2022

Formal links

1 machine-checked theorem link

Cited by

26 papers in Pith

Supervising the search process produces reliable and generalizable information-seeking agents

A Survey of Scaling in Large Language Model Reasoning

MIST: A Co-Design Framework for Heterogeneous, Multi-Stage LLM Inference

Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training

From Failure to Feedback: Group Revision Unlocks Hard Cases in Object-Level Grounding

Receipt and verification

First computed	2026-05-17T23:38:47.712978Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`)
Schema	pith-number/v1.0

Canonical hash

43ccd8ddc684d1952510a9971b523e8f08a353fb05094a66dfef2a526f46bfb7

Aliases

arxiv: 2501.07301 · arxiv_version: 2501.07301v2 · doi: 10.48550/arxiv.2501.07301 · pith_short_12: IPGNRXOGQTIZ · pith_short_16: IPGNRXOGQTIZKJIQ · pith_short_8: IPGNRXOG
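In this record, the 26-character identifier matches the unpadded RFC 4648 base32 encoding of the first 16 bytes of the canonical hash, and the `pith_short_*` aliases are its 8-, 12-, and 16-character prefixes. That relationship is inferred from the values shown on this page, not taken from the schema, so treat the sketch below as an observation about this one record. The function name `pith_from_hash` is hypothetical.

```python
import base64

# Canonical hash as shown on this page.
CANONICAL_HASH = "43ccd8ddc684d1952510a9971b523e8f08a353fb05094a66dfef2a526f46bfb7"


def pith_from_hash(hex_digest: str) -> str:
    """Base32-encode the first 16 bytes of a hex digest, without '=' padding.

    Assumption: this mirrors how the identifier on this page relates to
    its canonical hash; the schema itself is not quoted here.
    """
    raw = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


pith = pith_from_hash(CANONICAL_HASH)
short_aliases = {f"pith_short_{n}": pith[:n] for n in (8, 12, 16)}

print(pith)           # IPGNRXOGQTIZKJIQVGLRWUR6R4
print(short_aliases)
```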

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/IPGNRXOGQTIZKJIQVGLRWUR6R4 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 43ccd8ddc684d1952510a9971b523e8f08a353fb05094a66dfef2a526f46bfb7
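The same check can be run offline against a saved copy of the record. This sketch repeats the serialization used in the one-liner above (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) and applies it to the canonical record as printed on this page. The function name `canonical_hash` is hypothetical, and whether the printed record matches the served one byte for byte is an assumption, so compare the result with the expected hash rather than taking a match for granted.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 over the compact, key-sorted JSON serialization of a record."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "c347e167b1e6e525c0aa0967effdee336a462359514005faddfc12323f8ee860",
        "cross_cats_sorted": ["cs.AI", "cs.LG"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2025-01-13T13:10:16Z",
        "title_canon_sha256": "d0d836b11be729d0489a5905659f20cb2d80a8e72807e76529642c710c26f9f0",
    },
    "schema_version": "1.0",
    "source": {"id": "2501.07301", "kind": "arxiv", "version": 2},
}

digest = canonical_hash(record)
print(digest)  # compare with the expected canonical hash
```

Sorting keys before hashing is what makes the digest independent of how a mirror happens to order the JSON fields.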

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "c347e167b1e6e525c0aa0967effdee336a462359514005faddfc12323f8ee860",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-01-13T13:10:16Z",
    "title_canon_sha256": "d0d836b11be729d0489a5905659f20cb2d80a8e72807e76529642c710c26f9f0"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2501.07301",
    "kind": "arxiv",
    "version": 2
  }
}