arXiv preprint arXiv:2502.14361

Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning · arXiv 2502.14361

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

representative citing papers

CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models in Mathematical Reasoning

cs.CL · 2025-07-21 · unverdicted · novelty 6.0

CoLD mitigates length bias in process reward models for mathematical reasoning via counterfactual guidance, length penalties, bias estimation, and joint training, improving step selection accuracy and conciseness on MATH500 and GSM-Plus while boosting downstream RL performance.

citing papers explorer

Showing 1 of 1 citing paper.

CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models in Mathematical Reasoning cs.CL · 2025-07-21 · unverdicted · none · ref 28
CoLD mitigates length bias in process reward models for mathematical reasoning via counterfactual guidance, length penalties, bias estimation, and joint training, improving step selection accuracy and conciseness on MATH500 and GSM-Plus while boosting downstream RL performance.

arXiv preprint arXiv:2502.14361

fields

years

verdicts

representative citing papers

citing papers explorer