arXiv preprint arXiv:2409.17407

Post-hoc reward calibration: A case study on length bias · 2024 · arXiv 2409.17407

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

representative citing papers

Debiasing Reward Models via Causally Motivated Inference-Time Intervention

cs.CL · 2026-04-30 · unverdicted · novelty 6.0

Neuron-level inference-time intervention reduces multiple biases in reward models, enabling 2B and 7B models to match 70B performance on LLM alignment benchmarks without trade-offs.

CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models in Mathematical Reasoning

cs.CL · 2025-07-21 · unverdicted · novelty 6.0

CoLD mitigates length bias in process reward models for mathematical reasoning via counterfactual guidance, length penalties, bias estimation, and joint training, improving step selection accuracy and conciseness on MATH500 and GSM-Plus while boosting downstream RL performance.

citing papers explorer

Showing 2 of 2 citing papers.

Debiasing Reward Models via Causally Motivated Inference-Time Intervention cs.CL · 2026-04-30 · unverdicted · none · ref 14
Neuron-level inference-time intervention reduces multiple biases in reward models, enabling 2B and 7B models to match 70B performance on LLM alignment benchmarks without trade-offs.
CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models in Mathematical Reasoning cs.CL · 2025-07-21 · unverdicted · none · ref 7
CoLD mitigates length bias in process reward models for mathematical reasoning via counterfactual guidance, length penalties, bias estimation, and joint training, improving step selection accuracy and conciseness on MATH500 and GSM-Plus while boosting downstream RL performance.

arXiv preprint arXiv:2409.17407

fields

years

verdicts

representative citing papers

citing papers explorer