Stop Rewarding Hallucinated Steps: Faithfulness-Aware Step-Level Reinforcement Learning for Small Reasoning Models

Chao Wang; Hexuan Deng; Min Zhang; Ruiyu Fang; Shuangyong Song; Shuo Nie; Xuebo Liu; Xuelong Li; Yu Li

arxiv: 2602.05897 · v2 · pith:GHAVRVZFnew · submitted 2026-02-05 · 💻 cs.CL


Shuo Nie, Hexuan Deng, Chao Wang, Ruiyu Fang, Xuebo Liu, Shuangyong Song, Yu Li, Min Zhang, Xuelong Li



keywords: reasoning, step-level, faithrl, learning, models, reinforcement, rewards, faithful


As large language models become smaller and more efficient, small reasoning models (SRMs) are crucial for enabling chain-of-thought (CoT) reasoning in resource-constrained settings. However, they are prone to faithfulness hallucinations, especially in intermediate reasoning steps. Existing mitigation methods based on online reinforcement learning rely on outcome-based rewards or coarse-grained CoT evaluation, which can inadvertently reinforce unfaithful reasoning when the final answer is correct. To address these limitations, we propose Faithfulness-Aware Step-Level Reinforcement Learning (FaithRL), introducing step-level supervision via explicit faithfulness rewards from a process reward model, together with an implicit truncated resampling strategy that generates contrastive signals from faithful prefixes, while also mitigating reward hacking from step-level rewards. Experiments across multiple SRMs and Open-Book QA benchmarks demonstrate that FaithRL consistently reduces hallucinations in both the CoT and final answers, leading to more faithful and reliable reasoning. Code is available at https://github.com/Easy195/FaithRL.
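
The contrast the abstract draws between outcome-based and step-level rewards can be illustrated with a toy sketch. This is not the FaithRL implementation: the function names, the 0.5 threshold, the +1/-1 reward values, and the hand-written scores (standing in for the output of a process reward model) are all assumptions made for illustration.

```python
from typing import List


def outcome_rewards(num_steps: int, answer_correct: bool) -> List[float]:
    """Outcome-based credit: every step inherits the final-answer reward."""
    reward = 1.0 if answer_correct else 0.0
    return [reward] * num_steps


def step_rewards(faithfulness: List[float], threshold: float = 0.5) -> List[float]:
    """Step-level credit: +1 for a step judged faithful, -1 otherwise.

    The threshold and reward values are illustrative assumptions.
    """
    return [1.0 if score >= threshold else -1.0 for score in faithfulness]


def faithful_prefix_length(faithfulness: List[float], threshold: float = 0.5) -> int:
    """Count the leading steps before the first step judged unfaithful."""
    for index, score in enumerate(faithfulness):
        if score < threshold:
            return index
    return len(faithfulness)


# Hypothetical per-step faithfulness scores for a four-step chain of thought
# whose final answer happens to be correct; step 3 is unfaithful.
scores = [0.9, 0.8, 0.2, 0.7]

per_step_outcome = outcome_rewards(len(scores), answer_correct=True)
per_step_faithful = step_rewards(scores)
prefix_len = faithful_prefix_length(scores)
```

Under these assumptions, the outcome-based scheme rewards the unfaithful third step as much as the others, while the step-level scheme penalizes it. The prefix length marks where a truncated resampling scheme could cut the chain and sample new continuations from the faithful prefix.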
