Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D
One Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CL (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers:
Thinking About Thinking: Evaluating Reasoning in Post-Trained Language Models
RL post-trained models show stronger awareness of learned policies and better generalization to new tasks than SFT models, but display weaker alignment between internal reasoning traces and final outputs, especially under GRPO.