Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D

Training language models to follow instructions with human feedback

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Thinking About Thinking: Evaluating Reasoning in Post-Trained Language Models

cs.CL · 2025-10-18 · unverdicted · novelty 5.0

RL post-trained models show stronger awareness of learned policies and better generalization to new tasks than SFT models, but display weaker alignment between internal reasoning traces and final outputs, especially under GRPO.

citing papers explorer

Showing 1 of 1 citing paper.

Thinking About Thinking: Evaluating Reasoning in Post-Trained Language Models

cs.CL · 2025-10-18 · unverdicted · none · ref 5
