The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs

2025 · arXiv 2507.07562

2 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training

cs.LG · 2026-01-12 · unverdicted · novelty 6.0

SFT and RL cannot be decoupled in LLM post-training because each step increases the loss or lowers the reward of the prior step under KL and PL analyses.

Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models

cs.AI · 2025-03-12 · unverdicted · novelty 5.0

The paper unifies perspectives on Long CoT in reasoning LLMs by introducing a taxonomy, detailing characteristics of deep reasoning and reflection, and discussing emergence phenomena and future directions.

citing papers explorer

Showing 2 of 2 citing papers.

On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training

cs.LG · 2026-01-12 · unverdicted · none · ref 3

Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models

cs.AI · 2025-03-12 · unverdicted · none · ref 85
