Unveiling the key factors for distilling chain-of-thought reasoning. ArXiv, abs/2502.18001
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts: UNVERDICTED 3
citation roles: background 2
citation polarities: background 2
representative citing papers
COMPACT adaptively fuses multi-teacher CoT supervisions using graph-based consensus, mutual-information adaptability, and loss-based difficulty metrics to improve small language model reasoning performance while mitigating catastrophic forgetting.
The paper unifies perspectives on Long CoT in reasoning LLMs by introducing a taxonomy, detailing characteristics of deep reasoning and reflection, and discussing emergence phenomena and future directions.
citing papers explorer
- Fine-Tuning Small Reasoning Models for Quantum Field Theory
  Small 7B reasoning models were fine-tuned on synthetic and curated QFT problems using RL and SFT, yielding performance gains, error analysis, and public release of data and traces.
- "The Whole Is Greater Than the Sum of Its Parts": A Compatibility-Aware Multi-Teacher CoT Distillation Framework
  COMPACT adaptively fuses multi-teacher CoT supervisions using graph-based consensus, mutual-information adaptability, and loss-based difficulty metrics to improve small language model reasoning performance while mitigating catastrophic forgetting.
- Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models
  The paper unifies perspectives on Long CoT in reasoning LLMs by introducing a taxonomy, detailing characteristics of deep reasoning and reflection, and discussing emergence phenomena and future directions.