CoT distillation frequently degrades student performance versus pre-distillation baselines, and capacity gap effects do not consistently dominate under a realistic protocol that includes original baselines.
In Findings of the Association for Computational Linguistics: ACL 2025, pages 15094–15119, Vienna, Austria.
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Revisiting the Capacity Gap in Chain-of-Thought Distillation from a Practical Perspective