LoRA learns less and forgets less. Transactions on Machine Learning Research, 2024
2 Pith papers cite this work.
Fields: cs.LG (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- Reasoning-Trace Collapse: Evaluating the Loss of Explicit Reasoning During Fine-Tuning
  Fine-tuning reasoning models on answer-only data induces reasoning-trace collapse, where valid traces disappear while answer performance stays high, and simple loss-masking can mitigate it.
- LoRA vs. Full Fine-Tuning: A Theoretical Perspective
  In linear regression, LoRA can achieve lower excess risk than full fine-tuning when the pretraining-downstream difference is low-rank, and small LoRA ranks can improve generalization by acting as regularization.