SFT Doesn't Always Hurt General Capabilities: Revisiting Domain-Specific Fine-Tuning in LLMs
4 Pith papers cite this work. Polarity classification is still indexing.
[Citation summary: 4 citing papers, all from 2026, all currently unverdicted; role classification shows 1 citation labeled "background"; polarity summary still indexing.]
Citing Papers Explorer
- Data Difficulty and the Generalization–Extrapolation Tradeoff in LLM Fine-Tuning
  At a fixed data budget in LLM supervised fine-tuning there is an optimal data difficulty, and that optimum shifts toward harder examples as the budget grows, owing to the tradeoff between the in-distribution generalization gap and the extrapolation gap (a hedged selection sketch follows this list).
- Rotation-Preserving Supervised Fine-Tuning
  RPSFT improves the in-domain versus out-of-domain performance tradeoff during LLM supervised fine-tuning by penalizing rotations of the pretrained singular subspaces as a proxy for loss-sensitive directions (see the subspace-penalty sketch below).
- Stabilizing LLM Supervised Fine-Tuning via Explicit Distributional Control
  Anchored Learning stabilizes LLM supervised fine-tuning by interpolating a moving anchor between the current model and a frozen reference, creating bounded local updates in distribution space (see the anchoring sketch below).
- On Training Large Language Models for Long-Horizon Tasks: An Empirical Study of Horizon Length
  Longer action horizons bottleneck LLM agent training through instability; training with reduced horizons stabilizes learning and generalizes better to longer horizons (see the rollout-truncation sketch below).
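To make the budget-difficulty claim concrete, here is a minimal, hypothetical selection sketch in Python. Nothing in it comes from the paper: the difficulty proxy (a reference model's per-example loss) and the linear quantile schedule are illustrative assumptions only.

```python
# Hypothetical sketch: pick a budget's worth of examples centered on a
# difficulty quantile that rises with the budget. The difficulty proxy
# (per-example reference loss) and the linear schedule are assumptions,
# not the paper's method.

def select_by_difficulty(examples, losses, budget, max_budget):
    """Return `budget` examples around a budget-dependent difficulty level."""
    order = sorted(range(len(examples)), key=lambda i: losses[i])  # easy -> hard
    q = 0.25 + 0.5 * (budget / max_budget)      # target quantile: 0.25 -> 0.75
    center = int(q * (len(order) - 1))
    lo = max(0, min(center - budget // 2, len(order) - budget))
    return [examples[i] for i in order[lo:lo + budget]]

# With a larger budget, the selected slice skews toward harder examples.
losses = [0.1 * i for i in range(100)]
small = select_by_difficulty(list(range(100)), losses, budget=10, max_budget=80)
large = select_by_difficulty(list(range(100)), losses, budget=80, max_budget=80)
```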
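For RPSFT, a hedged sketch of a rotation penalty on pretrained singular subspaces. The paper's exact objective is not reproduced here; this assumes one plausible form: keep the top-k pretrained right-singular directions mapped into the pretrained top-k left-singular subspace, so the penalty is zero exactly when the subspaces are not rotated.

```python
import torch

def subspace_rotation_penalty(w_ft, u0, v0):
    """Frobenius norm (squared) of the part of W_ft @ v0 leaving span(u0)."""
    mapped = w_ft @ v0                        # images of pretrained directions
    outside = mapped - u0 @ (u0.T @ mapped)   # component outside span(u0)
    return outside.pow(2).sum()

# Precompute the top-k pretrained subspaces once, before fine-tuning.
w0 = torch.randn(256, 256)                    # stand-in pretrained weight
u_full, _, vh_full = torch.linalg.svd(w0)
k = 16
u0, v0 = u_full[:, :k], vh_full[:k].T

w_ft = (w0 + 0.01 * torch.randn_like(w0)).requires_grad_()
penalty = subspace_rotation_penalty(w_ft, u0, v0)  # add to the SFT loss, scaled
penalty.backward()
```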
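For Anchored Learning, a minimal sketch under stated assumptions: a fixed interpolation ratio alpha between the frozen reference and the current weights, a per-step anchor refresh, and a KL penalty toward the anchor's output distribution to bound each local update. None of these specifics are confirmed by the paper.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.no_grad()
def refresh_anchor(anchor, model, reference, alpha=0.5):
    """Set anchor weights to alpha * frozen reference + (1 - alpha) * current."""
    for p_a, p_m, p_r in zip(anchor.parameters(), model.parameters(),
                             reference.parameters()):
        p_a.copy_(alpha * p_r + (1.0 - alpha) * p_m)

def anchored_loss(model, anchor, x, y, beta=0.1):
    """Task cross-entropy plus KL(anchor || model), bounding the update."""
    logits = model(x)
    with torch.no_grad():
        anchor_logits = anchor(x)
    kl = F.kl_div(F.log_softmax(logits, -1), F.log_softmax(anchor_logits, -1),
                  log_target=True, reduction="batchmean")
    return F.cross_entropy(logits, y) + beta * kl

model = nn.Linear(16, 4)
reference = copy.deepcopy(model)        # frozen pretrained reference
anchor = copy.deepcopy(model)           # moving anchor between the two
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
refresh_anchor(anchor, model, reference)
loss = anchored_loss(model, anchor, x, y)
opt.zero_grad(); loss.backward(); opt.step()
```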
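Finally, for the horizon-length finding, a rollout-truncation sketch. The environment and agent API below are generic stand-ins invented for illustration, not an API from the paper.

```python
import random

def collect_rollout(env, agent, max_horizon):
    """Roll out at most `max_horizon` steps, truncating longer episodes."""
    obs, traj = env.reset(), []
    for _ in range(max_horizon):
        action = agent.act(obs)
        obs, reward, done = env.step(action)
        traj.append((obs, action, reward))
        if done:
            break
    return traj

class ToyEnv:
    """Stand-in environment: episodes end after a random number of steps."""
    def reset(self):
        self.t, self.limit = 0, random.randint(5, 30)
        return 0
    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.limit

class ToyAgent:
    def act(self, obs):
        return 0

# Train on short horizons (stable), then evaluate at longer ones.
train_traj = collect_rollout(ToyEnv(), ToyAgent(), max_horizon=8)
eval_traj = collect_rollout(ToyEnv(), ToyAgent(), max_horizon=64)
```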