Early mixing of post-training data into pretraining improves retention of acquired capabilities after subsequent fine-tuning in language models.
Next, we will study the dynamics of the non-zero singular values (Lemma A.11 Springer et al
1 Pith paper cites this work (field: cs.LG; year: 2026; verdict: CONDITIONAL). Polarity classification is still indexing.
Representative citing paper:
Early Data Exposure Improves Robustness to Subsequent Fine-Tuning