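Next, we will study the dynamics of the non-zero singular values (Lemma A.11 Springer et al

$$= \sigma_i(t) - 2\eta\,\sigma_i(t)\left(\sigma_i(t)^2 - (\sigma_{\mathrm{spec},i})^2\right) + 2\eta\lambda\left(\sigma_i(t)^2 - \sigma_i(0)^2\right) \tag{2}$$

As a result, note that when $\sigma^{\mathrm{(un)mixed}}_i(0) = 0$, $\sigma^{\mathrm{(un)mixed}}_i(t) = 0$ for all $t$ · 2025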
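The fixed-point remark can be checked numerically. The following is a minimal sketch, assuming the elided left-hand side of equation (2) is the next iterate $\sigma_i(t+1)$; the function names and the default values of `eta`, `lam`, and `steps` are hypothetical and chosen only for illustration.

```python
def step(sigma, sigma0, sigma_spec, eta, lam):
    """One update of equation (2).

    Assumption: the elided left-hand side is sigma_i(t+1).
    sigma      -- current singular value sigma_i(t)
    sigma0     -- initial singular value sigma_i(0)
    sigma_spec -- target value sigma_{spec,i}
    """
    return (sigma
            - 2 * eta * sigma * (sigma**2 - sigma_spec**2)
            + 2 * eta * lam * (sigma**2 - sigma0**2))


def trajectory(sigma0, sigma_spec, eta=0.01, lam=0.1, steps=100):
    """Iterate the update from sigma_i(0) = sigma0 and return all iterates."""
    sigmas = [sigma0]
    for _ in range(steps):
        sigmas.append(step(sigmas[-1], sigma0, sigma_spec, eta, lam))
    return sigmas


if __name__ == "__main__":
    # A singular value that starts at zero stays at zero under this update.
    print(all(s == 0.0 for s in trajectory(0.0, 1.0)))
```

With $\sigma_i(0) = 0$, every term on the right-hand side carries a factor of $\sigma_i(t)$ or of $\sigma_i(t)^2 - \sigma_i(0)^2$, so the iterate never leaves zero under the stated assumption.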

1 Pith paper cites this work.


representative citing papers

cs.LG · 2026-05-12 · conditional · novelty 6.0

Early mixing of post-training data into pretraining improves retention of acquired capabilities after subsequent fine-tuning in language models.


Early Data Exposure Improves Robustness to Subsequent Fine-Tuning cs.LG · 2026-05-12 · conditional · none · ref 25