Title resolution pending

1 Pith paper cites this work. Polarity classification is still indexing.

fields

cs.LG 1

years

2025 1

verdicts

CONDITIONAL 1

representative citing papers

RL Fine-Tuning Heals OOD Forgetting in SFT

cs.LG · 2025-09-08 · conditional · novelty 6.0

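Supervised fine-tuning (SFT) of LLMs causes out-of-distribution (OOD) reasoning performance to peak early and then decline while in-distribution (ID) performance improves; RL fine-tuning recovers the lost OOD performance from specific SFT checkpoints, and the pattern correlates with rotations in the singular vectors of the model weights.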

citing papers explorer

Showing 1 of 1 citing paper.

  • RL Fine-Tuning Heals OOD Forgetting in SFT cs.LG · 2025-09-08 · conditional · none · ref 18
