Title resolution pending

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

fields

cs.AI: 1 · cs.LG: 1

years

2025: 2

representative citing papers

RL Fine-Tuning Heals OOD Forgetting in SFT

cs.LG · 2025-09-08 · conditional · novelty 6.0

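Supervised fine-tuning (SFT) of LLMs causes out-of-distribution (OOD) reasoning performance to peak early and then decline while in-distribution (ID) performance keeps improving; RL fine-tuning recovers the lost OOD performance when started from specific SFT checkpoints, and this pattern correlates with rotations in the singular vectors of the model weights.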

citing papers explorer

Showing 2 of 2 citing papers.

  • RL Fine-Tuning Heals OOD Forgetting in SFT cs.LG · 2025-09-08 · conditional · none · ref 14

    SFT on LLMs causes OOD reasoning to peak early then decline while ID improves; RL recovers the lost OOD performance from specific SFT checkpoints, with the pattern correlating to rotations in singular vectors of model weights.

  • SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training cs.AI · 2025-01-28 · unverdicted · none · ref 32

    Reinforcement learning post-training enables generalization to unseen textual rule variants and visual changes in foundation models, while supervised fine-tuning primarily leads to memorization.