On-Policy Replay filters the model's own rollouts on historical prompts by task reward and replays the retained ones as ordinary SFT examples, reducing backward-transfer degradation on the TRACE benchmark across three 7-8B models.
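The filter-then-replay loop summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the cited paper's implementation: `generate` (sampling rollouts from the current model), `reward` (the task reward), `num_samples`, and the reward `threshold` are all hypothetical names and parameters chosen for the sketch, and the toy stubs stand in for a real model and scorer.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class SFTExample:
    prompt: str
    response: str


def on_policy_replay(
    prompts: Sequence[str],
    generate: Callable[[str, int], List[str]],  # hypothetical: samples n rollouts from the current model
    reward: Callable[[str, str], float],        # hypothetical: task reward for a (prompt, response) pair
    num_samples: int = 4,
    threshold: float = 1.0,                     # hypothetical: minimum reward for a rollout to be kept
) -> List[SFTExample]:
    """Keep rollouts whose task reward meets the threshold and return them as plain SFT pairs."""
    replay: List[SFTExample] = []
    for prompt in prompts:
        for response in generate(prompt, num_samples):
            if reward(prompt, response) >= threshold:
                # Retained rollouts are treated exactly like ordinary supervised examples.
                replay.append(SFTExample(prompt, response))
    return replay


# Toy stubs standing in for a real model and reward (assumptions for illustration only).
def toy_generate(prompt: str, n: int) -> List[str]:
    return [f"{prompt}-answer{i}" for i in range(n)]


def toy_reward(prompt: str, response: str) -> float:
    return 1.0 if response.endswith("0") else 0.0


examples = on_policy_replay(["q1", "q2"], toy_generate, toy_reward)
# examples holds one retained rollout per historical prompt: q1-answer0 and q2-answer0.
```

In a real pipeline the returned pairs would presumably be mixed into the SFT data for the new task; that mixing step is not shown here.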
GeRe: Towards Efficient Anti-Forgetting in Continual Learning of LLM via General Samples Replay
2 Pith papers cite this work.
Fields: cs.LG (2)
Years: 2026 (2)
Citation roles: baseline (1)
Representative citing papers:
Geometry Conflict: Explaining and Controlling Forgetting in LLM Continual Post-Training
Forgetting in LLM continual post-training is a geometry conflict between task-induced covariance structures and the evolving model state, and it can be controlled by gating Wasserstein barycenter merging on the measured conflict.
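To make the vocabulary in this summary concrete, here is a toy sketch, not the cited paper's method. It assumes each covariance structure is a zero-mean Gaussian with a diagonal covariance (stored as a variance vector), measures "conflict" as the 2-Wasserstein distance between the model-state and task Gaussians, and applies a hypothetical gate: the W2 barycenter merge is used only when the conflict exceeds a threshold `tau`. The function names, `tau`, and the mixing weight `alpha` are all illustrative assumptions. For diagonal (hence commuting) covariances, the W2 distance reduces to the Euclidean distance between standard-deviation vectors, and the barycenter's standard deviations are the weighted mean of the inputs' standard deviations.

```python
import numpy as np


def w2_diag(var_a, var_b):
    """2-Wasserstein distance between zero-mean Gaussians with diagonal covariances."""
    return float(np.sqrt(np.sum((np.sqrt(var_a) - np.sqrt(var_b)) ** 2)))


def w2_barycenter_diag(variances, weights):
    """W2 barycenter of zero-mean diagonal Gaussians: weighted mean of std devs, then squared."""
    stds = np.sqrt(np.asarray(variances, dtype=float))
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return (w[:, None] * stds).sum(axis=0) ** 2


def gated_merge(var_state, var_task, tau, alpha=0.5):
    """Hypothetical gate: merge via the barycenter only when measured conflict exceeds tau."""
    conflict = w2_diag(var_state, var_task)
    if conflict <= tau:
        # Low conflict: accept the task-induced structure unchanged.
        return np.asarray(var_task, dtype=float), conflict
    # High conflict: move to the W2 barycenter of the state and task structures.
    merged = w2_barycenter_diag([var_state, var_task], [1 - alpha, alpha])
    return merged, conflict


var_state = np.array([1.0, 4.0])
var_task = np.array([1.0, 16.0])
merged, conflict = gated_merge(var_state, var_task, tau=1.0)
kept, _ = gated_merge(var_state, var_task, tau=3.0)
```

Here the standard deviations are (1, 2) and (1, 4), so the conflict is 2.0. With `tau=1.0` the gate fires and the equal-weight barycenter has standard deviations (1, 3), i.e. variances (1, 9); with `tau=3.0` the task variances pass through unchanged. Whether merging should trigger on high or low conflict in the actual paper is not stated in this summary; the direction chosen here is an assumption.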