FOREVER: Forgetting Curve-Inspired Memory Replay for Language Model Continual Learning

Yujie Feng, Hao Wang, Jian Li, Xu Chu, Zhaolu Kang, Yiran Liu, Yasha Wang, Philip S. Yu, Xiao-Ming Wu


classification cs.LG cs.AI cs.CL

keywords forgetting, replay, forever, model, curve-inspired, learning, memory, catastrophic


Continual learning (CL) for large language models (LLMs) aims to enable sequential knowledge acquisition without catastrophic forgetting. Memory replay methods are widely used for their practicality and effectiveness, but most rely on fixed, step-based heuristics that often misalign with the model's actual learning progress, since identical training steps can result in varying degrees of parameter change. Motivated by recent findings that LLM forgetting mirrors the Ebbinghaus human forgetting curve, we propose FOREVER (FORgEtting curVe-inspired mEmory Replay), a novel CL framework that aligns replay schedules with a model-centric notion of time. FOREVER defines model time using the magnitude of optimizer updates, allowing forgetting curve-inspired replay intervals to align with the model's internal evolution rather than raw training steps. Building on this approach, FOREVER incorporates a forgetting curve-based replay scheduler to determine when to replay and an intensity-aware regularization mechanism to adaptively control how to replay. Extensive experiments on three CL benchmarks and models ranging from 0.6B to 13B parameters demonstrate that FOREVER consistently mitigates catastrophic forgetting.
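The abstract does not give formulas, so the following is only a minimal sketch of the scheduling idea it describes, under stated assumptions: model time is taken to be the running sum of L2 norms of parameter updates, and replay points are spaced at geometrically growing intervals of model time as a rough stand-in for a forgetting-curve schedule. The names `update_magnitude`, `ModelClock`, `ReplayScheduler`, `base_interval`, and `growth` are hypothetical and do not come from the paper. The intensity-aware regularization is not covered here.

```python
import math
from typing import List, Sequence


def update_magnitude(old: Sequence[float], new: Sequence[float]) -> float:
    """L2 norm of the parameter change produced by one optimizer step."""
    return math.sqrt(sum((n - o) ** 2 for o, n in zip(old, new)))


class ModelClock:
    """Accumulates update magnitudes as an assumed proxy for 'model time'."""

    def __init__(self) -> None:
        self.time = 0.0

    def tick(self, old: Sequence[float], new: Sequence[float]) -> float:
        # Large updates advance the clock more than small ones,
        # so two training steps need not count as equal time.
        self.time += update_magnitude(old, new)
        return self.time


class ReplayScheduler:
    """Fires replay at growing model-time intervals (assumed geometric spacing)."""

    def __init__(self, base_interval: float, growth: float = 2.0,
                 max_replays: int = 5) -> None:
        self.marks: List[float] = []
        t = 0.0
        for k in range(max_replays):
            t += base_interval * growth ** k  # interval k is base * growth^k
            self.marks.append(t)
        self._next = 0

    def due(self, model_time: float) -> bool:
        # Skip every mark already passed, but request at most one replay per call.
        fired = False
        while self._next < len(self.marks) and model_time >= self.marks[self._next]:
            self._next += 1
            fired = True
        return fired


# Toy loop: identical step counts, different update sizes.
clock = ModelClock()
scheduler = ReplayScheduler(base_interval=1.0, growth=2.0, max_replays=3)
params = [0.0, 0.0]
replay_steps = []
for step, delta in enumerate((0.6, 0.6, 0.1, 2.0)):
    new_params = [p + delta for p in params]
    now = clock.tick(params, new_params)
    params = new_params
    if scheduler.due(now):
        replay_steps.append(step)  # a real loop would replay stored samples here
```

In this sketch a step with a large update can trigger replay sooner than several small steps would, which is the contrast with fixed step-based schedules that the abstract draws.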


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SuperLocalMemory V3.3: The Living Brain -- Biologically-Inspired Forgetting, Cognitive Quantization, and Multi-Channel Retrieval for Zero-LLM Agent Memory Systems
cs.AI 2026-04 unverdicted novelty 7.0

SuperLocalMemory V3.3 implements a cognitive memory taxonomy with mathematical forgetting and multi-channel retrieval, reaching 70.4% on LoCoMo in zero-LLM mode.

Geometry Conflict: Explaining and Controlling Forgetting in LLM Continual Post-Training
cs.LG 2026-05 unverdicted novelty 6.0

Forgetting in LLM continual post-training is a geometry conflict between task-induced covariance structures and the evolving model state, controlled by gating Wasserstein barycenter merging on measured conflict.

Not All Memories Age the Same: Autodiscovery of Adaptive Decay in Knowledge Graphs
cs.IR 2026-04 unverdicted novelty 6.0

Knowledge graphs should use data-driven hierarchical decay surfaces based on velocity and volatility instead of uniform forgetting curves to better identify currently relevant facts.