SplitLoRA: Balancing Stability and Plasticity in Continual Learning through Gradient Space Splitting
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- Hidden Failure Modes of Gradient Modification under Adam in Continual Learning, and Adaptive Decoupled Moment Routing as a Repair
  Gradient modifications before Adam inflate old-direction learning rates via the second-moment term, but routing modifications solely to the first moment with adaptive strength prevents collapse and yields 3.8-4.8 unit gains over baselines in 8- and 16-domain continual learning.
- iGSP: Implicit Gradient Subspace Projection for Efficient Continual Learning of Vision-Language Models
  iGSP uses implicit gradient subspace projection in two phases to enable efficient continual adaptation of vision-language models, claiming SOTA accuracy with 42.7% fewer trainable parameters and 86.9% less total parameter growth.