Spectral Collapse Drives Loss of Plasticity in Deep Continual Learning

Amy Greenwald; Arjun Prakash; George Konidaris; Kaicheng Guo; Naicheng He; Ruo Yu Tao; Saket Tiwari; Tyrone Serapio

arxiv: 2509.22335 · v3 · pith:EVW4KVLPnew · submitted 2025-09-26 · 💻 cs.LG · cs.AI

Arjun Prakash, Naicheng He, Kaicheng Guo, Saket Tiwari, Ruo Yu Tao, Tyrone Serapio, Amy Greenwald, George Konidaris

keywords: collapse, continual, hessian, learning, plasticity, spectral, approximation, curvature

We investigate why deep neural networks suffer from loss of plasticity in continual learning, and thus fail to learn new tasks without reinitializing parameters. We show that this failure is preceded by Hessian spectral collapse at new-task initialization, where meaningful curvature directions vanish and gradient descent becomes ineffective. Analyzing a linearized ReLU network, we derive explicit $\epsilon$-rank conditions for successful training and prove that the loss-weighted Gram matrix is spectrally equivalent to the Generalized Gauss-Newton approximation, thereby relating NTK dynamics to Hessian curvature. Targeting spectral collapse directly, we then discuss the Kronecker-factored approximation of the Hessian, which motivates two regularization enhancements: maintaining high effective feature rank and applying L2 penalties. Experiments on continual supervised and reinforcement learning tasks confirm that combining these two regularizers effectively preserves plasticity.
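The quantities named in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation, and the abstract does not give their exact definitions. The code assumes that $\epsilon$-rank counts eigenvalues whose magnitude exceeds $\epsilon$ times the largest magnitude, and that effective feature rank is the exponential of the entropy of the normalized singular values. The function names, the default `eps` and `coeff` values, and the example spectra are hypothetical. In practice the spectra would come from a numerical eigensolver or SVD applied to a Hessian approximation or a feature matrix.

```python
import math


def epsilon_rank(eigenvalues, eps=1e-3):
    """Count eigenvalues whose magnitude exceeds eps * largest magnitude.

    Assumed definition: the abstract names an epsilon-rank but does not define it.
    """
    mags = [abs(v) for v in eigenvalues]
    top = max(mags, default=0.0)
    if top == 0.0:
        return 0
    return sum(1 for m in mags if m > eps * top)


def effective_rank(singular_values):
    """exp(entropy) of the normalized singular values (assumed definition)."""
    positive = [s for s in singular_values if s > 0.0]
    total = sum(positive)
    if total == 0.0:
        return 0.0
    probs = [s / total for s in positive]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


def l2_penalty(weights, coeff=1e-4):
    """Plain L2 penalty: coeff * sum of squared weights (coeff is hypothetical)."""
    return coeff * sum(w * w for w in weights)


# Hypothetical spectra: one with several curvature directions, one collapsed.
healthy_spectrum = [1.0, 0.5, 0.25, 0.1]
collapsed_spectrum = [1.0, 1e-6, 0.0, 0.0]

healthy_rank = epsilon_rank(healthy_spectrum)      # all four directions count
collapsed_rank = epsilon_rank(collapsed_spectrum)  # only one direction counts
```

Under these assumptions, a spectrum whose mass sits in a single direction gives an $\epsilon$-rank and an effective rank near 1, which is the kind of collapse the abstract associates with ineffective gradient descent.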

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Predicting Plasticity in Deep Continual Learning: A Theoretical Perspective
cs.LG 2026-05 unverdicted novelty 7.0

Optimization readiness, defined from gradient strength and reliability, lower-bounds one-step optimization gain and outperforms rank-based diagnostics in predicting neural network trainability across continual learnin...

SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning
cs.LG 2026-05 unverdicted novelty 6.0

SPHERE applies a Parseval penalty derived from a Neural Tangent Kernel proxy for spectral plasticity to Mixture-of-Experts policies, raising average success rates by 133% on MetaWorld and 50% on HumanoidBench in conti...

SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning
cs.LG 2026-05 unverdicted novelty 6.0

SPHERE applies a Parseval penalty to MoE policies in continual RL to maintain spectral plasticity, yielding 133% and 50% higher average success on MetaWorld and HumanoidBench versus unregularized MoE baselines.