Plasticity Loss in Deep Reinforcement Learning: A Survey
Plasticity refers to a network's ability to adapt to changing data distributions, which is crucial for the successful training of deep reinforcement learning agents. Loss of plasticity causes performance plateaus and contributes to scaling failures, overestimation bias, and insufficient exploration. To deepen the understanding of plasticity loss, we propose a unified definition, examine its drivers and pathologies, and organize over 50 mitigation strategies into the first comprehensive taxonomy of the field. Our analysis shows gaps in current evaluation practices and reveals that general regularization techniques often outperform domain-specific interventions. Future research should prioritize understanding the mechanisms underlying plasticity loss.
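The abstract notes that general regularization techniques often outperform domain-specific interventions. One such general regularizer discussed in the plasticity literature pulls weights back toward their initialization. A minimal sketch, assuming parameters are held as NumPy arrays; the function name, API, and `lam` default are illustrative, not taken from the survey:

```python
import numpy as np

def l2_init_penalty(params, init_params, lam=1e-2):
    """L2 penalty on drift from the initial weights.

    Keeping weights close to their initialization is one simple,
    general regularizer for mitigating plasticity loss; `lam`
    trades off the task loss against weight drift.
    (Illustrative sketch -- not the survey's exact formulation.)
    """
    return lam * sum(float(np.sum((p - p0) ** 2))
                     for p, p0 in zip(params, init_params))
```

In use, this term would simply be added to the task loss before each gradient step.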
Forward citations
Cited by 4 Pith papers
-
Beyond Single-Model Optimization: Preserving Plasticity in Continual Reinforcement Learning
TeLAPA maintains archives of behaviorally diverse yet competent policies aligned in a shared latent space to preserve plasticity and enable faster recovery after interference in continual reinforcement learning.
-
SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning
SPHERE applies a Parseval penalty derived from a Neural Tangent Kernel proxy for spectral plasticity to Mixture-of-Experts policies, raising average success rates by 133% on MetaWorld and 50% on HumanoidBench in continual RL versus unregularized MoE baselines.
-
Safe Continual Reinforcement Learning in Non-stationary Environments
Safe continual RL methods face a fundamental tension between enforcing safety constraints and preventing catastrophic forgetting in non-stationary environments, with regularization providing only partial mitigation.
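The Parseval penalty mentioned in the SPHERE entries above keeps a layer's weight matrix approximately orthogonal, so its singular values stay near 1. A minimal sketch of a generic Parseval-style penalty, assuming a plain NumPy weight matrix; this is not SPHERE's exact NTK-derived formulation, and the function name and `beta` default are illustrative:

```python
import numpy as np

def parseval_penalty(W, beta=1e-3):
    """Frobenius-norm deviation of W^T W from the identity.

    Driving W^T W toward I constrains W's singular values toward 1,
    one way to preserve the spectral properties ("spectral
    plasticity") of a layer under continued training.
    (Illustrative sketch, not SPHERE's exact penalty.)
    """
    gram = W.T @ W                      # (in_dim x in_dim) Gram matrix
    eye = np.eye(W.shape[1])
    return beta * float(np.sum((gram - eye) ** 2))
```

The penalty is zero exactly when the columns of W are orthonormal, and grows as the spectrum drifts away from 1.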