Optimization readiness, defined from gradient strength and reliability, lower-bounds one-step optimization gain and outperforms rank-based diagnostics in predicting neural network trainability across continual learning settings.
Maintaining plasticity in deep continual learning
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 6roles
background 1polarities
background 1representative citing papers
A 2x growth factor in model warmstarting yields reliable training speedups for language models under 20 tokens/parameter budgets, with an empirical upper bound on effective growth factors.
Adam's adaptive preconditioning and first-moment averaging improve high-probability tracking error in noise-dominated nonstationary regimes but can increase it under strong drift, where SGD achieves a smaller floor, with explicit beta-dependent bounds.
FlashSAC improves training speed and final performance of off-policy RL on high-dimensional robot tasks by reducing update frequency, increasing model scale, and bounding norms to limit critic error accumulation.
GXD estimates the first-order functional cost of replacing a neuron via gradient attribution to make adaptive resets more reliable for preserving plasticity in continual learning.
Survey unifies the definition of plasticity loss in DRL, taxonomizes over 50 mitigations, identifies evaluation gaps, and finds general regularization often outperforms domain-specific methods.
citing papers explorer
-
Predicting Plasticity in Deep Continual Learning: A Theoretical Perspective
Optimization readiness, defined from gradient strength and reliability, lower-bounds one-step optimization gain and outperforms rank-based diagnostics in predicting neural network trainability across continual learning settings.
-
When is Warmstarting Effective for Scaling Language Models?
A 2x growth factor in model warmstarting yields reliable training speedups for language models under 20 tokens/parameter budgets, with an empirical upper bound on effective growth factors.
-
Adapt or Forget: Provable Tradeoffs Between Adam and SGD in Nonstationary Optimization
Adam's adaptive preconditioning and first-moment averaging improve high-probability tracking error in noise-dominated nonstationary regimes but can increase it under strong drift, where SGD achieves a smaller floor, with explicit beta-dependent bounds.
-
FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control
FlashSAC improves training speed and final performance of off-policy RL on high-dimensional robot tasks by reducing update frequency, increasing model scale, and bounding norms to limit critic error accumulation.
-
Attribution-Based Neuron Utility for Plasticity Restoration in Deep Networks
GXD estimates the first-order functional cost of replacing a neuron via gradient attribution to make adaptive resets more reliable for preserving plasticity in continual learning.
-
Plasticity Loss in Deep Reinforcement Learning: A Survey
Survey unifies the definition of plasticity loss in DRL, taxonomizes over 50 mitigations, identifies evaluation gaps, and finds general regularization often outperforms domain-specific methods.