Reinitializing weights vs units for maintaining plasticity in neural networks
3 Pith papers cite this work.
Citing papers by year: 2026 (3). Verdicts: 3 unverdicted.
Citing papers
- Learning to Forget: Continual Learning with Adaptive Weight Decay
  FADE adapts per-parameter weight-decay rates online via approximate meta-gradient descent, giving better-controlled forgetting than fixed weight decay in online tracking and streaming classification.
- Attribution-Based Neuron Utility for Plasticity Restoration in Deep Networks
  GXD uses gradient attribution to estimate the first-order functional cost of replacing a neuron, making adaptive resets more reliable for preserving plasticity in continual learning.
- Agentic Safety is an Epistemic Property, Not a Behavioral One
  The paper reframes agentic safety as an epistemic property defined by teachability (the capacity to preserve future corrective leverage) rather than as a behavioral property of the current policy.
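The FADE entry describes adapting per-parameter weight decay online with an approximate meta-gradient. This page does not give FADE's update rule, so the following is only a generic sketch of that idea under stated assumptions: plain SGD on a squared-loss linear model, and a one-step hypergradient approximation (in the style of hypergradient descent for learning rates) where the dependence of w_t on the decay rates is truncated to d w_t / d lambda ≈ -lr * w_{t-1}. The function `hypergradient_weight_decay`, its parameters, and the drifting-stream demo are hypothetical illustrations, not FADE's method.

```python
import numpy as np

def hypergradient_weight_decay(stream, dim, lr=0.05, meta_lr=0.01, init_decay=0.01):
    """Hypothetical sketch: adapt per-parameter weight decay with a one-step hypergradient.

    `stream` yields (x, y) pairs for online linear regression with loss 0.5 * (x @ w - y)**2.
    SGD step: w_{t+1} = w_t - lr * (grad_t + decay * w_t).
    """
    w = np.zeros(dim)
    decay = np.full(dim, init_decay)   # per-parameter decay rates lambda_i
    prev_w = w.copy()                  # w_{t-1}; w_0 does not depend on lambda
    for x, y in stream:
        err = x @ w - y
        grad = err * x                 # gradient of the squared loss w.r.t. w at w_t
        # One-step approximation: d w_t / d lambda = -lr * w_{t-1},
        # so dL_t / d lambda = grad * (-lr * w_{t-1}) elementwise.
        hypergrad = grad * (-lr * prev_w)
        decay = np.maximum(decay - meta_lr * hypergrad, 0.0)  # keep decay rates non-negative
        prev_w = w.copy()
        w = w - lr * (grad + decay * w)  # SGD step with per-parameter weight decay
    return w, decay

if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def drifting_stream(steps=2000):
        target = np.array([1.0, -1.0, 0.5])
        for t in range(steps):
            if t == steps // 2:
                target = -target       # abrupt drift halfway through the stream
            x = rng.normal(size=3)
            yield x, x @ target + 0.01 * rng.normal()

    w, decay = hypergradient_weight_decay(drifting_stream(), dim=3)
    print("weights:", w, "decay rates:", decay)
```

The decay rate of a weight shrinks when decaying it would have increased the next loss, and grows when decay helps; clipping at zero is an assumed design choice to keep the penalty from turning into weight growth.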
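The GXD entry describes estimating the first-order functional cost of replacing a neuron via gradient attribution. GXD's exact score is not given here; one common form of the idea, assumed in this sketch, is a gradient-times-activation score. If resetting hidden unit j removes its contribution, a first-order Taylor expansion gives ΔL_j ≈ -Σ_n (∂L/∂h_{n,j}) h_{n,j}, and the sketch uses |ΔL_j| as the unit's utility. It scores units of a one-hidden-layer ReLU network with mean squared error, then reinitializes the lowest-utility units with fresh incoming weights and zeroed outgoing weights (a convention borrowed from continual-backprop-style unit resets). All function names are hypothetical; this is not GXD's implementation.

```python
import numpy as np

def forward(params, X):
    W1, b1, W2, b2 = params
    h = np.maximum(X @ W1 + b1, 0.0)   # ReLU hidden activations, shape (N, H)
    return h, h @ W2 + b2              # predictions, shape (N, out)

def gradient_x_activation_utility(params, X, y):
    """Hypothetical sketch: first-order estimate of |change in loss| if each hidden unit is zeroed.

    Loss is L = 0.5 * mean over the batch of sum((pred - y)**2). Zeroing unit j shifts its
    activations by -h[:, j], so delta_L_j ~= -sum_n dL/dh[n, j] * h[n, j].
    """
    W2 = params[2]
    h, pred = forward(params, X)
    dpred = (pred - y) / X.shape[0]    # dL/dpred
    dh = dpred @ W2.T                  # dL/dh, shape (N, H)
    return np.abs(np.sum(dh * h, axis=0))

def reset_lowest_utility_units(params, utility, k, rng, scale=0.1):
    """Reinitialize the k units with the smallest estimated cost (returns new copies)."""
    W1, b1, W2, b2 = (p.copy() for p in params)
    idx = np.argsort(utility)[:k]
    W1[:, idx] = rng.normal(0.0, scale, size=(W1.shape[0], len(idx)))  # fresh incoming weights
    b1[idx] = 0.0
    W2[idx, :] = 0.0                   # zero outgoing weights so the reset leaves outputs unchanged
    return (W1, b1, W2, b2), idx

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(64, 4)), rng.normal(size=(64, 1))
    params = (rng.normal(0, 0.5, (4, 8)), np.zeros(8), rng.normal(0, 0.5, (8, 1)), np.zeros(1))
    utility = gradient_x_activation_utility(params, X, y)
    params, reset_idx = reset_lowest_utility_units(params, utility, k=2, rng=rng)
    print("utility per unit:", np.round(utility, 4), "reset units:", reset_idx)
```

Because the score is only first order, it ignores the quadratic term of the loss change; on the small test network used for checking, zeroing the first unit changes the loss by -3.5 while the first-order estimate is -4.75.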