FADE adapts per-parameter weight decay rates online via approximate meta-gradient descent to improve controlled forgetting over fixed decay in online tracking and streaming classification.
Step-size optimization for continual learning
3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.LG 3years
2026 3verdicts
UNVERDICTED 3representative citing papers
SF-NorMuon is a new schedule-free spectral optimizer that closes the gap with tuned AdamW on 125M-772M parameter models across 1-8x Chinchilla horizons while providing stationarity guarantees.
C51 matches StreamQ in streaming RL on 55 Atari games while a new Adaptive Q(λ) algorithm based on bounded derivatives and variance-adjusted updates reaches nearly double the human baseline.
citing papers explorer
-
Learning to Forget: Continual Learning with Adaptive Weight Decay
FADE adapts per-parameter weight decay rates online via approximate meta-gradient descent to improve controlled forgetting over fixed decay in online tracking and streaming classification.
-
Anytime Training with Schedule-Free Spectral Optimization
SF-NorMuon is a new schedule-free spectral optimizer that closes the gap with tuned AdamW on 125M-772M parameter models across 1-8x Chinchilla horizons while providing stationarity guarantees.
-
Revisiting Adam for Streaming Reinforcement Learning
C51 matches StreamQ in streaming RL on 55 Atari games while a new Adaptive Q(λ) algorithm based on bounded derivatives and variance-adjusted updates reaches nearly double the human baseline.