pith. sign in

Step-size optimization for continual learning

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

fields

cs.LG 3

years

2026 3

verdicts

UNVERDICTED 3

representative citing papers

Anytime Training with Schedule-Free Spectral Optimization

cs.LG · 2026-05-21 · unverdicted · novelty 5.0

SF-NorMuon is a new schedule-free spectral optimizer that closes the gap with tuned AdamW on 125M-772M parameter models across 1-8x Chinchilla horizons while providing stationarity guarantees.

Revisiting Adam for Streaming Reinforcement Learning

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

C51 matches StreamQ in streaming RL on 55 Atari games while a new Adaptive Q(λ) algorithm based on bounded derivatives and variance-adjusted updates reaches nearly double the human baseline.

citing papers explorer

Showing 3 of 3 citing papers.

  • Learning to Forget: Continual Learning with Adaptive Weight Decay cs.LG · 2026-04-29 · unverdicted · none · ref 5

    FADE adapts per-parameter weight decay rates online via approximate meta-gradient descent to improve controlled forgetting over fixed decay in online tracking and streaming classification.

  • Anytime Training with Schedule-Free Spectral Optimization cs.LG · 2026-05-21 · unverdicted · none · ref 16

    SF-NorMuon is a new schedule-free spectral optimizer that closes the gap with tuned AdamW on 125M-772M parameter models across 1-8x Chinchilla horizons while providing stationarity guarantees.

  • Revisiting Adam for Streaming Reinforcement Learning cs.LG · 2026-05-07 · unverdicted · none · ref 18

    C51 matches StreamQ in streaming RL on 55 Atari games while a new Adaptive Q(λ) algorithm based on bounded derivatives and variance-adjusted updates reaches nearly double the human baseline.