Mean field analysis of deep neural networks. Mathematics of Operations Research, 47(1):120–152, 2022.
1 Pith paper cites this work.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Anti-Collapse Dynamics and the Emergence of Multi-Time-Scale Learning in Recurrent Neural Networks
RNNs can sustain power-law forgetting and multi-time-scale learning when heavy-tailed fluctuations in SGD balance the collapse tendency toward short time scales, governed by a spectral exponent β.