Derives an exact Frobenius-norm imbalance identity for deep nonlinear networks, classifies activations into four classes, and obtains a critical-depth escape-time law τ★ = Θ(ε^{-(r-2)}) by reduction to a scalar ODE on a permutation-symmetric submanifold.
Annual Conference Computational Learning Theory, year =
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: background (1)
polarities: background (1)
citing papers explorer
- A Theory of Saddle Escape in Deep Nonlinear Networks
  Derives an exact Frobenius-norm imbalance identity for deep nonlinear networks, classifies activations into four classes, and obtains a critical-depth escape-time law τ★ = Θ(ε^{-(r-2)}) by reduction to a scalar ODE on a permutation-symmetric submanifold.
- Less Data, Faster Training: repeating smaller datasets speeds up learning via sampling biases
  Repeating smaller datasets speeds up training via sampling biases that enable appropriate layer-wise growth, leading to compute savings over larger datasets across tasks and architectures.
- The two clocks and the innovation window: When and how generative models learn rules
  Generative models learn rules before memorizing data, creating an innovation window whose width depends on dataset size and rule complexity, observed in both diffusion and autoregressive architectures.