Annual Conference Computational Learning Theory

Emmanuel Abbe, Enric Boix-Adserà, Theodor Misiakiewicz · 2023 · arXiv 2302.11055

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 1

citation-polarity summary: background 1

representative citing papers

A Theory of Saddle Escape in Deep Nonlinear Networks

cs.LG · 2026-05-02 · unverdicted · novelty 8.0 · 3 refs

Derives an exact Frobenius-norm imbalance identity for deep nonlinear networks, classifies activations into four classes, and obtains a critical-depth escape-time law τ★ = Θ(ε^{-(r-2)}) by reduction to a scalar ODE on a permutation-symmetric submanifold.

Less Data, Faster Training: repeating smaller datasets speeds up learning via sampling biases

cs.LG · 2026-05-19 · unverdicted · novelty 6.0

Repeating smaller datasets speeds up training via sampling biases that enable appropriate layer-wise growth, leading to compute savings over larger datasets across tasks and architectures.

The two clocks and the innovation window: When and how generative models learn rules

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

Generative models learn rules before memorizing data, creating an innovation window whose width depends on dataset size and rule complexity, observed in both diffusion and autoregressive architectures.
