On large-batch training for deep learning: Generalization gap and sharp minima
2 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
Experiments indicate that small-batch SGD promotes flatter loss minima and better generalization in overparameterized networks, and that sparse subnetworks can retain nearly full accuracy.
citing papers explorer
- Training Deep Visual Networks Beyond Loss and Accuracy Through a Dynamical Systems Approach
  Introduces integration, metastability, and dynamical stability index measures from layer activations and reports patterns distinguishing CIFAR-10 from CIFAR-100 difficulty plus early convergence signals across ResNet variants, DenseNet, MobileNetV2, VGG-16, and a Vision Transformer.
- Implicit Regularization and Generalization in Overparameterized Neural Networks
  Experiments indicate that small-batch SGD promotes flatter loss minima and better generalization in overparameterized networks, and that sparse subnetworks can retain nearly full accuracy.