arXiv preprint arXiv:2503.19859
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
citing papers explorer
- Pointwise Generalization in Deep Neural Networks
  Proposes pointwise Riemannian Dimension from feature eigenvalues to derive tighter, representation-aware generalization bounds for deep networks in the nonlinear regime.
- The Effect of Training Task Diversity on In-Context Learning through the Lens of Low-Dimensional Subspaces
  A low-rank Gaussian mixture model shows that training task diversity measured by non-overlapping subspace columns improves ICL generalization and shortens learning plateaus for linear attention, with empirical extension to nonlinear cases.
- Spectral Condition for $\mu$P under Width-Depth Scaling
  A unified spectral condition for μP under width-depth scaling reveals a transition at k=1 vs k≥2 transformations per residual block and enables stable feature learning for practical architectures like Transformers.