arXiv preprint arXiv:2104.03298
2 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 2
representative citing papers
Proves approximate Gaussianity of debiased linear forms of eigenvectors in matrix denoising and spiked PCA models under Gaussian noise, then constructs bias/variance estimators yielding minimax-optimal confidence intervals without sample splitting.
citing papers explorer
- Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent
Two steps of gradient descent on first-layer weights in linear-width two-layer networks produce a spiked random matrix with floor(alpha2/(1/2-alpha1)) outliers, each a learned direction, and batch reuse allows capturing directions with information exponent exceeding one.
- Statistical Inference for Linear Functions of Eigenvectors with Small Eigengaps
Proves approximate Gaussianity of debiased linear forms of eigenvectors in matrix denoising and spiked PCA models under Gaussian noise, then constructs bias/variance estimators yielding minimax-optimal confidence intervals without sample splitting.