Computational-Statistical Gaps in Gaussian Single-Index Models
read the original abstract
Single-Index Models are high-dimensional regression problems with planted structure, whereby labels depend on an unknown one-dimensional projection of the input via a generic, non-linear, and potentially non-deterministic transformation. As such, they encompass a broad class of statistical inference tasks, and provide a rich template to study statistical and computational trade-offs in the high-dimensional regime. While the information-theoretic sample complexity to recover the hidden direction is linear in the dimension $d$, we show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $\Omega(d^{k^\star/2})$ samples, where $k^\star$ is a "generative" exponent associated with the model that we explicitly characterize. Moreover, we show that this sample complexity is also sufficient, by establishing matching upper bounds using a partial-trace algorithm. Therefore, our results provide evidence of a sharp computational-to-statistical gap (under both the SQ and LDP class) whenever $k^\star>2$. To complete the study, we provide examples of smooth and Lipschitz deterministic target functions with arbitrarily large generative exponents $k^\star$.
This paper has not been read by Pith yet.
Forward citations
Cited by 6 Pith papers
-
Statistically and Computationally Optimal Estimation and Inference of Common Subspaces
Establishes statistical and computational optimality thresholds for common subspace estimation and inference under varying SNR regimes, including an impossibility result for adaptive confidence intervals below strong ...
-
The Mechanism of Weak-to-Strong Generalization: Feature Elicitation from Latent Knowledge
In two-layer networks, weak-to-strong training elicits the target feature direction from pre-trained subspaces and preserves correlated off-target features, unlike standard fine-tuning.
-
How Do Transformers Learn to Associate Tokens: Gradient Leading Terms Bring Mechanistic Interpretability
Transformer weights at early training stages are closed-form compositions of bigram, token-interchangeability, and context mappings that directly reflect text-corpus statistics and explain the emergence of semantic as...
-
A Fourier perspective on the learning dynamics of neural networks: from sample complexities to mechanistic insights
Neural networks prioritize amplitude over phase in Fourier space during training on translation-invariant data; power-law spectra accelerate phase learning despite not aiding classification.
-
Escape dynamics and implicit bias of one-pass SGD in overparameterized quadratic networks
In overparameterized quadratic networks, one-pass SGD escapes generalization plateaus only modestly faster and selects the initialization-closest zero-loss solution due to a conserved quantity in the overlap ODEs.
-
There Will Be a Scientific Theory of Deep Learning
A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universa...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.