Computational-Statistical Gaps in Gaussian Single-Index Models

Alex Damian; Jason D. Lee; Joan Bruna; Loucas Pillaud-Vivien

arxiv: 2403.05529 · v2 · pith:HPOT5QEOnew · submitted 2024-03-08 · 💻 cs.LG · stat.ML

Computational-Statistical Gaps in Gaussian Single-Index Models

Alex Damian , Loucas Pillaud-Vivien , Jason D. Lee , Joan Bruna This is my paper

classification 💻 cs.LG stat.ML

keywords starstatisticalclasscomplexitygenerativehigh-dimensionalmodelssample

0 comments

read the original abstract

Single-Index Models are high-dimensional regression problems with planted structure, whereby labels depend on an unknown one-dimensional projection of the input via a generic, non-linear, and potentially non-deterministic transformation. As such, they encompass a broad class of statistical inference tasks, and provide a rich template to study statistical and computational trade-offs in the high-dimensional regime. While the information-theoretic sample complexity to recover the hidden direction is linear in the dimension $d$, we show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $\Omega(d^{k^\star/2})$ samples, where $k^\star$ is a "generative" exponent associated with the model that we explicitly characterize. Moreover, we show that this sample complexity is also sufficient, by establishing matching upper bounds using a partial-trace algorithm. Therefore, our results provide evidence of a sharp computational-to-statistical gap (under both the SQ and LDP class) whenever $k^\star>2$. To complete the study, we provide examples of smooth and Lipschitz deterministic target functions with arbitrarily large generative exponents $k^\star$.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Statistically and Computationally Optimal Estimation and Inference of Common Subspaces
math.ST 2026-06 unverdicted novelty 7.0

Establishes statistical and computational optimality thresholds for common subspace estimation and inference under varying SNR regimes, including an impossibility result for adaptive confidence intervals below strong ...
The Mechanism of Weak-to-Strong Generalization: Feature Elicitation from Latent Knowledge
stat.ML 2026-05 unverdicted novelty 7.0

In two-layer networks, weak-to-strong training elicits the target feature direction from pre-trained subspaces and preserves correlated off-target features, unlike standard fine-tuning.
How Do Transformers Learn to Associate Tokens: Gradient Leading Terms Bring Mechanistic Interpretability
cs.CL 2026-01 unverdicted novelty 7.0

Transformer weights at early training stages are closed-form compositions of bigram, token-interchangeability, and context mappings that directly reflect text-corpus statistics and explain the emergence of semantic as...
A Fourier perspective on the learning dynamics of neural networks: from sample complexities to mechanistic insights
stat.ML 2026-05 conditional novelty 6.0

Neural networks prioritize amplitude over phase in Fourier space during training on translation-invariant data; power-law spectra accelerate phase learning despite not aiding classification.
Escape dynamics and implicit bias of one-pass SGD in overparameterized quadratic networks
cond-mat.dis-nn 2026-04 unverdicted novelty 6.0

In overparameterized quadratic networks, one-pass SGD escapes generalization plateaus only modestly faster and selects the initialization-closest zero-loss solution due to a conserved quantity in the overlap ODEs.
There Will Be a Scientific Theory of Deep Learning
stat.ML 2026-04 unverdicted novelty 2.0

A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universa...