The Annals of Statistics

Trevor Hastie, Andrea Montanari, Saharon Rosset, Ryan J. Tibshirani. Surprises in high-dimensional ridgeless least squares interpolation. arXiv:1903.08560

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Limitations of Lazy Training of Two-layers Neural Networks

stat.ML · 2019-06-21 · unverdicted · novelty 8.0

For quadratic targets in d dimensions, two-layer quadratic networks achieve lower risk when fully trained than in random features or neural tangent regimes if hidden units < d.

Fixed-order PCA: Theory for Overestimated Factor Models

math.ST · 2026-05-18 · unverdicted · novelty 7.0

Establishes asymptotic consistency of factor estimates and √T-normality in factor-augmented regressions for fixed R ≥ r using anisotropic local laws from random matrix theory.

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

cs.LG · 2024-01-02 · unverdicted · novelty 6.0

SPIN lets weak LLMs become strong by generating their own training data from previous model versions and training to prefer human-annotated responses over their own outputs, outperforming DPO on benchmarks even when DPO is given extra GPT-4 data.

Proximal Estimation and Inference

math.ST · 2022-05-26 · unverdicted · novelty 6.0

A proximal operator framework unifies the asymptotics and oracle properties of penalized estimators and yields new √n-consistent ridgeless-type estimators for linear regression.

Asymmetric Scaling Laws from Sparse Features

stat.ML · 2026-05-22 · unverdicted · novelty 5.0

A sparse-activation model predicts double-descent loss with distinct under- and over-parameterized scaling exponents set by sparsity, plus a compute-optimal frontier favoring dataset growth.

citing papers explorer

Showing 5 of 5 citing papers.

Limitations of Lazy Training of Two-layers Neural Networks · stat.ML · 2019-06-21 · unverdicted · none · ref 20
Fixed-order PCA: Theory for Overestimated Factor Models · math.ST · 2026-05-18 · unverdicted · none · ref 127
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models · cs.LG · 2024-01-02 · unverdicted · none · ref 103
Proximal Estimation and Inference · math.ST · 2022-05-26 · unverdicted · none · ref 32
Asymmetric Scaling Laws from Sparse Features · stat.ML · 2026-05-22 · unverdicted · none · ref 48
