High-dimensional analysis of ridge regression for non-identically distributed data with a variance profile

Camille Male; Issa-Mbenard Dabo; Jérémie Bigot

arxiv: 2403.20200 · v5 · pith:YGBJK25Bnew · submitted 2024-03-29 · 🧮 math.ST · math.PR · stat.ME · stat.ML · stat.TH

Pith reviewed 2026-05-24 02:44 UTC · model grok-4.3

classification 🧮 math.ST · math.PR · stat.ME · stat.ML · stat.TH

keywords high-dimensional regression · ridge estimator · variance profile · deterministic equivalents · double descent · random matrix theory · predictive risk · non-identically distributed data

The pith

Ridge regression on data with a variance profile admits deterministic equivalents for predictive risk and degrees of freedom.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies high-dimensional linear regression where predictors follow a variance profile rather than being identically distributed. Under a random effects model it derives deterministic equivalents for the ridge estimator's predictive risk and its degrees of freedom. These equivalents are obtained via random matrix theory adapted to variance profiles. For some profiles, the predictive risk exhibits double descent in the limit where the regularization parameter tends to zero and the ridge estimator becomes the minimum-norm least-squares estimator; for other profiles the risk curve takes a different shape.

Core claim

Assuming a random effect model, the predictive risk of the ridge estimator and its degrees of freedom admit deterministic equivalents when the data matrix has a variance profile and its dimensions grow proportionally. For certain classes of variance profiles, the minimum-norm least-squares estimator (the limit of the ridge estimator as the ridge parameter tends to zero) shows double descent in the predictive risk, while other profiles yield different risk shapes.
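
For orientation, the setup behind this claim can be written out as below. The notation is an assumption made for this summary: the symbols \Gamma = (\gamma_{ij}), \alpha^2, \sigma^2 and the 1/n normalization are not taken from the paper, whose conventions may differ.

```latex
% Assumed notation for illustration; not the paper's own.
% Predictors with a variance profile: independent entries, Var(x_{ij}) = \gamma_{ij}^2.
X = (x_{ij}) \in \mathbb{R}^{n \times p}, \qquad
x_{ij} = \gamma_{ij}\, z_{ij}, \qquad
\mathbb{E}[z_{ij}] = 0, \quad \mathbb{E}[z_{ij}^2] = 1 .

% Random effect model for the coefficients, with independent noise.
Y = X\beta + \varepsilon, \qquad
\mathbb{E}[\beta] = 0, \quad
\mathbb{E}[\beta\beta^\top] = \frac{\alpha^2}{p}\, I_p, \qquad
\mathbb{E}[\varepsilon\varepsilon^\top] = \sigma^2 I_n .

% Ridge estimator and its minimum-norm least-squares limit.
\hat\beta_\lambda = \Big( \tfrac{1}{n} X^\top X + \lambda I_p \Big)^{-1} \tfrac{1}{n} X^\top Y,
\qquad
\hat\beta_0 = \lim_{\lambda \to 0^+} \hat\beta_\lambda = X^{+} Y .
```

Here X^{+} denotes the Moore-Penrose pseudo-inverse, and "dimensions grow proportionally" means that p/n stays bounded away from zero and infinity as n grows.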

What carries the argument

The variance profile of the random predictor matrix, analyzed through random matrix theory results that handle non-identical variances.
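
As a concrete picture of what a variance profile is, the following minimal sketch draws a matrix with independent centered entries whose standard deviations are prescribed entry by entry. The Gaussian entries, the helper names and the two-block profile are all assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np

def sample_with_variance_profile(gamma, rng):
    """Draw X with independent centered entries and Var(X[i, j]) = gamma[i, j] ** 2."""
    z = rng.standard_normal(gamma.shape)  # i.i.d. N(0, 1) entries (Gaussian is an assumption)
    return gamma * z                      # entrywise (Hadamard) product

def two_block_profile(n, p, low=0.5, high=2.0):
    """Hypothetical profile: first half of the columns has s.d. `low`, the rest `high`."""
    gamma = np.full((n, p), low)
    gamma[:, p // 2:] = high
    return gamma

rng = np.random.default_rng(0)
gamma = two_block_profile(n=4, p=6)

# Empirical entrywise variance over many independent draws of the matrix.
draws = np.stack([sample_with_variance_profile(gamma, rng) for _ in range(20000)])
empirical_var = draws.var(axis=0)
print(np.round(empirical_var, 2))
```

A constant profile (all entries of `gamma` equal) recovers the identically distributed setting that the paper compares against.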

If this is right

The deterministic equivalents allow the ridge risk to be approximated in high dimensions without Monte Carlo simulation; they are asymptotic approximations rather than exact finite-sample values.
Double descent appears in the minimum-norm estimator for some non-iid variance profiles.
Certain variance profiles produce predictive-risk curves that do not follow the double-descent shape.
The same random-matrix machinery can be applied to other linear estimators beyond ridge.
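
The dependence of the risk curve on the profile can be explored numerically with a sketch like the one below. It does not implement the paper's deterministic equivalents. Instead it samples one design matrix per aspect ratio and evaluates the risk conditional on that matrix, averaged in closed form over the coefficients and the noise. Several choices are assumptions: the Gaussian entries, the two-block profile, the function names, and the definition of predictive risk through the column-averaged variances `test_var` (an independent test design with the same profile).

```python
import numpy as np

def ridge_risk_given_X(X, test_var, lam, alpha2=1.0, sigma2=1.0):
    """Risk given X, averaged over beta (cov alpha2/p * I) and noise (var sigma2)."""
    n, p = X.shape
    S = X.T @ X / n                            # sample second-moment matrix
    R = np.linalg.inv(S + lam * np.eye(p))     # resolvent (S + lam I)^{-1}
    RDR = R @ (test_var[:, None] * R)          # R D R with D = diag(test_var)
    bias = alpha2 * lam**2 * np.trace(RDR) / p
    variance = sigma2 * np.trace(RDR @ S) / n
    return bias + variance

def risk_curve(profile_fn, n, ratios, lam, rng):
    """Evaluate the conditional risk for each aspect ratio p/n in `ratios`."""
    risks = []
    for ratio in ratios:
        p = int(round(ratio * n))
        gamma = profile_fn(n, p)
        X = gamma * rng.standard_normal((n, p))
        test_var = (gamma**2).mean(axis=0)     # assumed test-point variances
        risks.append(ridge_risk_given_X(X, test_var, lam))
    return np.array(risks)

def constant_profile(n, p):
    return np.ones((n, p))

def two_block_profile(n, p):
    # Hypothetical heterogeneous profile, for illustration only.
    gamma = np.full((n, p), 0.5)
    gamma[:, p // 2:] = 2.0
    return gamma

rng = np.random.default_rng(0)
ratios = np.array([0.25, 0.5, 1.0, 2.0, 4.0])
for name, fn in [("constant", constant_profile), ("two-block", two_block_profile)]:
    curve = risk_curve(fn, n=200, ratios=ratios, lam=1e-3, rng=rng)
    print(name, np.round(curve, 3))
```

With a small ridge parameter, the constant profile should show the familiar peak near p/n = 1. What other profiles do is exactly the question the paper addresses, so the two-block output here is an exploration, not a reproduction of its results.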

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The formulas could be minimized over the ridge parameter to choose the value with the smallest risk for a given estimated variance profile.
Similar deterministic equivalents might be derived for generalized linear models or kernel ridge regression under the same variance-profile assumption.
Real-data applications would require consistent estimation of the variance profile entries from the observed matrix.
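
The first extension above can be sketched as a grid search over the ridge parameter. This is a minimal illustration under stated assumptions: the objective is the finite-sample conditional risk rather than the paper's deterministic equivalent, the profile is treated as known rather than estimated, and the function names and the definition of `test_var` are hypothetical.

```python
import numpy as np

def ridge_risk_given_X(X, test_var, lam, alpha2, sigma2):
    """Risk given X, averaged over beta (cov alpha2/p * I) and noise (var sigma2)."""
    n, p = X.shape
    S = X.T @ X / n
    R = np.linalg.inv(S + lam * np.eye(p))     # (S + lam I)^{-1}
    RDR = R @ (test_var[:, None] * R)          # R D R with D = diag(test_var)
    return alpha2 * lam**2 * np.trace(RDR) / p + sigma2 * np.trace(RDR @ S) / n

def select_lambda(X, test_var, grid, alpha2, sigma2):
    """Return the grid value with the smallest risk, together with all risks."""
    risks = np.array([ridge_risk_given_X(X, test_var, lam, alpha2, sigma2)
                      for lam in grid])
    return grid[int(np.argmin(risks))], risks

rng = np.random.default_rng(0)
n, p = 200, 100
gamma = np.ones((n, p))                        # constant profile as a sanity case
X = gamma * rng.standard_normal((n, p))
test_var = (gamma**2).mean(axis=0)             # assumed test-point variances
grid = np.linspace(0.05, 2.0, 40)

lam_star, risks = select_lambda(X, test_var, grid, alpha2=1.0, sigma2=1.0)
print(f"selected lambda: {lam_star:.2f}")
```

For the constant profile used here, the selected value should sit near the classical choice sigma2 * p / (n * alpha2) from the identically distributed setting, which gives a quick sanity check before trying heterogeneous profiles.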

Load-bearing premise

The observations follow a random effects model and the variance profile satisfies the moment and growth conditions required for the random matrix theory tools.

What would settle it

A direct numerical comparison in which the empirical risk of ridge regression on simulated data with a qualifying variance profile deviates from the deterministic equivalent by more than sampling error.
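
The shape of such a comparison can be sketched as follows. The paper's variance-profile formula is not reproduced here. As a stand-in, the sketch uses the constant-profile special case, where the limiting risk has a classical closed form through the Stieltjes transform of the Marchenko-Pastur law. The function names, the normalization, and the values of `alpha2` and `sigma2` are assumptions.

```python
import numpy as np

def mp_stieltjes_at_minus(lam, gamma):
    """m(-lam) for the Marchenko-Pastur law with ratio gamma = p/n and unit variance."""
    a = 1.0 - gamma + lam
    return (-a + np.sqrt(a * a + 4.0 * gamma * lam)) / (2.0 * gamma * lam)

def limiting_risk_constant_profile(lam, gamma, alpha2=1.0, sigma2=1.0, h=1e-5):
    """Classical limiting ridge risk for i.i.d. unit-variance predictors."""
    m = mp_stieltjes_at_minus(lam, gamma)
    # m'(-lam) is the integral of 1 / (s + lam)^2, i.e. minus the derivative in lam.
    m_prime = -(mp_stieltjes_at_minus(lam + h, gamma)
                - mp_stieltjes_at_minus(lam - h, gamma)) / (2.0 * h)
    bias = alpha2 * lam**2 * m_prime
    variance = sigma2 * gamma * (m - lam * m_prime)
    return bias + variance

def finite_sample_risk(X, lam, alpha2=1.0, sigma2=1.0):
    """Risk given X with identity test covariance, averaged over beta and noise."""
    n, p = X.shape
    s = np.clip(np.linalg.eigvalsh(X.T @ X / n), 0.0, None)
    bias = alpha2 * lam**2 * np.mean(1.0 / (s + lam) ** 2)
    variance = sigma2 * np.sum(s / (s + lam) ** 2) / n
    return bias + variance

rng = np.random.default_rng(0)
n, p, lam = 400, 200, 0.5
X = rng.standard_normal((n, p))                # constant variance profile
empirical = finite_sample_risk(X, lam)
limit = limiting_risk_constant_profile(lam, gamma=p / n)
print(f"finite-sample risk: {empirical:.4f}, limiting value: {limit:.4f}")
```

A test of the paper's claim would replace `limiting_risk_constant_profile` with its variance-profile equivalent and the Gaussian matrix with a draw from a qualifying profile, then check whether the gap shrinks as n and p grow.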

Original abstract

High-dimensional linear regression has been thoroughly studied in the context of independent and identically distributed data. We propose to investigate high-dimensional regression models for independent but non-identically distributed data. To this end, we suppose that the set of observed predictors (or features) is a random matrix with a variance profile and with dimensions growing at a proportional rate. Assuming a random effect model, we study the predictive risk of the ridge estimator for linear regression with such a variance profile. In this setting, we provide deterministic equivalents of this risk and of the degree of freedom of the ridge estimator. For certain class of variance profile, our work highlights the emergence of the well-known double descent phenomenon in high-dimensional regression for the minimum norm least-squares estimator when the ridge regularization parameter goes to zero. We also exhibit variance profiles for which the shape of this predictive risk differs from double descent. The proofs of our results are based on tools from random matrix theory in the presence of a variance profile that have not been considered so far to study regression models. Numerical experiments are provided to show the accuracy of the aforementioned deterministic equivalents on the computation of the predictive risk of ridge regression. We also investigate the similarities and differences that exist with the standard setting of independent and identically distributed data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper derives deterministic equivalents for ridge risk and degrees of freedom under a variance-profile covariate model and shows the risk curve shape depends on the profile.

The paper's main result is a set of deterministic equivalents for the predictive risk of ridge regression and its degrees of freedom when the features have a variance profile. Under a random effects model and proportional dimension growth, the equivalents let the authors track how the risk behaves as a function of the regularization parameter and highlight that double descent appears for some profiles but not others when the ridge parameter goes to zero. The proofs rely on random matrix theory tools for variance profiles that the abstract says had not been applied to regression before. Numerical experiments are included to check that the equivalents track the finite-sample risk closely.

This is a clean extension of the i.i.d. ridge analysis that has been standard in the literature. The technical conditions on the profile (moment bounds and growth rates) are the usual ones from the RMT side, so the scope is narrower than fully general non-i.i.d. data but still broader than the i.i.d. case. No circularity or internal inconsistency appears in the claims. The random effects assumption is standard for this type of risk calculation and keeps the derivations tractable.

The work is aimed at researchers in high-dimensional statistics who want explicit risk formulas beyond i.i.d. assumptions or who study how covariate heterogeneity changes phenomena like double descent. A reader already comfortable with RMT in statistics will get the most from the explicit expressions and the profile comparisons. The combination of new application, explicit results, and numerical checks is enough to justify sending it to a serious referee rather than desk rejecting it.

Referee Report

0 major / 2 minor

Summary. The manuscript derives deterministic equivalents for the predictive risk and the degrees of freedom of the ridge estimator in high-dimensional linear regression under a random-effects model, where the design matrix has independent but non-identically distributed entries governed by a variance profile and its dimensions grow proportionally. It shows that, for certain classes of variance profiles satisfying the requisite technical conditions, the minimum-norm least-squares estimator exhibits the double-descent phenomenon as the ridge parameter tends to zero, while other profiles produce qualitatively different risk curves. The derivations rely on random-matrix-theory results for variance profiles that have not previously been applied to regression; numerical experiments are provided to illustrate the accuracy of the equivalents.

Significance. If the derivations hold under the stated conditions, the work extends the RMT analysis of ridge regression from the iid setting to a substantially more general class of heterogeneous data. The explicit dependence of risk shape on the variance profile, including both double-descent and non-double-descent regimes, supplies a concrete mechanism for understanding generalization behavior beyond the classical iid case. The application of previously unused RMT tools to regression and the provision of numerical validation are clear strengths.

minor comments (2)

[Abstract] The phrase 'for certain class of variance profile' is imprecise; the introduction or Section 2 should explicitly reference the precise technical conditions (proportional growth, moment bounds) under which double descent is recovered.
[Introduction] The manuscript would benefit from a short table or figure in the main text that contrasts the risk curves for at least two concrete variance-profile families (one yielding double descent, one not) rather than relegating all examples to the numerical section.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the careful reading and positive assessment of the manuscript, including the accurate summary of the contributions and the recommendation for minor revision. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity; derivations rely on external RMT results

The paper applies established random matrix theory tools for variance-profile matrices to derive deterministic equivalents for ridge risk and degrees of freedom under a random-effects model. These tools are cited as external (not previously applied to regression) and the results are explicitly conditional on proportional growth and moment conditions from prior RMT literature. No load-bearing self-citations, self-definitional steps, fitted inputs renamed as predictions, or ansatzes smuggled via citation appear in the abstract or stated claims. The central results (deterministic equivalents and double-descent emergence for certain profiles) remain independent of the paper's own fitted quantities or prior author work. This is the common case of a self-contained application of external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on a random effect model and on the existence of a variance profile whose entries allow application of previously unused random matrix theory tools. No free parameters or invented entities are mentioned in the abstract.

axioms (2)

domain assumption: Random effect model for the regression coefficients
Stated in the abstract as the modeling framework under which the ridge risk is studied.
domain assumption: Existence of a variance profile satisfying the conditions for the random matrix theory results
Required for the deterministic equivalents to hold; technical conditions not detailed in abstract.

pith-pipeline@v0.9.0 · 5774 in / 1297 out tokens · 14580 ms · 2026-05-24T02:44:38.723271+00:00 · methodology
