Optimal Ridge Regularization for Out-of-Distribution Prediction

Jin-Hong Du; Pratik Patil; Ryan J. Tibshirani

arXiv: 2404.01233 · v1 · pith:OC5ED7IB · submitted 2024-04-01 · math.ST · cs.LG · stat.ML · stat.TH

keywords: regularization, optimal, negative, out-of-distribution, ridge, test, train, conditions

Abstract

We study the behavior of optimal ridge regularization and optimal ridge risk for out-of-distribution prediction, where the test distribution deviates arbitrarily from the train distribution. We establish general conditions that determine the sign of the optimal regularization level under covariate and regression shifts. These conditions capture the alignment between the covariance and signal structures in the train and test data and reveal stark differences compared to the in-distribution setting. For example, a negative regularization level can be optimal under covariate shift or regression shift, even when the training features are isotropic or the design is underparameterized. Furthermore, we prove that the optimally tuned risk is monotonic in the data aspect ratio, even in the out-of-distribution setting and when optimizing over negative regularization levels. In general, our results make no modeling assumptions on the train or test distributions beyond moment bounds, and they allow for arbitrary shifts and the widest possible range of (negative) regularization levels.
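
The quantities in the abstract can be made concrete with a small simulation. What follows is a minimal sketch and not the authors' code. It assumes a Gaussian linear model with isotropic training features and n > p (the underparameterized case mentioned above), plus a hypothetical covariate shift in which the test covariance Sigma0 up-weights the first ten coordinates. The helper names ridge_svd and ood_risk are illustrative. The out-of-distribution risk is taken to be (beta_hat(lambda) - beta0)^T Sigma0 (beta_hat(lambda) - beta0) + sigma0^2, where a regression shift would correspond to beta0 differing from the training coefficients. The ridge path is computed through the SVD so that it stays defined for every lambda above minus the smallest nonzero eigenvalue of X^T X / n, which is how the sketch reaches negative regularization levels.

```python
import numpy as np


def ridge_svd(X, y, lam):
    """Ridge estimate via the thin SVD of X.

    Computes sum_i v_i * s_i * (u_i^T y) / (n * (d_i + lam)) with d_i = s_i^2 / n.
    This equals (X^T X / n + lam I)^{-1} X^T y / n whenever that inverse exists,
    and reduces to min-norm least squares at lam = 0. Negative lam is accepted as
    long as lam > -d_min, the smallest nonzero eigenvalue of X^T X / n.
    """
    n = X.shape[0]
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    keep = s > 1e-10 * s.max()            # drop numerically zero singular values
    U, s, Vt = U[:, keep], s[keep], Vt[keep]
    d = s**2 / n                          # nonzero eigenvalues of X^T X / n
    if lam <= -d.min():
        raise ValueError(f"lam must exceed {-d.min():.4g}")
    coef = s * (U.T @ y) / (n * (d + lam))
    return Vt.T @ coef


def ood_risk(beta_hat, beta0, Sigma0, sigma0_sq):
    """Test MSE for x0 ~ (0, Sigma0) and y0 = x0^T beta0 + noise of variance sigma0_sq."""
    err = beta_hat - beta0
    return float(err @ Sigma0 @ err + sigma0_sq)


rng = np.random.default_rng(0)
n, p, sigma = 200, 100, 0.5             # underparameterized: p / n = 0.5
beta = rng.normal(size=p) / np.sqrt(p)  # train coefficients (also used at test time here)
X = rng.normal(size=(n, p))             # isotropic train features
y = X @ beta + sigma * rng.normal(size=n)

# Hypothetical covariate shift: the test covariance inflates the first 10 coordinates.
Sigma0 = np.eye(p)
Sigma0[:10, :10] *= 10.0

d_min = np.linalg.svd(X, compute_uv=False).min() ** 2 / n
lams = np.linspace(-0.9 * d_min, 1.0, 200)   # grid includes negative values
risks = np.array([ood_risk(ridge_svd(X, y, lam), beta, Sigma0, sigma**2) for lam in lams])
best = lams[np.argmin(risks)]
print(f"grid-optimal lambda: {best:.4f} (lowest grid value {lams[0]:.4f})")
```

Whether the grid minimizer falls below zero depends on how Sigma0 aligns with the signal and on the noise level; the sketch only traces the risk curve and makes no claim about the sign for this particular draw. Setting beta0 to a different vector in ood_risk turns the same code into a regression-shift experiment.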

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Shrinkage to Infinity: Reducing Test Error by Inflating the Minimum Norm Interpolator in Linear Models
math.ST · 2025-10 · unverdicted · novelty 7.0

Inflating the min-norm interpolator by a factor >1 reduces generalization error in linear regression with anisotropic covariances when d/n diverges to infinity.

Two-Point Deterministic Equivalence for Stochastic Gradient Dynamics in Linear Models
cond-mat.dis-nn · 2025-02 · unverdicted · novelty 6.0

Derives a novel two-point deterministic equivalence for random matrix resolvents to obtain unified asymptotics for SGD-trained linear regression, kernel regression, and random feature models.