Model Checking for Regressions Based on Weighted Residual Processes with Diverging Number of Predictors

Haiqi Li; Xintao Xia; Yue Hu

arxiv: 2604.14649 · v1 · submitted 2026-04-16 · 📊 stat.ME · math.ST· stat.TH

Model Checking for Regressions Based on Weighted Residual Processes with Diverging Number of Predictors

Yue Hu , Haiqi Li , Xintao Xia This is my paper

Pith reviewed 2026-05-10 10:53 UTC · model grok-4.3

classification 📊 stat.ME math.STstat.TH

keywords specification testweighted residual processhigh-dimensional regressiondiverging predictorsmodel checkingsmooth residual bootstrapregression mean function

0 comments

The pith

Weighted residual processes yield a valid specification test for parametric regressions when the number of predictors diverges with sample size.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing integrated conditional moment tests lose their ability to distinguish models once the predictor count grows with the sample size, because their limiting distributions collapse to constants and standard wild bootstraps produce incorrect critical values. The paper replaces them with a test statistic constructed from weighted residual processes, which retains proper non-degenerate limits under the null and under local alternatives that approach the null at the usual parametric rate. Asymptotic theory shows the test keeps its nominal size and the authors supply a smooth residual bootstrap whose validity they prove for diverging dimension. Simulations and one real-data illustration confirm that the procedure works in finite samples where earlier methods fail. The result supplies a practical tool for checking whether a fitted mean function is correctly specified in modern high-dimensional data sets.

Core claim

The authors construct a test statistic from suitably weighted residual processes and prove that, under regularity conditions allowing the dimension p_n to diverge with n, the statistic converges to a non-degenerate limit under the null hypothesis, to infinity under global alternatives, and to a non-central limit under local alternatives at rate 1/sqrt(n). They further show that a smooth residual bootstrap consistently approximates the null distribution in this regime, thereby restoring valid inference for the adequacy of the parametric regression mean function.

What carries the argument

Weighted residual processes formed by multiplying fitted residuals by a weighting function and integrating the squared process to produce the test statistic.

Load-bearing premise

The dimension p_n must grow slowly enough relative to sample size n that the weighted residual process still has a non-degenerate limiting distribution.

What would settle it

Empirical distributions of the test statistic in Monte Carlo experiments with increasing p_n either collapse toward a constant under the null or the smooth bootstrap quantiles deviate systematically from the observed null distribution.

Figures

Figures reproduced from arXiv: 2604.14649 by Haiqi Li, Xintao Xia, Yue Hu.

**Figure 1.** Figure 1: (a) The scatter plot of Y against the fitted values βb⊤X, and (b) the scatter plot of the residuals against the fitted values βb⊤X for the Geographical Origin of Music data set. 6 Discussion Although widely used, the ICM test exhibits fundamentally different asymptotic behavior in high-dimensional settings, and the associated wild bootstrap is no longer valid. To address this issue, we propose a test based… view at source ↗

read the original abstract

The integrated conditional moment (ICM) test is a classical and widely used method for assessing the adequacy of regression models. Although it performs well in fixed-dimension settings, its behavior changes dramatically when the predictor dimension diverges: in such regimes, the limiting null and alternative distributions of the ICM statistic degenerate to fixed constants. Moreover, when the number of predictors diverges, the commonly used wild bootstrap no longer approximates the null distribution of the ICM statistic well, leading to size distortion and substantial power loss. To address these challenges, we propose a new specification test based on weighted residual processes for evaluating the parametric form of the regression mean function in high-dimensional settings where the number of predictors increases with the sample size. We establish the asymptotic properties of the test statistic under the null hypothesis and under global and local alternatives. The proposed test maintains the nominal significance level and can detect local alternatives that deviate from the null hypothesis at the parametric rate $1/\sqrt{n}$. Furthermore, we propose a smooth residual bootstrap to approximate the limiting null distribution and establish its validity in high-dimensional settings. Two simulation studies and a real-data example are conducted to evaluate the finite-sample performance of the proposed test.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives a usable specification test for parametric regression when p diverges with n by switching to weighted residual processes plus a smooth residual bootstrap, but the local-alternative power claim at 1/sqrt(n) looks vulnerable to OLS contamination unless p grows slowly enough.

read the letter

The main thing to know is that this work targets a real breakdown: standard ICM tests and wild bootstrap lose their limiting distributions and size control once the predictor count grows with sample size. The authors replace the usual integrated conditional moment with a weighted residual process and pair it with a smooth residual bootstrap that they show is consistent under the null in the diverging-p regime. They also derive the behavior under global and local alternatives and run simulations plus a real-data check to see how it behaves in practice. That combination is not a routine extension of fixed-p methods, and it directly tackles the degeneration the abstract describes. The finite-sample results appear to support correct size and reasonable power, which is the part that would matter most to someone who actually needs to validate a model. The soft spot is the local-alternative claim. Under a local deviation of size 1/sqrt(n), the OLS residuals contain an extra term whose size depends on the inverse Gram matrix and the design. When p diverges, even slowly, small eigenvalues can make that term comparable to or larger than the signal, so the non-centrality can disappear. The abstract asserts detection at the parametric rate without stating the extra restriction on p_n (for example p_n = o(n^{1/4}) or similar) that would keep the estimation error negligible. If the full proofs impose and verify such a rate, the result holds; otherwise the practical range is narrower than advertised. The regularity conditions are also left at a high level in the abstract. This is for applied statisticians or econometricians who work with moderately high-dimensional parametric regressions and need a model-check that does not immediately break. It is worth sending to referees because the problem is concrete, the proposed fix is specific, and the supporting simulations exist, even if the rate conditions and proofs need close checking.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes a specification test for the parametric form of the regression mean function in high-dimensional regimes where the number of predictors p_n diverges with sample size n. The test is based on weighted residual processes; the authors derive its asymptotic null distribution, establish consistency against global alternatives, and claim that it has nontrivial power against local alternatives that deviate from the null at the parametric rate 1/√n. They further introduce a smooth residual bootstrap to approximate the null distribution and prove its validity under the same high-dimensional asymptotics. Finite-sample behavior is illustrated with two simulation studies and one real-data example.

Significance. If the asymptotic claims hold under explicitly stated conditions on p_n, the work would fill a genuine gap: standard ICM tests degenerate when p_n → ∞ and the usual wild bootstrap loses validity, while the proposed weighted-process construction and smooth bootstrap appear to restore non-degenerate limits and correct size. The asserted local-power rate of 1/√n would be a strong theoretical feature for high-dimensional model checking.

major comments (1)

[Abstract and local-alternative result] Abstract and the local-alternative theorem (presumably §4 or §5): the claim that the test detects local alternatives H_n : m(x) = x'β_0 + δ_n(x) with ||δ_n|| ~ 1/√n at the parametric rate is load-bearing for the paper's main contribution, yet the abstract invokes only “unspecified regularity conditions and rates for the diverging dimension p_n.” The OLS estimation error term (X'X)^{-1}X'δ_n can be larger than O_p(1/√n) once the smallest eigenvalues of the Gram matrix deteriorate with p_n, potentially driving the non-centrality parameter to zero or infinity. The manuscript must state the precise growth restriction on p_n (e.g., p_n = o(√n) or p_n log p_n = o(n)) that keeps this contamination negligible and verify that the stated local-power result continues to hold under that restriction.

minor comments (2)

[Abstract] The abstract refers to “two simulation studies” but does not indicate the concrete sequences (n, p_n) examined or the specific forms of the local alternatives; these details should be added for reproducibility.
[Abstract] Notation for the weight function and the smoothing parameter in the residual bootstrap should be introduced once and used consistently; currently the abstract leaves both unspecified.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. The major comment raises an important point about the explicit growth conditions on p_n needed to support the local-alternative power claim. We address this directly below.

read point-by-point responses

Referee: [Abstract and local-alternative result] Abstract and the local-alternative theorem (presumably §4 or §5): the claim that the test detects local alternatives H_n : m(x) = x'β_0 + δ_n(x) with ||δ_n|| ~ 1/√n at the parametric rate is load-bearing for the paper's main contribution, yet the abstract invokes only “unspecified regularity conditions and rates for the diverging dimension p_n.” The OLS estimation error term (X'X)^{-1}X'δ_n can be larger than O_p(1/√n) once the smallest eigenvalues of the Gram matrix deteriorate with p_n, potentially driving the non-centrality parameter to zero or infinity. The manuscript must state the precise growth restriction on p_n (e.g., p_n = o(√n) or p_n log p_n = o(n)) that keeps this contamination negligible and verify that the stated local-power result continues to hold under that restriction.

Authors: We thank the referee for highlighting this subtlety in the local-alternative analysis. The local-power theorem (Theorem 4.2) is derived under a set of regularity conditions (Assumptions 2.1–2.3 and 3.1) that explicitly include the restriction p_n = o(n^{1/2}) to ensure that the OLS estimation error remains o_p(n^{-1/2}) and does not inflate or deflate the non-centrality parameter. Under this rate the smallest eigenvalues of the Gram matrix are controlled so that the contamination term is negligible. We acknowledge, however, that the abstract refers only to “unspecified regularity conditions.” We will revise the abstract to state the growth restriction explicitly (e.g., “under the condition p_n = o(√n)”) and will add a short remark immediately after Theorem 4.2 confirming that the 1/√n local-power result continues to hold under this restriction. These changes will be incorporated in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation of weighted residual process test

full rationale

The paper constructs a new ICM-style test from weighted residual processes after OLS fitting, derives limiting null/alternative distributions, and validates a smooth residual bootstrap under diverging p_n. These steps rely on standard empirical process arguments and regularity conditions rather than reducing by construction to the paper's own fitted quantities or self-citations. The local-alternative power claim at rate 1/sqrt(n) is an external asymptotic result, not a tautology. No self-definitional, fitted-input-renamed-as-prediction, or load-bearing self-citation patterns appear.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard high-dimensional asymptotic regularity conditions (moments, rates of p_n/n, smoothness of the mean function) that are typical domain assumptions rather than new inventions or fitted parameters.

axioms (1)

domain assumption Standard regularity conditions on the regression model, predictors, and error terms that permit non-degenerate limits for the weighted residual process when dimension diverges
Invoked to establish the limiting null and alternative distributions and bootstrap consistency.

pith-pipeline@v0.9.0 · 5513 in / 1339 out tokens · 47424 ms · 2026-05-10T10:53:12.081934+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

4 extracted references · 4 canonical work pages

[1]

Bierens, H. J. (1982). Consistent model specification tests.Journal of Econometrics, 20(1):105–134. Bierens, H. J. (1990). A consistent conditional moment test of functional form.Economet- rica, 58(6):1443–1458. Bierens, H. J. and Ploberger, W. (1997). Asymptotic theory of integrated conditional moment tests.Econometrica, 65(5):1129–1151. Cook, R. D. (200...

work page 1982
[2]

and Lavergne, P

Guerre, E. and Lavergne, P. (2005). Data-driven rate-optimal specification testing in re- gression models.The Annals of Statistics, 33(2):840–870. Guo, X., Wang, T., and Zhu, L. (2016). Model checking for parametric single-index models: a dimension reduction model-adaptive approach.Journal of the Royal Statistical Society Series B: Statistical Methodology...

work page 2005
[3]

Hastie, T., Tibshirani, R., and Wainwright, M. (2015). Statistical learning with sparsity. Monographs on statistics and applied probability, 143(143):8. 33 Horowitz, J. L. and H¨ ardle, W. (1994). Testing a parametric model against a semipara- metric alternative.Econometric theory, 10(5):821–848. Khmaladze, E. V. and Koul, H. L. (2004). Martingale transfo...

work page 2015
[4]

Van Keilegom, I., Gonz´ alez Manteiga, W., and S´ anchez Sellero, C

Cambridge university press. Van Keilegom, I., Gonz´ alez Manteiga, W., and S´ anchez Sellero, C. (2008). Goodness-of-fit tests in parametric regression based on the estimation of the error distribution.Test, 17:401–415. Zheng, J. X. (1996). A consistent test of functional form via nonparametric estimation techniques.Journal of Econometrics, 75(2):263–289....

work page 2008

[1] [1]

Bierens, H. J. (1982). Consistent model specification tests.Journal of Econometrics, 20(1):105–134. Bierens, H. J. (1990). A consistent conditional moment test of functional form.Economet- rica, 58(6):1443–1458. Bierens, H. J. and Ploberger, W. (1997). Asymptotic theory of integrated conditional moment tests.Econometrica, 65(5):1129–1151. Cook, R. D. (200...

work page 1982

[2] [2]

and Lavergne, P

Guerre, E. and Lavergne, P. (2005). Data-driven rate-optimal specification testing in re- gression models.The Annals of Statistics, 33(2):840–870. Guo, X., Wang, T., and Zhu, L. (2016). Model checking for parametric single-index models: a dimension reduction model-adaptive approach.Journal of the Royal Statistical Society Series B: Statistical Methodology...

work page 2005

[3] [3]

Hastie, T., Tibshirani, R., and Wainwright, M. (2015). Statistical learning with sparsity. Monographs on statistics and applied probability, 143(143):8. 33 Horowitz, J. L. and H¨ ardle, W. (1994). Testing a parametric model against a semipara- metric alternative.Econometric theory, 10(5):821–848. Khmaladze, E. V. and Koul, H. L. (2004). Martingale transfo...

work page 2015

[4] [4]

Van Keilegom, I., Gonz´ alez Manteiga, W., and S´ anchez Sellero, C

Cambridge university press. Van Keilegom, I., Gonz´ alez Manteiga, W., and S´ anchez Sellero, C. (2008). Goodness-of-fit tests in parametric regression based on the estimation of the error distribution.Test, 17:401–415. Zheng, J. X. (1996). A consistent test of functional form via nonparametric estimation techniques.Journal of Econometrics, 75(2):263–289....

work page 2008