Hypothesis Testing for Penalized Estimating Equations with Cross-Fitted Covariance Calibration

Jing Zhou; Zhe Zhang

arXiv: 2604.05055 · v1 · submitted 2026-04-06 · 📊 stat.ME · math.ST · stat.TH

Pith reviewed 2026-05-10 19:16 UTC · model grok-4.3

classification 📊 stat.ME · math.ST · stat.TH

keywords penalized estimating equations · hypothesis testing · cross-fitting · covariance calibration · robust inference · longitudinal data · high-dimensional regression · chi-squared asymptotics

The pith

Penalized estimating equations support valid chi-squared tests on low-dimensional mean parameters even when the working covariance is misspecified, as long as the conditional mean model is correct.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that penalized estimating equations yield a sqrt(n)-consistent estimator of the parameters of interest when only the conditional mean is correctly specified. The associated test statistic for a low-dimensional subvector converges in distribution to a chi-squared law, but the test's asymptotic power (through the non-centrality parameter under local alternatives) depends on the unknown nuisance covariance function. Cross-fitting estimates that covariance on held-out folds, mitigating this dependence and yielding a calibrated test that does not require the working covariance to match the truth. The approach is motivated by settings such as longitudinal data or high-dimensional heteroscedastic regression, where full distributional assumptions are impractical.

Core claim

Assuming the conditional mean model is correctly specified, penalized estimating equations admit a sqrt(n)-consistent solution even when the working covariance structure is misspecified. The test statistic for a low-dimensional subvector of the mean parameters converges to a chi-squared distribution, and the test's asymptotic power depends on the nuisance covariance function. Estimating the covariance function via cross-fitting yields a calibrated and robust inference procedure.

What carries the argument

Cross-fitted covariance estimator: for each data fold, the nuisance covariance is estimated from the other folds and applied only to the held-out fold. This decouples the covariance estimation error from the test statistic, so that the error does not enter the limiting distribution.
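
A minimal Python sketch of the cross-fitting idea, assuming a heteroscedastic linear regression whose log-variance is linear in the covariates. The helper cross_fit_variance, the log-squared-residual variance model and the default of five folds are illustrative assumptions, not the paper's estimator. Each fold's variances are predicted from models fit only on the other folds, so no observation's weight depends on its own response.

```python
import numpy as np


def cross_fit_variance(X, y, n_folds=5, seed=0):
    """Out-of-fold variance estimates (hypothetical helper, not the paper's).

    Assumes the mean is linear in X and that log Var(y | x) is linear in X.
    For each fold k, the mean and variance models are fit on the other
    folds only and then evaluated on fold k.
    """
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds  # balanced random fold labels
    sigma2 = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Pilot mean fit on the training folds (ordinary least squares).
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        log_r2 = np.log((y[train] - X[train] @ beta) ** 2 + 1e-12)
        # Variance model: regress log squared residuals on X.
        gamma, *_ = np.linalg.lstsq(X[train], log_r2, rcond=None)
        # Under a location-scale error model, E[log r^2 | x] differs from
        # log Var(y | x) by a constant, so these are variances up to scale.
        sigma2[test] = np.exp(X[test] @ gamma)
    return sigma2
```

Used as weights 1 / sigma2 in a weighted estimating equation, the unknown common scale cancels, so the constant bias of the log-residual regression does not matter for weighting.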

Load-bearing premise

The conditional mean model must be correctly specified.

What would settle it

Simulate data from a model in which the conditional mean function is deliberately misspecified while the covariance structure remains fixed, then check whether the empirical rejection rate of the proposed test under the null hypothesis fails to approach the nominal level as n grows.
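
The check described above can be sketched as a small Monte Carlo study. Everything here is a hypothetical design chosen for illustration. x1 is correlated with x2^2 but absent from the true mean, the errors have a fixed heteroscedastic variance, and the test is a stand-in that pairs working independence (a deliberately misspecified working covariance) with a sandwich Wald statistic rather than the paper's cross-fitted procedure. With the mean correctly specified, the rejection rate should sit near 5%. Omitting the x2^2 term makes the working coefficient on x1 nonzero, so the rate drifts toward 1 as n grows.

```python
import numpy as np

CHI2_1_95 = 3.841458820694124  # 0.95 quantile of the chi-squared(1) distribution


def sandwich_wald(X, y, j):
    """Wald statistic for H0: beta_j = 0 from the working-independence
    estimating equation sum_i x_i (y_i - x_i' beta) = 0, studentized with
    the heteroscedasticity-robust sandwich variance A^{-1} B A^{-1}."""
    A = X.T @ X
    beta = np.linalg.solve(A, X.T @ y)
    r = y - X @ beta
    B = (X * (r ** 2)[:, None]).T @ X
    A_inv = np.linalg.inv(A)
    V = A_inv @ B @ A_inv
    return beta[j] ** 2 / V[j, j]


def rejection_rate(n, misspecified_mean, reps=500, seed=0):
    """Monte Carlo rejection rate of the nominal 5% test of H0: x1 has no
    effect, under a hypothetical design in which x1 is absent from the true
    mean but correlated with x2^2, and Var(y | x) = exp(x2) is held fixed."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x2 = rng.normal(size=n)
        x1 = 0.5 * (x2 ** 2 - 1) + rng.normal(size=n)
        mean = 1.0 + 0.5 * x2
        if misspecified_mean:
            mean = mean + 0.5 * x2 ** 2  # nonlinear term the working model omits
        y = mean + np.exp(0.5 * x2) * rng.normal(size=n)
        X = np.column_stack([np.ones(n), x1, x2])  # linear working mean model
        rejections += int(sandwich_wald(X, y, 1) > CHI2_1_95)
    return rejections / reps
```

For example, rejection_rate(1000, False) should land near 0.05 and rejection_rate(1000, True) near 1; the exact values depend on the seed and the number of replications.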

Original abstract

We study hypothesis testing for penalized estimators in settings where the full marginal distribution of a multivariate response is difficult to specify, such as longitudinal data with correlated measurements or high-dimensional heteroscedastic regression. Assuming that the conditional mean model is correctly specified, we establish that the penalized estimating equations admit a $\sqrt{n}$-consistent solution, even when the working covariance structure is misspecified. Our inferential target is a low-dimensional subvector of parameters associated with the mean model. We show that the resulting test statistic converges to a $\chi^2$ distribution, and that its asymptotic power depends on the nuisance covariance function. To mitigate this dependence, we propose estimating the covariance function via cross-fitting, which provides a calibrated and robust procedure for inference.
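
For orientation, a generic version of the setup in the abstract can be written as follows. This is an assumed standard form (penalized generalized estimating equations in the style of Wang, Zhou and Qu, 2012, with a sandwich Wald statistic), not the paper's exact equations. Here $V_i$ is the working covariance, $\Sigma_i = \operatorname{Cov}(Y_i \mid X_i)$ the true conditional covariance, $q_\lambda$ the derivative of the penalty, and $\mathcal{A}$ the tested subvector of length $r$.

```latex
\[
  \sum_{i=1}^{n} D_i^\top(\beta)\, V_i^{-1} \{ Y_i - \mu_i(\beta) \}
    - n\, q_{\lambda}(|\beta|) \circ \operatorname{sgn}(\beta) = 0,
  \qquad D_i(\beta) = \partial \mu_i(\beta) / \partial \beta^\top ,
\]
\[
  H = \mathbb{E}\{ D_i^\top V_i^{-1} D_i \}, \qquad
  M = \mathbb{E}\{ D_i^\top V_i^{-1} \Sigma_i V_i^{-1} D_i \}, \qquad
  \Omega = H^{-1} M H^{-1},
\]
\[
  T_n = n\, (\hat\beta_{\mathcal{A}} - \beta^{0}_{\mathcal{A}})^\top
        \hat\Omega_{\mathcal{A}\mathcal{A}}^{-1}
        (\hat\beta_{\mathcal{A}} - \beta^{0}_{\mathcal{A}})
  \xrightarrow{d} \chi^2_r(\delta),
  \qquad
  \delta = h^\top \Omega_{\mathcal{A}\mathcal{A}}^{-1} h
  \quad \text{when } \beta_{\mathcal{A}} = \beta^{0}_{\mathcal{A}} + h/\sqrt{n}.
\]
```

Under the null ($h = 0$) the limit is the central $\chi^2_r$ whatever $V_i$ is, which is the robustness claim. The non-centrality $\delta$ depends on $V_i$ through $\Omega$, and $\Omega$ reduces to $H^{-1}$, its smallest value in the positive semidefinite order, when $V_i = \Sigma_i$; this is why estimating the covariance matters for power.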

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows sqrt(n) consistency for penalized estimating equations under correct mean but misspecified covariance, then uses cross-fitting to remove the covariance dependence from the chi-squared test statistic.

The core contribution is a cross-fitting procedure for estimating the nuisance covariance, so that the test for a low-dimensional mean parameter has a chi-squared limit and its power depends less on the choice of working covariance. The authors start from the penalized estimating equations, establish the sqrt(n) rate under the usual correct-mean assumption even when the working covariance is wrong, derive the limiting distribution of the Wald-type statistic, and then replace the nuisance covariance with a cross-fitted estimate to mitigate the dependence of power on it. This is a direct but useful extension of standard estimating-equation results to the penalized setting, aimed at data such as longitudinal or heteroscedastic responses whose full joint law is hard to specify.

The cross-fitting step is the part that feels new and practically motivated: it keeps the procedure simple while addressing a real sensitivity in the asymptotics. The assumption that the conditional mean is correct is stated up front and is necessary for the consistency claim, so the method does not claim double robustness. With penalization present, the theory must control the penalty parameter so that it does not distort the asymptotics of the test statistic, and cross-fitting adds choices such as the number of folds and how the covariance is estimated within folds. The abstract reports no simulation results, which leaves the finite-sample behavior and the sensitivity to tuning open.

A reader working on robust inference for penalized or semiparametric models with dependent data would find the idea relevant. The technical steps appear to rest on standard arguments rather than circular reasoning, so the paper is worth sending to referees for a full check of the proofs and any numerical support.
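
The Wald-type step the letter describes can be sketched for a linear mean model. This is an illustration under stated assumptions, not the authors' implementation: the penalty is omitted (a low-dimensional design), w is assumed to hold inverse-variance weights estimated out of fold, and the hypothetical function weighted_wald solves the weighted estimating equation sum_i w_i x_i (y_i - x_i' beta) = 0 and studentizes with a sandwich variance, which stays valid when the weights are wrong.

```python
import numpy as np

CHI2_1_95 = 3.841458820694124  # 0.95 quantile of the chi-squared(1) distribution


def weighted_wald(X, y, w, idx):
    """Hypothetical Wald statistic for H0: beta[idx] = 0.

    Solves the weighted estimating equation sum_i w_i x_i (y_i - x_i' beta) = 0,
    where w_i plays the role of an inverse working variance (assumed to be
    estimated out of fold), and studentizes with the sandwich covariance
    A^{-1} B A^{-1}, which remains valid when the weights are misspecified.
    """
    Xw = X * w[:, None]
    A = X.T @ Xw                          # sum_i w_i x_i x_i'
    beta = np.linalg.solve(A, Xw.T @ y)   # weighted least-squares solution
    r = y - X @ beta
    B = (Xw * (r ** 2)[:, None]).T @ Xw   # sum_i w_i^2 r_i^2 x_i x_i'
    A_inv = np.linalg.inv(A)
    V = A_inv @ B @ A_inv                 # sandwich covariance of beta
    b = beta[idx]
    return float(b @ np.linalg.solve(V[np.ix_(idx, idx)], b))
```

A 5% test of a single coefficient j rejects when weighted_wald(X, y, w, [j]) > CHI2_1_95; for a subvector of length r, the critical value is the 0.95 quantile of chi-squared(r). Poorly chosen weights leave this calibration intact but inflate the sandwich variance, which is where the dependence of power on the covariance shows up.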

Referee Report

0 major / 3 minor

Summary. The manuscript develops hypothesis testing procedures for penalized estimating equations applied to multivariate responses (e.g., longitudinal data or high-dimensional heteroscedastic regression) where the full marginal distribution is difficult to specify. Assuming correct specification of the conditional mean model, it establishes that the penalized estimating equations admit a √n-consistent solution even when the working covariance is misspecified. The paper derives that the test statistic for a low-dimensional subvector of the mean parameters converges to a χ² distribution whose asymptotic power depends on the nuisance covariance function, and proposes estimating this covariance via cross-fitting to obtain a calibrated, robust inference procedure.

Significance. If the asymptotic results hold, the work extends standard estimating-equation theory to penalized settings while using cross-fitting to mitigate dependence on the working covariance, yielding more reliable inference when covariance structures are hard to specify correctly. This is potentially valuable for biostatistical and econometric applications involving correlated or high-dimensional data, and aligns with double-robustness principles by orthogonalizing the inference step with respect to nuisance covariance estimation.

minor comments (3)

[Introduction] The abstract and introduction would benefit from a brief explicit statement of how the penalization term enters the estimating equations and whether it is treated as fixed or data-driven (e.g., via cross-validation).
[Section 3] Notation for the cross-fitted covariance estimator should be introduced earlier and kept consistent with the asymptotic expansion in the main theorems to improve readability.
[Section 5] The simulation section reports power curves but does not include coverage probabilities or type-I error rates under varying degrees of covariance misspecification; adding these would strengthen the empirical support for the calibration claim.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our work, the accurate summary of the contributions, and the recommendation for minor revision. We are pleased that the potential value for applications involving correlated or high-dimensional data, as well as the connection to double-robustness ideas, is recognized.

Circularity Check

0 steps flagged

No significant circularity; derivation follows standard estimating-equation asymptotics

The paper's central claims rest on the explicit assumption of correct conditional mean specification to obtain sqrt(n)-consistency for the penalized estimating equations solution (even under working covariance misspecification), followed by standard asymptotic expansion to show the test statistic converges to chi-squared with power depending on the nuisance covariance. Cross-fitting is then introduced as a calibration device to remove that dependence. None of these steps reduce by construction to fitted inputs, self-citations, or ansatzes imported from the authors' prior work; the derivation chain is self-contained against external benchmarks of estimating-equation theory and does not invoke uniqueness theorems or renamings that loop back to the paper's own quantities.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central result depends on the assumption that the conditional mean model is correctly specified; no free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption: the conditional mean model is correctly specified.
Explicitly stated as the key assumption enabling sqrt(n) consistency even under misspecified working covariance.

pith-pipeline@v0.9.0 · 5417 in / 1102 out tokens · 68677 ms · 2026-05-10T19:16:35.434862+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith.Foundation.RealityFromDistinction reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem.

Assuming that the conditional mean model is correctly specified, we establish that the penalized estimating equations admit a √n-consistent solution, even when the working covariance structure is misspecified.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages

[1] Aitkin, A. C. (1935). On least squares and linear combination of observations. Proceedings of the Royal Society of Edinburgh, 55:42–48.

[2] Amemiya, T. (1973). Regression analysis when the variance of the dependent variable is proportional to the square of its expectation. Journal of the American Statistical Association, 68(344):928–934.

[3] Andrews, D. W. K. (1986). A note on the unbiasedness of feasible GLS, quasi-maximum likelihood, robust, adaptive, and spectral estimators of the linear model. Econometrica, 54(3):687–698.

[4] Bentkus, V. (2005). A Lyapunov-type bound in R^d. Theory of Probability & Its Applications, 49(2):311–323.

[5] Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1):C1–C68.

[6] Davidian, M. and Carroll, R. J. (1987). Variance function estimation. Journal of the American Statistical Association, 82(400):1079–1091.

[7] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456):1348–1360.

[8] Fang, E. X., Ning, Y., and Li, R. (2020). Test of significance for high-dimensional longitudinal data. Annals of Statistics, 48(5):2622–2645.

[9] Giné, E. and Guillou, A. (2002). Rates of strong uniform consistency for multivariate kernel density estimators. Annales de l'Institut Henri Poincaré (B) Probability and Statistics, 38(6):907–921.

[10] Godambe, V. P. (1960). An optimum property of regular maximum likelihood estimation. The Annals of Mathematical Statistics, 31(4):1208–1211.

[11] Godambe, V. P. (1985). The foundations of finite sample estimation in stochastic processes. Biometrika, 72(2):419–428.

[12] Guo, X., Li, R., Zhang, Z., and Zou, C. (2025). Model-free statistical inference on high-dimensional data. Journal of the American Statistical Association, 120(549):186–197.

[13] Guvenen, F. (2009). An empirical investigation of labor income processes. Review of Economic Dynamics, 12(1):58–79.

[14] Li, B. (2018). Sufficient dimension reduction: Methods and applications with R. Chapman and Hall/CRC.

[15] Li, K.-C. (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86(414):316–327.

[16] Liang, K.-Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73(1):13–22.

[17] MacKinnon, J. G. and White, H. (1985). Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. Journal of Econometrics, 29(3):305–325.

[18] Mammen, E. and van de Geer, S. (1997). Penalized quasi-likelihood estimation in partial linear models. The Annals of Statistics, 25(3):1014–1035.

[19] McCullagh, P. and Nelder, J. A. (1989). Generalized linear models. Chapman and Hall, 2nd edition.

[20] Meghir, C. and Pistaferri, L. (2004). Income variance dynamics and heterogeneity. Econometrica, 72(1):1–32.

[21] Nakagawa, S., Ortega, S., Gazzea, E., Lagisz, M., Lenz, A., Lundgren, E., and Mizuno, A. (2025). Location–scale models in ecology and evolution: Heteroscedasticity in continuous, count and proportion data. Methods in Ecology and Evolution.

[22] Ortega, J. M. and Rheinboldt, W. C. (2000). Iterative solution of nonlinear equations in several variables. SIAM.

[23] Qu, A., Lindsay, B. G., and Li, B. (2000). Improving generalized estimating equations using quadratic inference functions. Biometrika, 87(4):823–836.

[24] Shi, C., Song, R., Chen, Z., and Li, R. (2019). Linear hypothesis testing for high dimensional generalized linear models. Annals of Statistics, 47(5):2671–2703.

[25] Spady, R. and Stouli, S. (2018). Simultaneous mean-variance regression. arXiv:1804.01631.

[26] Spellman, P. T., Sherlock, G., Zhang, M. Q., Iyer, V. R., Anders, K., Eisen, M. B., Brown, P. O., Botstein, D., and Futcher, B. (1998). Comprehensive identification of cell cycle–regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell, 9(12):3273–3297.

[27] Wang, L., Zhou, J., and Qu, A. (2012). Penalized generalized estimating equations for high-dimensional longitudinal data analysis. Biometrics, 68(2):353–360.

[28] Wedderburn, R. W. M. (1974). Quasi-likelihood functions, generalized linear models, and the Gauss–Newton method. Biometrika, 61(3):439–447.

[29] White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48(4):817–838.

[30] White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50(1):1–25.

[31] Yin, J., Geng, Z., Li, R., and Wang, H. (2010). Nonparametric covariance model. Statistica Sinica, 20(1):469–479.

[32] Young, E. H. and Shah, R. D. (2024). Sandwich boosting for accurate estimation in partially linear models for grouped data. Journal of the Royal Statistical Society Series B: Statistical Methodology, 86(5):1286–1311.

[33] Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38(2):894–942.
