
arXiv: 2606.01011 · v1 · pith:UPXE6KN2new · submitted 2026-05-31 · 🧮 math.ST · stat.ME · stat.TH

Semiparametric Efficiency of Residual Correlation Testing under Gaussian Additive Noise Models

Pith reviewed 2026-06-28 16:40 UTC · model grok-4.3

classification 🧮 math.ST · stat.ME · stat.TH
keywords semiparametric efficiency · conditional independence testing · Gaussian additive noise model · residual correlation · Pearson correlation · asymptotic efficiency · nonparametric regression

The pith

The ordinary residual Pearson correlation is exactly the semiparametrically efficient estimator under the Gaussian additive noise model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies tests for conditional independence when each of two responses is a nonlinear function of covariates plus an error term, with the two errors jointly (bivariate) Gaussian. Conditional independence then holds precisely when the two errors are uncorrelated. The authors derive the semiparametric efficiency bound for estimating the error correlation and show that the simple Pearson correlation computed on the fitted residuals attains this bound exactly. They then give the asymptotic distribution of the resulting test statistic and an accompanying inference procedure; in simulations it controls type I error while achieving near-oracle efficiency and competitive power.

Core claim

Under the Gaussian additive noise model, two responses are expressed as unknown nonlinear functions of covariates plus independent bivariate Gaussian errors, so conditional independence is equivalent to zero correlation between the two error components. The semiparametrically efficient estimator of this correlation parameter is identical to the ordinary Pearson correlation of the residuals obtained after estimating the nonlinear functions. The estimator is asymptotically normal with variance attaining the efficiency bound, which yields a practical test for conditional independence.
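
In symbols, the model described above can be written roughly as follows. The notation (X, Y_1, Y_2, f_1, f_2, \varepsilon_j, \sigma_j, \rho) is chosen here for illustration and may differ from the paper's; reading "independent" as independence between the error pair and the covariates is likewise an assumption.

```latex
\begin{align*}
Y_1 &= f_1(X) + \varepsilon_1, \qquad Y_2 = f_2(X) + \varepsilon_2, \\
(\varepsilon_1, \varepsilon_2)^\top &\sim N\!\left(
  \begin{pmatrix} 0 \\ 0 \end{pmatrix},
  \begin{pmatrix} \sigma_1^2 & \rho\,\sigma_1\sigma_2 \\ \rho\,\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}
\right), \qquad (\varepsilon_1, \varepsilon_2) \perp\!\!\!\perp X, \\
Y_1 \perp\!\!\!\perp Y_2 \mid X \;&\Longleftrightarrow\; \rho = 0 .
\end{align*}
```

The equivalence in the last line holds because, given X, the pair (Y_1, Y_2) is bivariate Gaussian with correlation \rho, and jointly Gaussian variables are independent exactly when they are uncorrelated.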

What carries the argument

Semiparametric efficiency bound for the error-correlation parameter in the Gaussian additive noise model, attained exactly by the residual Pearson correlation.
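
For orientation: in the classical bivariate normal model with unknown means and variances, the sample Pearson correlation has asymptotic variance (1 - \rho^2)^2, which is also the Cramér-Rao bound for \rho in that parametric model. The display below assumes the paper's semiparametric bound takes the same form, which this summary does not state explicitly; \hat f_j, \hat\varepsilon_{ji}, \hat\rho_n and T_n are illustrative notation.

```latex
\begin{align*}
\hat\varepsilon_{ji} &= Y_{ji} - \hat f_j(X_i), \qquad j = 1, 2, \quad i = 1, \dots, n, \\
\hat\rho_n &= \frac{\sum_{i=1}^n (\hat\varepsilon_{1i} - \bar{\hat\varepsilon}_1)(\hat\varepsilon_{2i} - \bar{\hat\varepsilon}_2)}
  {\sqrt{\sum_{i=1}^n (\hat\varepsilon_{1i} - \bar{\hat\varepsilon}_1)^2 \, \sum_{i=1}^n (\hat\varepsilon_{2i} - \bar{\hat\varepsilon}_2)^2}}, \\
\sqrt{n}\,(\hat\rho_n - \rho) &\xrightarrow{\;d\;} N\!\left(0,\ (1 - \rho^2)^2\right), \\
T_n = \sqrt{n}\,\hat\rho_n &\xrightarrow{\;d\;} N(0, 1) \quad \text{under } H_0 : \rho = 0 .
\end{align*}
```

Under this reading, a level-\alpha test rejects conditional independence when |T_n| exceeds the standard normal quantile z_{1-\alpha/2}.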

If this is right

  • The residual Pearson correlation test is asymptotically efficient for detecting conditional dependence under the stated model.
  • Standard central-limit theorem inference applies directly to the test statistic without additional nonparametric adjustment.
  • In simulations, the procedure maintains valid type I error control while achieving near-oracle efficiency and competitive empirical power.
  • The same estimator can be applied to empirical conditional-dependence analysis such as stock-return data.
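
The procedure the bullets above describe can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the leave-one-out kernel smoother, the bandwidth, the simulated functions sin(2x) and x², and all function names are hypothetical choices made here, and the null variance of 1 for sqrt(n)·rho_hat is the classical bivariate-normal value.

```python
import math
import random
from statistics import NormalDist


def loo_kernel_fit(x, y, bandwidth):
    """Leave-one-out Nadaraya-Watson fit with a Gaussian kernel (1-D covariate)."""
    fitted = []
    for i, xi in enumerate(x):
        num = den = 0.0
        for j, xj in enumerate(x):
            if j == i:
                continue  # leave the i-th point out of its own fit
            w = math.exp(-0.5 * ((xi - xj) / bandwidth) ** 2)
            num += w * y[j]
            den += w
        fitted.append(num / den)
    return fitted


def pearson(a, b):
    """Ordinary sample Pearson correlation."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def residual_correlation_test(x, y1, y2, bandwidth=0.25):
    """Return (rho_hat, z, two-sided p-value) for H0: rho = 0."""
    r1 = [y - f for y, f in zip(y1, loo_kernel_fit(x, y1, bandwidth))]
    r2 = [y - f for y, f in zip(y2, loo_kernel_fit(x, y2, bandwidth))]
    rho_hat = pearson(r1, r2)
    z = math.sqrt(len(x)) * rho_hat  # assumed N(0, 1) under H0
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return rho_hat, z, p_value


def simulate(n, rho, seed):
    """Hypothetical data: Y1 = sin(2X) + e1, Y2 = X^2 + e2, corr(e1, e2) = rho."""
    rng = random.Random(seed)
    x, y1, y2 = [], [], []
    for _ in range(n):
        xi = rng.uniform(-2.0, 2.0)
        e1 = rng.gauss(0.0, 1.0)
        e2 = rho * e1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        x.append(xi)
        y1.append(math.sin(2.0 * xi) + e1)
        y2.append(xi ** 2 + e2)
    return x, y1, y2


rho_dep, z_dep, p_dep = residual_correlation_test(*simulate(300, 0.6, seed=1))
rho_ind, z_ind, p_ind = residual_correlation_test(*simulate(300, 0.0, seed=2))
print(f"correlated errors:   rho_hat={rho_dep:.3f}, p={p_dep:.4f}")
print(f"uncorrelated errors: rho_hat={rho_ind:.3f}, p={p_ind:.4f}")
```

Any consistent regression estimator could replace the kernel smoother; the point of the sketch is only that, once residuals are in hand, the test itself is a single correlation and a normal quantile.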

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Because the efficient estimator collapses to a simple residual correlation, more elaborate semiparametric procedures are unnecessary inside this model class.
  • The result supplies a concrete benchmark against which efficiency of tests under relaxed error assumptions can be compared.
  • If the Gaussian-error assumption is mildly violated, the same residual correlation may retain good power while losing its exact efficiency guarantee.

Load-bearing premise

The observations obey the Gaussian additive noise model in which each response equals an unknown nonlinear function of the covariates plus independent bivariate Gaussian errors.

What would settle it

A simulation in which the data keep the additive structure but the errors are non-Gaussian, and in which the residual Pearson correlation test rejects at a rate materially different from that of a fully nonparametric test that does not assume Gaussianity.

Figures

Figures reproduced from arXiv: 2606.01011 by Bing Li, Yanyuan Ma, Yin Tang.

Figure 1: Boxplots of p-values of eight tests for Model 1 under the null hypothesis, with …
Figure 2: Boxplots of p-values of eight tests for Model 3 under the null hypothesis, with …
Figure 3: Heatmap of pairwise estimated Pearson correlations of residuals. Axis labels: AAPL, MSFT, NVDA, GOOGL, AMZN, JPM, BAC, GS, XOM, CVX, WMT, COST.
Original abstract

This paper studies conditional independence testing under the Gaussian additive noise model (GANM), where two variables are modeled as nonlinear functions of covariates with independent bivariate Gaussian regression errors. Under this framework, conditional independence can be characterized by the correlation coefficient of the regression errors, which motivates a test based on the Pearson correlation coefficient computed from the fitted residuals. Despite its simple form, the asymptotic behavior and statistical efficiency of the resulting test have not been well understood. In this paper, we develop the semiparametric efficiency theory under GANM and show, surprisingly, that the efficient estimator coincides exactly with the ordinary residual Pearson correlation estimator. We further establish the asymptotic properties of the proposed test and develop the corresponding inference procedure. Simulation studies demonstrate that the proposed method achieves near-oracle efficiency and competitive empirical power while maintaining valid Type I error control. We further apply the proposed test to conditional dependence analysis of U.S. stock returns.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper develops semiparametric efficiency theory under the Gaussian additive noise model (GANM), in which two responses are nonlinear functions of covariates plus independent bivariate Gaussian regression errors. It claims that the efficient estimator for the error correlation parameter coincides exactly with the ordinary residual Pearson correlation estimator, derives the corresponding efficient influence function and tangent space, establishes asymptotic normality and valid inference procedures for the resulting test, and supports the claims with simulations showing near-oracle efficiency plus an empirical application to U.S. stock returns.

Significance. If the central identification of the efficient estimator holds, the result is significant: it supplies a parameter-free justification for a computationally trivial procedure in a semiparametric setting and shows that the residual Pearson correlation attains the efficiency bound without further adjustment. The explicit construction of the tangent space and influence function under GANM is a concrete contribution to the literature on efficient inference for conditional dependence measures.

minor comments (3)
  1. [§3] §3, after Eq. (8): the definition of the nuisance tangent space for the nonparametric mean functions is stated but the explicit form of the projection operator onto the orthogonal complement is not written out; adding one displayed equation would improve readability.
  2. [Table 2] Table 2: the reported standard errors for the residual-correlation estimator are given to three decimal places while the oracle benchmark uses four; uniform formatting would aid direct comparison.
  3. [§5.2] §5.2: the statement that the test 'maintains valid Type I error control' is supported by the reported rejection rates, but the exact nominal level (e.g., 5%) and the number of Monte Carlo replications should be restated in the table caption for completeness.
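
The kind of check that comment 3 refers to, a rejection rate reported together with its nominal level and replication count, can be sketched as below. Everything in the block is a hypothetical stand-in rather than the paper's simulation design: the moving-average smoother, the functions sin(2x) and x², the sample size, and the 400 replications are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def loo_window_residuals(x, y, half_width):
    """Residuals from a leave-one-out moving average over x-sorted neighbours."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    prefix = [0.0]  # prefix sums of y in x-sorted order
    for i in order:
        prefix.append(prefix[-1] + y[i])
    residuals = [0.0] * n
    for pos, i in enumerate(order):
        lo, hi = max(0, pos - half_width), min(n, pos + half_width + 1)
        neighbour_mean = (prefix[hi] - prefix[lo] - y[i]) / (hi - lo - 1)
        residuals[i] = y[i] - neighbour_mean
    return residuals


def pearson(a, b):
    """Ordinary sample Pearson correlation."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def rejects(rng, n, level, half_width):
    """Simulate one null data set (rho = 0) and report whether the test rejects."""
    x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    y1 = [math.sin(2.0 * xi) + rng.gauss(0.0, 1.0) for xi in x]
    y2 = [xi ** 2 + rng.gauss(0.0, 1.0) for xi in x]
    r1 = loo_window_residuals(x, y1, half_width)
    r2 = loo_window_residuals(x, y2, half_width)
    z = math.sqrt(n) * pearson(r1, r2)
    critical = NormalDist().inv_cdf(1.0 - level / 2.0)
    return abs(z) > critical


def type_one_error(replications=400, n=400, level=0.05, half_width=20, seed=0):
    """Monte Carlo rejection rate under the null, with its standard error."""
    rng = random.Random(seed)
    hits = sum(rejects(rng, n, level, half_width) for _ in range(replications))
    rate = hits / replications
    mc_se = math.sqrt(rate * (1.0 - rate) / replications)
    return rate, mc_se


rate, mc_se = type_one_error()
print(f"nominal level 0.05, 400 replications: "
      f"rejection rate {rate:.3f} (Monte Carlo s.e. {mc_se:.3f})")
```

Reporting the Monte Carlo standard error alongside the rate makes it clear how far from the nominal level an observed rate can drift by chance alone; a crude smoother such as this one can also push the rate slightly above nominal.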

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the accurate summary of our manuscript and the positive assessment of its significance. The recommendation for minor revision is noted. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper states the GANM as the modeling framework (nonlinear covariate functions plus independent bivariate Gaussian errors) and then derives the tangent space, efficient influence function, and semiparametric efficiency bound for the error correlation parameter under that model. It shows that the resulting efficient estimator equals the ordinary residual Pearson correlation. This is a standard semiparametric calculation whose conclusion is a derived equality rather than an input restated by definition; the match is presented as a non-obvious result. No self-citations, fitted parameters renamed as predictions, or ansatzes smuggled via prior work are indicated as load-bearing steps in the abstract or reader summary. The derivation is therefore self-contained against the stated model assumptions.
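
As a worked check of the kind of calculation described, the classical influence function of the Pearson correlation can be verified to have variance (1 - \rho^2)^2 under bivariate Gaussian errors. Here Z_1, Z_2 are standardized errors (illustrative notation), and identifying \varphi with the paper's efficient influence function is an assumption suggested by, but not spelled out in, this summary.

```latex
\begin{align*}
\varphi(Z_1, Z_2) &= Z_1 Z_2 - \tfrac{\rho}{2}\left(Z_1^2 + Z_2^2\right),
  \qquad Z_j = \varepsilon_j / \sigma_j, \\
E[\varphi] &= \rho - \tfrac{\rho}{2}\,(1 + 1) = 0, \\
E[Z_1^2 Z_2^2] &= 1 + 2\rho^2, \qquad E[Z_1^3 Z_2] = E[Z_1 Z_2^3] = 3\rho, \qquad E[Z_j^4] = 3, \\
E[\varphi^2] &= \left(1 + 2\rho^2\right) - \rho\,(3\rho + 3\rho)
  + \tfrac{\rho^2}{4}\left(3 + 3 + 2\,(1 + 2\rho^2)\right) \\
&= 1 + 2\rho^2 - 6\rho^2 + 2\rho^2 + \rho^4 = \left(1 - \rho^2\right)^2 .
\end{align*}
```

The fourth-moment identities used in the third line are the standard ones for a standardized bivariate normal pair with correlation \rho.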

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that data are generated from the Gaussian additive noise model with the stated error structure; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: Data follow the Gaussian additive noise model with nonlinear covariate functions and independent bivariate Gaussian regression errors.
    This modeling framework is invoked as the setting in which conditional independence is characterized by error correlation and efficiency is derived.


