arxiv: 2507.04962 · v2 · submitted 2025-07-07 · 📊 stat.ME

Covariance test for discretely observed functional data: when and how it works?

Pith reviewed 2026-05-19 06:34 UTC · model grok-4.3

classification 📊 stat.ME
keywords covariance test · functional data · discrete observations · eigenfunction perturbation · phase transition · FPC · nonparametric test · pool-smoothing

The pith

A covariance test for functional data stays valid for discretely observed noisy curves, and matches full-observation performance once the sampling frequency is large enough relative to the sample size.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a statistical test of whether two sets of functional data share the same covariance operator, designed for the case where the underlying curves are sampled at discrete points and contaminated with noise rather than fully observed. Using a pool-smoothing approach, the test statistic is built from functional principal components (FPCs), and the number of components is allowed to increase with the sample size, which yields a consistent nonparametric test. The authors prove that the asymptotic distribution under the null hypothesis holds for a range of truncation levels, thanks to bounds on how far the estimated eigenfunctions can deviate from the true ones. Importantly, they identify a phase transition: once the number of discrete measurements per curve becomes large enough relative to the total sample size, the test's behavior is indistinguishable from the ideal case of continuous observations. This addresses a practical gap, since most real functional data, such as longitudinal studies or sensor readings, come in discrete form.
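
In symbols, the setting described above can be sketched as follows. The notation is an assumption made here for illustration, since this summary does not reproduce the paper's own symbols; only n, m_n and K_n follow the usage in the referee report further down this page. Group g = 1, 2 contains n_g latent curves, each observed at m_n noisy points.

```latex
\begin{aligned}
Y^{(g)}_{ij} &= X^{(g)}_i\bigl(t^{(g)}_{ij}\bigr) + \varepsilon^{(g)}_{ij},
  \qquad i = 1,\dots,n_g,\quad j = 1,\dots,m_n,\quad g = 1,2, \\
C_g(s,t) &= \operatorname{Cov}\bigl\{X^{(g)}(s),\, X^{(g)}(t)\bigr\}, \\
H_0 &: C_1 = C_2 \qquad \text{versus} \qquad H_1 : C_1 \neq C_2 .
\end{aligned}
```

Only the noisy values Y are available to the analyst; the curves X and the covariance functions C_g have to be recovered by smoothing before any eigen-decomposition with K_n components can be carried out.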

Core claim

The central discovery is that the asymptotic null distribution of the FPC-based covariance test statistic, constructed via pool-smoothing, remains valid uniformly over permissible truncation levels even under discretized noisy observations. This is established using improved perturbation bounds on the estimated eigenfunctions. Additionally, when the sampling frequency per subject is of sufficiently large order relative to the sample size, the test has the same asymptotic properties as if the functional data were fully observed without discretization.

What carries the argument

Pool-smoothing strategy for constructing an FPC-based test statistic with diverging truncation level, justified by perturbation bounds on estimated eigenfunctions to handle errors from discretization and noise.
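
As a rough illustration of the idea, the sketch below pools raw covariance products across all subjects, smooths them onto a grid, and extracts the leading eigenpairs. It is a minimal sketch under stated assumptions, not the authors' estimator: the Gaussian-kernel Nadaraya-Watson smoother, the hand-picked bandwidth `h`, and the function names `pool_smooth_mean`, `pool_smooth_cov` and `fpca` are all hypothetical choices made here, and the paper's actual smoother and bandwidth rule are not given in this summary.

```python
import numpy as np

def gauss(u):
    # Unnormalised Gaussian kernel; the normalising constant cancels in the ratios below.
    return np.exp(-0.5 * u ** 2)

def pool_smooth_mean(t, y, grid, h):
    """Kernel-smooth all observations from all subjects, pooled together."""
    tt, yy = np.concatenate(t), np.concatenate(y)
    w = gauss((grid[:, None] - tt[None, :]) / h)      # shape (G, N)
    return (w @ yy) / w.sum(axis=1)

def pool_smooth_cov(t, y, grid, h):
    """Kernel-smooth pooled raw covariances, using only pairs j != l within a subject."""
    mu_hat = pool_smooth_mean(t, y, grid, h)
    num = np.zeros((grid.size, grid.size))
    den = np.zeros_like(num)
    for ti, yi in zip(t, y):
        ri = yi - np.interp(ti, grid, mu_hat)         # centred observations
        w = gauss((grid[:, None] - ti[None, :]) / h)  # shape (G, m_i)
        wr = w * ri
        s_wr, s_w = wr.sum(axis=1), w.sum(axis=1)
        # Sum over all pairs (j, l), then subtract the j == l terms,
        # which are contaminated by the measurement-noise variance.
        num += np.outer(s_wr, s_wr) - wr @ wr.T
        den += np.outer(s_w, s_w) - w @ w.T
    return num / den

def fpca(cov, grid, K):
    """Leading K eigenpairs of a covariance surface on an equispaced grid."""
    dt = grid[1] - grid[0]
    vals, vecs = np.linalg.eigh((cov + cov.T) / 2 * dt)
    order = np.argsort(vals)[::-1][:K]
    return vals[order], vecs[:, order] / np.sqrt(dt)  # eigenfunctions have unit L2 norm

# Toy data (assumed model): two cosine components, random design, additive noise.
rng = np.random.default_rng(0)
n, m, noise_sd = 200, 10, 0.2
t = [np.sort(rng.uniform(0, 1, m)) for _ in range(n)]
y = []
for ti in t:
    xi = rng.standard_normal(2) * np.sqrt([1.0, 0.25])
    curve = (xi[0] * np.sqrt(2) * np.cos(np.pi * ti)
             + xi[1] * np.sqrt(2) * np.cos(2 * np.pi * ti))
    y.append(curve + noise_sd * rng.standard_normal(m))

grid = np.linspace(0, 1, 41)
cov_hat = pool_smooth_cov(t, y, grid, h=0.1)
vals, phis = fpca(cov_hat, grid, K=2)
print("estimated leading eigenvalues:", vals)
```

Dropping the within-subject diagonal pairs is what keeps the noise variance out of the smoothed covariance surface; every later step, including the choice of how many eigenfunctions to keep, inherits whatever smoothing error remains.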

If this is right

  • The test can be applied directly to typical discretely observed functional datasets, without assuming that the curves are observed continuously.
  • The null distribution is asymptotically valid across a range of numbers of included eigenfunctions.
  • There is a critical sampling frequency threshold beyond which discretization effects vanish asymptotically for the test.
  • The method provides a consistent nonparametric test for covariance equality in functional data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The phase transition implies that covariance testing requires stricter sampling conditions than estimation of the covariance operator itself.
  • Experimenters designing studies with functional data should consider increasing the number of measurements per subject as sample size grows to maintain test validity.
  • The phase transition phenomenon may extend to other inference procedures in functional data analysis that rely on eigen-decompositions.

Load-bearing premise

The perturbation bounds on the estimated eigenfunctions are sufficiently tight to bound the error from simultaneously letting the truncation level diverge and using noisy discrete observations.
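
To see why this premise is demanding, it helps to recall the classical eigenfunction perturbation inequality. This is a textbook-style bound, stated here for orientation only; it is not the paper's sharper bound, and the symbols are assumptions of this note: C has distinct eigenvalues λ_1 > λ_2 > … with eigenfunctions φ_k, the estimate Ĉ has eigenfunctions φ̂_k with signs aligned so that ⟨φ̂_k, φ_k⟩ ≥ 0, and ‖·‖ on the right is the operator norm.

```latex
\|\hat\phi_k - \phi_k\|
  \;\le\; \frac{2\sqrt{2}\,\|\hat C - C\|}{\delta_k},
\qquad
\delta_k \;=\; \min_{j \neq k} |\lambda_j - \lambda_k| .
```

Because the eigenvalues of a covariance operator are summable, the gaps δ_k shrink to zero as k grows, so this bound degrades for higher-order components. Letting the truncation level diverge while Ĉ also carries smoothing and discretization error therefore needs tighter control than this generic inequality provides.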

What would settle it

Simulate functional data with a sampling frequency that grows more slowly than the identified order relative to the sample size n, apply the test with increasing truncation levels, and check whether the empirical type I error rate exceeds the nominal level or the statistic fails to converge to its asymptotic null distribution.
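
A harness for such an experiment might look like the sketch below. Everything in it is an assumption for illustration: the data model, the common equispaced design, the exponent `alpha_exp` in m = ⌈n^alpha_exp⌉, and the function names. The `permutation_pvalue` function is a stand-in, not the paper's test. A permutation test is exact under the null at any sampling frequency, so this stand-in will not show size distortion; it only marks the slot where the paper's asymptotic test would be plugged in as `pvalue_fn`.

```python
import numpy as np

def simulate_group(n, m, rng, lam=(1.0, 0.5, 0.25), noise_sd=0.3):
    """n noisy curves observed at m common equispaced points (assumed toy model)."""
    t = (np.arange(m) + 0.5) / m
    k = np.arange(1, len(lam) + 1)
    phi = np.sqrt(2.0) * np.cos(np.pi * k[:, None] * t[None, :])  # shape (K, m)
    scores = rng.standard_normal((n, len(lam))) * np.sqrt(lam)
    return scores @ phi + noise_sd * rng.standard_normal((n, m))

def offdiag_cov_gap(y1, y2):
    """Stand-in statistic: mean squared off-diagonal gap between sample covariances."""
    d = np.cov(y1, rowvar=False) - np.cov(y2, rowvar=False)
    np.fill_diagonal(d, 0.0)  # the diagonal carries the noise variance, so drop it
    return float((d ** 2).mean())

def permutation_pvalue(y1, y2, rng, n_perm=99):
    """Stand-in test: permutation calibration of the statistic above."""
    obs = offdiag_cov_gap(y1, y2)
    pooled = np.vstack([y1, y2])
    n1 = y1.shape[0]
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        count += offdiag_cov_gap(pooled[idx[:n1]], pooled[idx[n1:]]) >= obs
    return (1 + count) / (1 + n_perm)

def empirical_size(n, alpha_exp, pvalue_fn, n_rep=200, level=0.05, seed=0):
    """Monte Carlo rejection rate under the null with m = ceil(n ** alpha_exp)."""
    rng = np.random.default_rng(seed)
    m = max(2, int(np.ceil(n ** alpha_exp)))  # at least two points per curve
    rejections = 0
    for _ in range(n_rep):
        y1 = simulate_group(n, m, rng)
        y2 = simulate_group(n, m, rng)        # same law as y1, so H0 holds
        rejections += pvalue_fn(y1, y2, rng) <= level
    return rejections / n_rep, m

size, m_used = empirical_size(n=50, alpha_exp=0.25,
                              pvalue_fn=permutation_pvalue, n_rep=100)
print(f"m = {m_used}, empirical size = {size:.3f}")
```

To probe the claimed threshold, one would replace `pvalue_fn` with an implementation of the asymptotic test and sweep `alpha_exp` and `n` jointly, watching whether the rejection rate drifts away from the nominal level as `alpha_exp` decreases.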

Figures

Figures reproduced from arXiv: 2507.04962 by Fang Yao, Jin Yang, Yang Zhou.

Figure 1: Empirical sizes and powers under Scenarios I and II (two s…

Figure 2: The rejection rates with increasing K of three statistics T_pool, T_FH, T_PK, computed over 1000 random permutations of the whole dataset (left) and 1000 bootstrapped datasets (right), shown at the nominal significance level 0.05 (dashed line).

Original abstract

For covariance test in functional data analysis, existing methods are developed only for fully observed curves, whereas in practice, trajectories are typically observed discretely and with noise. To bridge this gap, we employ a pool-smoothing strategy to construct an FPC-based test statistic, allowing the number of estimated eigenfunctions to grow with the sample size. This yields a consistently nonparametric test, while the challenge arises from the concurrence of diverging truncation and discretized observations. Facilitated by advancing perturbation bounds of estimated eigenfunctions, we establish that the asymptotic null distribution remains valid across permissible truncation levels. Moreover, when the sampling frequency (i.e., the number of measurements per subject) reaches a certain magnitude of sample size, the test behaves as if the functions were fully observed. This phase transition phenomenon differs from the well-known result of the pooling mean/covariance estimation, reflecting the elevated difficulty in covariance test due to eigen-decomposition. The numerical studies, including simulations and real data examples, yield favorable performance compared to existing methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript develops a covariance test for functional data observed discretely and with additive noise. It introduces a pool-smoothing estimator to construct an FPC-based test statistic that permits the truncation level K_n to diverge with sample size n. Using perturbation bounds on the estimated eigenfunctions, the authors establish that the asymptotic null distribution of the test remains valid over a range of permissible truncation levels. They further identify a phase-transition threshold on the per-curve sampling frequency m_n (relative to n) above which the test statistic behaves as if the trajectories were fully observed, a phenomenon distinct from the rates known for mean and covariance estimation. Numerical simulations and real-data examples are reported to illustrate performance.

Significance. If the perturbation analysis rigorously controls the additional error terms arising from the interaction of diverging K_n, the pool-smoothing operator, and the discrete noisy grid, the result would supply both a practical testing procedure and a clear guideline on required sampling density for covariance testing. The phase-transition finding is noteworthy because it underscores that testing imposes stricter requirements on m_n than estimation does, owing to the eigen-decomposition step. The allowance for growing K_n renders the procedure nonparametric and consistent, addressing a common practical limitation of existing fully-observed methods.

major comments (2)
  1. [Abstract / theoretical results] Abstract and theoretical development: The central justification that 'advancing perturbation bounds of estimated eigenfunctions' suffice to keep the discretization-plus-truncation error o_p(1) uniformly over permissible K_n is invoked to validate the asymptotic null distribution, yet the manuscript provides neither the explicit form of these bounds nor the additional terms that would arise from the composition of the pool-smoothing operator with the measurement grid and the eigen-decomposition. Without a displayed rate that explicitly accounts for this interaction when K_n grows faster than the discretization permits, the claim that the null distribution remains valid cannot be verified from the given argument.
  2. [Theoretical results] Theoretical results section: The phase-transition statement—that the test behaves as if functions were fully observed once m_n reaches a certain magnitude of n—is presented as differing from the well-known pooling rates for mean/covariance estimation. However, the derivation does not isolate the extra variability contributed by the eigen-decomposition step, leaving open whether the stated threshold is sharp or merely sufficient under stronger smoothness assumptions than those needed for estimation alone.
minor comments (2)
  1. [Abstract] The abstract would be clearer if it stated the precise order of m_n relative to n that triggers the phase transition, rather than the phrase 'certain magnitude of sample size.'
  2. Notation for the pool-smoothing operator and the permissible range of K_n should be introduced earlier and used consistently in the statements of the main theorems.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and detailed review of our manuscript. The comments highlight important aspects of our theoretical development that we will clarify in the revision. Below we address each major comment point by point.

  1. Referee: [Abstract / theoretical results] Abstract and theoretical development: The central justification that 'advancing perturbation bounds of estimated eigenfunctions' suffice to keep the discretization-plus-truncation error o_p(1) uniformly over permissible K_n is invoked to validate the asymptotic null distribution, yet the manuscript provides neither the explicit form of these bounds nor the additional terms that would arise from the composition of the pool-smoothing operator with the measurement grid and the eigen-decomposition. Without a displayed rate that explicitly accounts for this interaction when K_n grows faster than the discretization permits, the claim that the null distribution remains valid cannot be verified from the given argument.

    Authors: We appreciate this observation. The perturbation bounds are developed in the supplementary material to control the errors from discretization, noise, and the pool-smoothing operator, ensuring the total error is o_p(1) for K_n in the permissible range. However, we acknowledge that the main text does not explicitly present the composed rates or all interaction terms. In the revised version, we will add a new subsection or appendix excerpt in the theoretical results section that displays these explicit bounds and rates, making the verification of the asymptotic validity straightforward. This addresses the concern directly. revision: yes

  2. Referee: [Theoretical results] Theoretical results section: The phase-transition statement—that the test behaves as if functions were fully observed once m_n reaches a certain magnitude of n—is presented as differing from the well-known pooling rates for mean/covariance estimation. However, the derivation does not isolate the extra variability contributed by the eigen-decomposition step, leaving open whether the stated threshold is sharp or merely sufficient under stronger smoothness assumptions than those needed for estimation alone.

    Authors: We agree that a more explicit isolation of the eigen-decomposition's contribution would be beneficial. Our derivation relies on the perturbation bounds to show that the threshold on m_n suffices for the test statistic to match the fully observed case under the paper's assumptions. We do not assert that this threshold is the minimal possible or sharp without additional assumptions. In the revision, we will expand the discussion in the theoretical results to include a comparison that highlights the extra terms from eigen-decomposition and note the smoothness conditions. This will clarify that the phase transition reflects the increased difficulty due to the eigen-decomposition step compared to pure estimation. revision: partial

Circularity Check

0 steps flagged

No circularity: asymptotic validity and phase transition derived from independent perturbation analysis

full rationale

The paper constructs an FPC-based test statistic via pool-smoothing for discretely observed noisy trajectories, then invokes advancing perturbation bounds on estimated eigenfunctions to control the error from concurrent K_n → ∞ truncation and discretization. This yields the claim that the asymptotic null distribution remains valid across permissible truncation levels and that a phase-transition threshold on sampling frequency m_n exists such that the test behaves as if functions were fully observed. These steps are presented as consequences of the error-control analysis rather than tautological restatements of fitted inputs or self-citations. No equation reduces the target null distribution or phase-transition result to a parameter fit by construction, no uniqueness theorem is imported from the authors' prior work, and no ansatz is smuggled via self-citation. The derivation chain is therefore self-contained against external mathematical benchmarks on eigenfunction perturbation and functional central limit theorems.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard perturbation theory for eigenfunctions under discretization and on the validity of the pool-smoothing estimator; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: Perturbation bounds for estimated eigenfunctions hold under discrete noisy observations and diverging truncation.
    Invoked to establish that the asymptotic null distribution remains valid.
