Two Sample Test for Eigendecompositions of Functional Data
Pith reviewed 2026-05-08 02:22 UTC · model gemini-3-flash-preview
The pith
Brain activity patterns change from trial to trial in ways that simple noise cannot explain
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors demonstrate that latent activation patterns in neural firing data are not static across repeated trials of the same task. Using a novel two-sample test for eigendecompositions, they show that the covariance structures of functional data, whose eigenfunctions represent the primary modes of variation, differ significantly between trials. In an experiment involving 157 trials, they find that this variation persists even after accounting for sampling noise. This indicates that neural processes are more dynamic than a single average representation would suggest.
What carries the argument
A two-sample test for functional principal component analysis (FPCA) scores. The method pools data from two samples to find a shared set of basis functions, then compares the covariance of the scores from each sample to determine if they share the same underlying structure.
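A minimal Python sketch of this construction is shown below. It assumes the curves are observed on a common dense grid, so FPCA reduces to PCA of the discretized curves. It uses the squared Frobenius distance between the two K×K score covariance matrices as a stand-in statistic. The paper's own statistic (Eq. (4)) is not reproduced here, and all function names are hypothetical. The pooled basis is re-estimated for every permutation, so its estimation error is reflected in the null distribution.

```python
import numpy as np


def score_cov_distance(X, Y, K):
    """Hypothetical statistic: compare score covariances on a pooled FPCA basis.

    X, Y: arrays of shape (n_curves, n_grid) holding curves on a common grid.
    """
    # Centre each sample by its own mean so only second-order structure matters.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Right singular vectors of the pooled, centred data are the eigenvectors
    # of the pooled covariance (discretized pooled eigenfunctions).
    _, _, Vt = np.linalg.svd(np.vstack([Xc, Yc]), full_matrices=False)
    basis = Vt[:K].T  # shape (n_grid, K)
    # K x K covariance of each sample's scores on the shared basis.
    Sx = np.atleast_2d(np.cov(Xc @ basis, rowvar=False))
    Sy = np.atleast_2d(np.cov(Yc @ basis, rowvar=False))
    return float(np.sum((Sx - Sy) ** 2))


def permutation_test(X, Y, K, n_perm=199, seed=0):
    """Two-sample permutation p-value; the pooled basis is refit each time."""
    rng = np.random.default_rng(seed)
    observed = score_cov_distance(X, Y, K)
    Z = np.vstack([X, Y])
    n_x = X.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(Z.shape[0])
        if score_cov_distance(Z[idx[:n_x]], Z[idx[n_x:]], K) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Demo: two samples whose variance along a sine and a cosine mode is swapped.
grid = np.linspace(0.0, 1.0, 50)
modes = np.vstack([np.sin(2 * np.pi * grid), np.cos(2 * np.pi * grid)])
rng = np.random.default_rng(1)
X_demo = (rng.normal(size=(40, 2)) * [3.0, 1.0]) @ modes + 0.1 * rng.normal(size=(40, 50))
Y_demo = (rng.normal(size=(40, 2)) * [1.0, 3.0]) @ modes + 0.1 * rng.normal(size=(40, 50))
stat, p_value = permutation_test(X_demo, Y_demo, K=2)
```

Permuting the raw curves assumes that the two samples share a mean under the null. If the means may differ, a common approximation is to centre each sample before permuting.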
If this is right
- Standard trial-averaging techniques in neuroscience may discard biologically relevant signals by assuming a static underlying pattern.
- Dimension reduction models must account for shifting latent structures rather than assuming a fixed basis for all trials.
- Downstream analyses that rely on stable principal components may need to be re-evaluated for trial-specific bias.
- The statistical test can be applied to any high-dimensional functional data where structure stability is questioned, such as wearable sensor data or longitudinal health monitoring.
Where Pith is reading between the lines
- If latent patterns shift trial-to-trial, cognitive processes like learning or fatigue might be occurring on much shorter timescales than researchers currently measure.
- This method could be adapted to identify 'outlier' trials where a subject's strategy or mental state fundamentally changed compared to the rest of the experiment.
- The findings suggest that the brain may achieve the same task goal using a variety of different activation pathways rather than a single fixed routine.
Load-bearing premise
The test assumes that a small number of shared patterns calculated from the combined data are enough to capture all the important differences between the two groups.
What would settle it
If a dataset with known, static latent structures but high noise is processed by this test and yields a high rate of false positives—claiming the structures are different when they are not—the method's ability to distinguish signal from noise would be invalidated.
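The falsification check above is a Monte Carlo estimate of the test's size. The hedged sketch below rests on assumptions that are not taken from the paper. Both samples are drawn from the same two-mode model (sine and cosine modes) buried in heavy Gaussian noise, and the statistic is a stand-in: the squared Frobenius distance between score covariances on a pooled PCA basis. A valid level-0.05 test should reject in roughly 5% of replicates. A rate far above that would be the failure described.

```python
import numpy as np


def stat(X, Y, K):
    # Stand-in statistic: squared Frobenius distance between the two samples'
    # score covariances on the top-K eigenvectors of the pooled covariance.
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(np.vstack([Xc, Yc]), full_matrices=False)
    B = Vt[:K].T
    Sx = np.atleast_2d(np.cov(Xc @ B, rowvar=False))
    Sy = np.atleast_2d(np.cov(Yc @ B, rowvar=False))
    return float(np.sum((Sx - Sy) ** 2))


def perm_pvalue(X, Y, K, n_perm, rng):
    Z, n_x = np.vstack([X, Y]), X.shape[0]
    observed = stat(X, Y, K)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(Z.shape[0])
        if stat(Z[idx[:n_x]], Z[idx[n_x:]], K) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def empirical_size(n_rep=100, n=30, n_grid=30, noise_sd=2.0, K=2, alpha=0.05, seed=0):
    """Fraction of null replicates (same latent structure in both samples) rejected."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, n_grid)
    modes = np.vstack([np.sin(2 * np.pi * grid), np.cos(2 * np.pi * grid)])
    rejections = 0
    for _ in range(n_rep):
        # Both samples share one static latent structure; only the noise differs.
        X = (rng.normal(size=(n, 2)) * [2.0, 1.0]) @ modes + noise_sd * rng.normal(size=(n, n_grid))
        Y = (rng.normal(size=(n, 2)) * [2.0, 1.0]) @ modes + noise_sd * rng.normal(size=(n, n_grid))
        if perm_pvalue(X, Y, K, n_perm=99, rng=rng) <= alpha:
            rejections += 1
    return rejections / n_rep
```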
Original abstract
Neuron-level firing data is believed to be governed by latent activation patterns during task completion. Analysing repeated trials of a task allows us to study these patterns, typically by averaging in-vivo neural spikes across trials. However, estimates of underlying latent activation patterns show trial-to-trial variability. Our aim is to determine whether this variation arises from observed data differences or changes in the latent activation patterns themselves. The latter would imply that current approaches overlook meaningful activation changes, necessitating adjustments in dimension reduction and downstream analysis. We propose a test that compares the eigendecompositions of two samples of functional data based on the covariance matrix of scores derived from a functional principal component analysis of the pooled data. Initially developed for independent samples, we later extend the test to paired samples, as necessary for our data. Simulation studies demonstrate its superior power compared to leading methods across various scenarios. In an experiment with 157 trials, we analyse all pairwise comparisons using a permutation approach to test the null hypothesis of shared latent activation patterns across trials. Our findings reveal trial-to-trial variation in latent activation patterns that cannot be attributed to sampling noise.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This paper presents a two-sample hypothesis test for comparing the eigendecompositions of functional data. The authors address the challenge of determining whether observed differences in trial-to-trial neural firing patterns represent shifts in latent activation mechanisms or are merely artifacts of sampling noise. The proposed test statistic uses the covariance matrix of scores derived from projecting data onto the functional principal components of a pooled sample. The authors provide a permutation-based framework to estimate the null distribution, accounting for the estimation error in the pooled basis. The method is extended to paired samples and validated via simulation studies against existing methods. Finally, the test is applied to a dataset of 157 neural recording trials, where the authors conclude that significant trial-to-trial variation exists in latent patterns.
Significance. The work is significant for the field of functional data analysis (FDA) and neurostatistics. By leveraging the pooled covariance structure to define a common basis, the authors bypass the 'alignment problem' inherent in comparing eigenfunctions estimated from separate, potentially small samples. The inclusion of a paired-sample extension is particularly valuable for longitudinal studies. The use of a permutation test to ensure validity despite the construction of scores from a pooled basis is a rigorous and welcome approach. If the claim regarding neural trial variability holds, it suggests that standard 'trial-averaging' techniques in electrophysiology may be discarding biologically relevant signal.
major comments (3)
- [§6.2 Neural Application] The strongest claim of the paper—that variation in latent activation patterns cannot be attributed to sampling noise—is potentially confounded by experimental non-stationarity (drift). In neural recordings, longitudinal drift (e.g., electrode settling, changes in arousal, or learning) is a ubiquitous source of variance. The permutation test assumes exchangeability under the null. If there is a temporal trend in the covariance structure across the 157 trials, the test will correctly reject the null of 'no difference,' but the interpretation of 'latent activation changes' vs. 'background non-stationarity' remains ambiguous. The authors should perform a diagnostic check, such as testing for a correlation between the test statistic T and the temporal distance between trials, to clarify if the detected changes are discrete/structured or merely a function of time.
- [§3.1, Eq. (4)] The test statistic T depends heavily on the choice of the truncation parameter K. While the authors mention using the fraction of variance explained (e.g., 95%) in §5, this criterion is optimized for reconstruction, not for hypothesis testing. Differences between groups may reside in higher-order eigenfunctions that explain little total variance but are crucial for distinguishing structures. The manuscript lacks a sensitivity analysis showing how the rejection rate behaves as K varies. A major concern is whether the 'superior power' reported in §5 is robust to the choice of K or if it depends on a fortuitous alignment of the group differences with the top K pooled components.
- [§4 Paired Sample Extension] The transition from independent to paired samples needs more formal justification. In the independent case, the permutation of group labels is straightforward. In the paired case, the null hypothesis must specify the invariance (e.g., exchangeability within pairs). The authors should explicitly define the permutation group acting on the data in §4.2 to ensure that the covariance structure being tested is not artificially inflated by the pairing itself.
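The drift check proposed in the first major comment can be framed as a Mantel-type test. The minimal sketch below assumes that the pairwise statistics are held in a symmetric matrix `T` whose rows and columns follow recording order (a hypothetical layout). It correlates the off-diagonal entries with the lag |i - j|. It then calibrates that correlation by relabelling trials jointly in rows and columns, because the pairwise entries share trials and are not independent.

```python
import numpy as np


def lag_correlation(T):
    """Pearson correlation between pairwise statistics and trial lag |i - j|."""
    i, j = np.triu_indices(T.shape[0], k=1)
    return float(np.corrcoef(T[i, j], np.abs(i - j))[0, 1])


def mantel_drift_test(T, n_perm=999, seed=0):
    """One-sided Mantel-style test for larger statistics at larger lags."""
    rng = np.random.default_rng(seed)
    observed = lag_correlation(T)
    exceed = 0
    for _ in range(n_perm):
        order = rng.permutation(T.shape[0])
        # Relabel trials jointly in rows and columns: every pairwise value is
        # kept, only the temporal lag assigned to it changes.
        if lag_correlation(T[np.ix_(order, order)]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Demo on synthetic statistics that grow with lag (pure drift).
rng = np.random.default_rng(2)
n_trials = 30
lags = np.abs(np.subtract.outer(np.arange(n_trials), np.arange(n_trials)))
noise = rng.normal(scale=0.5, size=(n_trials, n_trials))
T_drift = lags + (noise + noise.T) / 2
r, p = mantel_drift_test(T_drift)
```

A small p-value here would favour gradual drift. A correlation near zero is consistent with variability that is unstructured in time, although it does not rule out non-monotone trends.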
minor comments (3)
- [§2 Background] The notation for the covariance kernel G(s,t) is standard, but the paper would benefit from explicitly stating the assumptions on the kernel (e.g., square integrability, or continuity if Mercer's theorem is invoked) that make the covariance operator compact. Compactness is necessary for the discrete eigendecomposition used later.
- [Figure 2] The power curves in Figure 2 are difficult to distinguish in grayscale. Please use different line styles (dashed, dotted, etc.) in addition to color.
- [§3.2] Terminology: 'the eigenfunctions of the pooled covariance' is used interchangeably with 'pooled eigenfunctions.' It would be cleaner to use one term throughout to avoid confusion with the eigenfunctions of the average covariance.
Simulated Author's Rebuttal
We thank the referee for their constructive and insightful feedback. The comments regarding experimental non-stationarity in neural data, the sensitivity of the truncation parameter K, and the formalization of the paired-sample test are particularly valuable. We believe addressing these points will significantly improve the clarity and rigor of our work. We have addressed each major comment point-by-point below and intend to incorporate these revisions into the revised manuscript.
Point-by-point responses
- Referee: [§6.2 Neural Application] The strongest claim of the paper, that variation in latent activation patterns cannot be attributed to sampling noise, is potentially confounded by experimental non-stationarity (drift)... The authors should perform a diagnostic check, such as testing for a correlation between the test statistic T and the temporal distance between trials.
Authors: We agree that identifying the nature of the detected variation—specifically, distinguishing between stochastic trial-to-trial fluctuations and gradual longitudinal drift—is essential for interpreting our neural results. While the rejection of the null hypothesis confirms the presence of structural differences regardless of their temporal distribution, we acknowledge that 'drift' is a specific and common mechanism in neurophysiology. In the revised manuscript, we will include a post-hoc diagnostic analysis calculating the correlation between the pairwise test statistics and the temporal lag between trials. This will clarify whether the observed variation follows a temporal trend (suggesting non-stationarity or learning) or reflects unstructured variability across trials. We will update §6.2 to discuss these results and provide context for the interpretation of 'latent activation changes'. revision: yes
- Referee: [§3.1, Eq. (4)] The test statistic T depends heavily on the choice of the truncation parameter K... The manuscript lacks a sensitivity analysis showing how the rejection rate behaves as K varies. A major concern is whether the 'superior power' reported in §5 is robust to the choice of K or if it depends on a fortuitous alignment of the group differences with the top K pooled components.
Authors: The referee correctly highlights a critical aspect of FPCA-based testing. While our use of the Fraction of Variance Explained (FVE) follows common practice, it may indeed miss subtle differences in higher-order eigenfunctions or be sensitive to the specific choice of K. To address this, we will add a sensitivity analysis in §5. We will present simulation results showing the test's power and Type I error across a range of K values, moving beyond the 95% threshold. This will demonstrate the robustness of our method's performance and ensure that its power is not merely a result of the signal being concentrated in the first few components. We will also include a discussion in §3.1 regarding the trade-offs involved in choosing K for hypothesis testing versus reconstruction. revision: yes
- Referee: [§4 Paired Sample Extension] The transition from independent to paired samples needs more formal justification... The authors should explicitly define the permutation group acting on the data in §4.2 to ensure that the covariance structure being tested is not artificially inflated by the pairing itself.
Authors: We appreciate this suggestion to formalize the paired-sample framework. The reviewer is correct that the permutation strategy must respect the pairing to maintain the validity of the test. In the revised manuscript, we will explicitly define the null hypothesis in terms of within-pair exchangeability. We will also formally define the permutation group used in §4.2, which consists of the $2^n$ possible label swaps within the $n$ pairs. This mathematical formalization will clarify how we account for the dependency between samples and ensure that the test correctly evaluates the difference in eigendecompositions without interference from the pairing-induced covariance. revision: yes
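The sketch below shows one way to realize this permutation group. It is not the authors' implementation. With n pairs, each group element swaps the two members of some subset of pairs, which gives 2^n relabellings. The code assumes that paired curves are stored row-aligned in `X` and `Y` (hypothetical layout). It samples group elements at random instead of enumerating all 2^n, and it takes the statistic as an argument. The covariance-distance statistic in the demo is a placeholder.

```python
import numpy as np


def paired_permutation_pvalue(X, Y, stat, n_perm=999, seed=0):
    """Monte Carlo p-value under within-pair exchangeability.

    X, Y: arrays of shape (n_pairs, n_grid); row k of X is paired with row k of Y.
    stat: callable(X, Y) -> float, larger values indicating a difference.
    """
    rng = np.random.default_rng(seed)
    observed = stat(X, Y)
    exceed = 0
    for _ in range(n_perm):
        # One random element of the 2^n group: swap pair k when flip[k] is True.
        flip = rng.random(X.shape[0]) < 0.5
        Xp = np.where(flip[:, None], Y, X)
        Yp = np.where(flip[:, None], X, Y)
        if stat(Xp, Yp) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


def cov_distance(X, Y):
    # Placeholder statistic: squared Frobenius distance between sample covariances.
    return float(np.sum((np.cov(X, rowvar=False) - np.cov(Y, rowvar=False)) ** 2))


# Demo: pairs share a common component, but Y has extra variance at one grid point.
rng = np.random.default_rng(3)
shared = rng.normal(size=(80, 5))
X_pairs = shared + rng.normal(size=(80, 5))
Y_pairs = shared + rng.normal(size=(80, 5)) * [4.0, 1.0, 1.0, 1.0, 1.0]
obs, p = paired_permutation_pvalue(X_pairs, Y_pairs, cov_distance)
```

Whole pairs move together, so any dependence induced by the pairing is preserved under every relabelling. This keeps the null distribution from being inflated by the pairing itself.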
Circularity Check
No circularity detected; methodology follows standard permutation-based hypothesis testing.
full rationale
The paper presents a frequentist hypothesis test for the equality of covariance operators in functional data. The derivation of the test statistic and its distribution is self-contained and follows established statistical principles. While the 'latent activation patterns' are interpreted as eigenfunctions (a common interpretative framework in neuroscience), the paper's central claim—that these patterns vary across trials—is an empirical finding resulting from the rejection of a null hypothesis, not a consequence of the test's construction. The potential circularity noted by the reader (using pooled data to define eigenfunctions) is correctly mitigated by the permutation procedure described in Section 2.3, which re-estimates the basis for each permutation, thereby preserving the validity of the null distribution. Self-citations are limited to standard software packages and prior methodological frameworks (e.g., FPCA) and do not serve as load-bearing 'uniqueness' axioms. The skeptic's concern regarding experimental 'drift' addresses the interpretation of the test's result (confounding non-stationarity with discrete state changes) rather than a circularity in the mathematical derivation.
Axiom & Free-Parameter Ledger
free parameters (1)
- K: number of pooled functional principal components retained in the test statistic
axioms (2)
- standard math: L^2 functional space
- domain assumption: exchangeability under H0
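According to the referee report, the free parameter K is set in §5 by a fraction-of-variance-explained (FVE) rule such as 95%. Below is a minimal sketch of that rule, assuming the pooled eigenvalues have already been computed. The function name and the eigenvalues in the demo are illustrative and do not come from the paper. Sweeping the threshold is one simple way to see how K, and therefore the test, would shift in the sensitivity analysis that the rebuttal proposes.

```python
import numpy as np


def k_from_fve(eigenvalues, threshold=0.95):
    """Smallest K whose leading eigenvalues explain at least `threshold` of the variance."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    fve = np.cumsum(lam) / lam.sum()
    # searchsorted gives the first index where the cumulative FVE reaches the
    # threshold; cap at the number of components to guard against rounding.
    return int(min(np.searchsorted(fve, threshold) + 1, lam.size))


pooled_eigenvalues = [5.0, 3.0, 1.0, 0.5, 0.5]  # illustrative values only
for thr in (0.6, 0.85, 0.99):
    print(thr, k_from_fve(pooled_eigenvalues, thr))  # prints 0.6 2, then 0.85 3, then 0.99 5
```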