Two Sample Test for Eigendecompositions of Functional Data

Angel Garcia de la Garza; Britton Sauerbrei; Jeff Goldsmith

arxiv: 2604.00220 · v2 · submitted 2026-03-31 · 📊 stat.ME · stat.AP



Pith reviewed 2026-05-08 02:22 UTC · model gemini-3-flash-preview

classification 📊 stat.ME · stat.AP · MSC 62R10, 62H15, 62P10

keywords functional data analysis · principal component analysis · neural spikes · latent activation · two-sample test · covariance structure


The pith

Brain activity patterns change from trial to trial in ways that simple noise cannot explain

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors aim to determine whether differences in brain activity across repeated trials of the same task are meaningful or just random fluctuations. Using a new statistical test for functional data, they show that the underlying structures of neural activation (the shapes and patterns of firing) shift between individual trials. This suggests that the common practice of averaging data across many trials may hide biologically meaningful information about how the brain adapts or fluctuates during a task.

Core claim

The authors demonstrate that latent activation patterns in neural firing data are not static across repeated trials of the same task. Using a novel two-sample test for eigendecompositions, they show that the eigendecompositions of trial-level covariance structures, which describe the primary modes of variation, differ significantly between trials. In an experiment involving 157 trials, they find that this variation persists after accounting for sampling noise, suggesting that neural processes are more dynamic than a single average representation implies.

What carries the argument

A two-sample test for functional principal component analysis (FPCA) scores. The method pools data from two samples to find a shared set of basis functions, then compares the covariance of the scores from each sample to determine if they share the same underlying structure.
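A minimal sketch of this idea in Python, assuming the curves are observed on a common dense grid and stored as rows of NumPy arrays. The function names, the group-wise centring, and the statistic (the Frobenius distance between the two groups' score covariance matrices on the top K pooled eigenvectors) are illustrative assumptions; the paper's own statistic may be defined differently. Following the permutation scheme described in the circularity check below, the pooled basis is re-estimated for every relabelling.

```python
import numpy as np


def pooled_basis(x, y, k):
    """Leading k eigenvectors of the pooled covariance of group-centred curves.

    x, y: arrays of shape (n_curves, n_grid) observed on a common grid.
    """
    # Centre each group by its own mean so the test targets covariance, not means.
    pooled = np.vstack([x - x.mean(axis=0), y - y.mean(axis=0)])
    cov = pooled.T @ pooled / (pooled.shape[0] - 2)
    # eigh returns eigenvalues in ascending order; reverse the columns to get the leading k.
    _, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, ::-1][:, :k]


def score_cov_distance(x, y, k):
    """Hypothetical statistic: Frobenius distance between the groups' score covariances."""
    basis = pooled_basis(x, y, k)
    scores_x = (x - x.mean(axis=0)) @ basis  # shape (n_x, k)
    scores_y = (y - y.mean(axis=0)) @ basis  # shape (n_y, k)
    # atleast_2d keeps the k = 1 case (a scalar covariance) as a 1 x 1 matrix.
    cov_x = np.atleast_2d(np.cov(scores_x, rowvar=False))
    cov_y = np.atleast_2d(np.cov(scores_y, rowvar=False))
    return np.linalg.norm(cov_x - cov_y, ord="fro")


def pooled_fpca_permutation_test(x, y, k=3, n_perm=199, seed=None):
    """Return (observed statistic, permutation p-value) for equal eigendecompositions."""
    rng = np.random.default_rng(seed)
    observed = score_cov_distance(x, y, k)
    data, n_x = np.vstack([x, y]), x.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(data.shape[0])
        # The pooled basis is re-estimated inside the statistic for every relabelling.
        exceed += score_cov_distance(data[idx[:n_x]], data[idx[n_x:]], k) >= observed
    # The add-one correction keeps the Monte Carlo p-value valid and strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

Because each group is centred by its own mean, the pooled covariance changes when labels are shuffled, which is why the basis is recomputed inside the loop; if all curves were centred by a single pooled mean, the pooled basis would not depend on the labels at all.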

If this is right

Standard trial-averaging techniques in neuroscience may discard biologically relevant signals by assuming a static underlying pattern.
Dimension reduction models must account for shifting latent structures rather than assuming a fixed basis for all trials.
Downstream analyses that rely on stable principal components may need to be re-evaluated for trial-specific bias.
The statistical test can be applied to any high-dimensional functional data where structure stability is questioned, such as wearable sensor data or longitudinal health monitoring.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If latent patterns shift trial-to-trial, cognitive processes like learning or fatigue might be occurring on much shorter timescales than researchers currently measure.
This method could be adapted to identify 'outlier' trials where a subject's strategy or mental state fundamentally changed compared to the rest of the experiment.
The findings suggest that the brain may achieve the same task goal using a variety of different activation pathways rather than a single fixed routine.

Load-bearing premise

The test assumes that a small number of shared patterns calculated from the combined data are enough to capture all the important differences between the two groups.

What would settle it

If a dataset with known, static latent structures but high noise is processed by this test and yields a high rate of false positives—claiming the structures are different when they are not—the method's ability to distinguish signal from noise would be invalidated.
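The check described here amounts to a Monte Carlo size study. A minimal sketch, assuming both samples come from one fixed latent structure (two eigenfunctions with fixed score standard deviations) plus heavy white noise; every name and setting below is hypothetical. So that the harness runs on its own, `cov_permutation_pvalue` is a simple stand-in permutation test on the distance between group covariance matrices, not the paper's procedure; the proposed test would be passed in as `test_pvalue` instead. A valid test should reject at a rate close to, and not materially above, `alpha`.

```python
import numpy as np


def simulate_null_pair(rng, n=20, n_grid=30, noise_sd=2.0):
    """Two independent samples from the SAME static latent structure plus heavy noise."""
    t = np.linspace(0.0, 1.0, n_grid)
    # Fixed, shared eigenfunctions and score standard deviations (assumed for illustration).
    phi = np.vstack([np.sqrt(2) * np.sin(np.pi * t), np.sqrt(2) * np.cos(np.pi * t)])
    sds = np.array([2.0, 1.0])

    def draw():
        scores = rng.normal(size=(n, 2)) * sds
        return scores @ phi + noise_sd * rng.normal(size=(n, n_grid))

    return draw(), draw()


def cov_permutation_pvalue(x, y, rng, n_perm=19):
    """Stand-in test: Frobenius distance between the two groups' covariance matrices."""
    def stat(a, b):
        return np.linalg.norm(np.cov(a, rowvar=False) - np.cov(b, rowvar=False))

    observed = stat(x, y)
    data, n_x = np.vstack([x, y]), x.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(data.shape[0])
        exceed += stat(data[idx[:n_x]], data[idx[n_x:]]) >= observed
    return (exceed + 1) / (n_perm + 1)


def empirical_rejection_rate(test_pvalue, simulate_pair, n_sim=200, alpha=0.05, seed=0):
    """Fraction of simulated null datasets on which the test rejects at level alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        x, y = simulate_pair(rng)
        rejections += test_pvalue(x, y, rng) <= alpha
    return rejections / n_sim
```

With 19 permutations and the add-one p-value, rejecting at alpha = 0.05 requires the observed statistic to exceed every permuted value, which under exchangeability has probability at most 1/20; a real study would use many more permutations and simulation replicates.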

Figures

Figures reproduced from arXiv: 2604.00220 by Angel Garcia de la Garza, Britton Sauerbrei, Jeff Goldsmith.

**Figure 1.** Panel A displays a lasagna plot of the activation of six example neurons across 174 timepoints and 157 trials. Light blue indicates that the neuron is active at that specific instance. same neurons, we also develop a paired version of this test. Next, we will make all possible pairwise comparisons of activation patterns across trials using this paired test, and summarise the results by comparing the dist…

**Figure 2.** Panel A1-4 displays scenarios in which the FPCs across groups are orthogonal. Panels B1-4 show data simulations in which the FPCs across groups are not orthogonal. Panels A1 and B1 depict the true data-generating FPCs used in the simulations. Panels A2 and B2 display the true score covariance matrix used to generate the data. Panels A3 and B3 show the reconstructed FPCs from a pooled FPCA. Panels A4 and B4…

**Figure 3.** Empirical rejection rates for independent datasets across simulation settings. We run 1000 simulations for each simulation scenario, and reject the null hypothesis at α = 0.05. Our proposed test is in dark blue. Leading competing methods include the test given in Panaretos et al. (2010) (in orange) and Pomann et al. (2016) (in yellow). Each column displays a different effect size, and the rows display the …

**Figure 4.** Figure 4 indicates that our paired test outperforms competing methods, often substantially, in most scenarios with paired data. Our paired test maintains the correct rejection rate when the null hypothesis is true. Notably, we observe that the performance of our paired test improves as the pairwise correlation increases, which aligns with intuition for paired tests in general. The results for our independent test a…

**Figure 5.** Spaghetti plots of FPCA decompositions of trial-level data. Each curve represents an estimate for a trial. The panels show the first three FPCs in descending order of most variance explained. On average, these five FPCs explain 96.2% of the total variability within each trial. The red line is the LOESS average across all trials. permutation approach to assess the null hypothesis that the activation pattern…

**Figure 6.** Panel A displays the distribution of p-values from all pairwise trial comparisons in our motivating dataset. Panel B shows nine example distributions of p-values from all pairwise trial comparisons in permuted datasets in which the null hypothesis is true. Panel C shows the distribution of $\tilde{\eta}_p$ for $p \in \{1, \ldots, 200\}$. Although our global test does not provide information about the significance of individu…

**Figure 7.** The top panel highlights activation patterns for three example trials. We observe marked differences in the later stages of the observation window in the first pattern for the three highlighted trials. Bottom panel shows barcode plots of raw dichotomised neural spike data. Each coloured line represents a timepoint in which that neuron was active. There are differences in raw activation across the three t…

Original abstract

Neuron-level firing data is believed to be governed by latent activation patterns during task completion. Analysing repeated trials of a task allows us to study these patterns, typically by averaging in-vivo neural spikes across trials. However, estimates of underlying latent activation patterns show trial-to-trial variability. Our aim is to determine whether this variation arises from observed data differences or changes in the latent activation patterns themselves. The latter would imply that current approaches overlook meaningful activation changes, necessitating adjustments in dimension reduction and downstream analysis. We propose a test that compares the eigendecompositions of two samples of functional data based on the covariance matrix of scores derived from a functional principal component analysis of the pooled data. Initially developed for independent samples, we later extend the test to paired samples, as necessary for our data. Simulation studies demonstrate its superior power compared to leading methods across various scenarios. In an experiment with 157 trials, we analyse all pairwise comparisons using a permutation approach to test the null hypothesis of shared latent activation patterns across trials. Our findings reveal trial-to-trial variation in latent activation patterns that cannot be attributed to sampling noise.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

A practical test for identifying structural changes in functional data that moves beyond simple mean comparisons to target the underlying principal components.


This paper provides a straightforward, effective test for determining whether two sets of functional data share the same eigendecomposition. While the functional data analysis (FDA) literature has several ways to compare covariance operators, this approach is more targeted. By basing the test statistic on the covariance matrix of scores on the pooled eigenfunctions, the authors create a more sensitive tool for detecting when the 'shape' of variability changes between groups, even when the means might look similar.

The methodology is solid. They use a permutation-based approach that correctly accounts for the fact that the eigenfunctions are estimated from the pooled data itself. This avoids the circularity that usually plagues these kinds of 'double-dipping' analyses. The simulation results are convincing; the test shows better power than standard distance-based metrics, largely because it focuses on the directions of maximum variance where structural changes are most likely to manifest.

The application to neural firing data is where the stakes are highest. The authors argue that trial-to-trial variation in neural activity isn't just random noise around a fixed mean, but represents a shift in the latent activation patterns themselves. This is a big claim for systems neuroscience, where trial-averaging is the standard. If they are right, many existing analyses are smoothing over the most interesting parts of the data.

The main soft spot is the assumption of exchangeability in the permutation test. In neural recordings, 'drift' is a constant reality—electrodes settle, animals get tired, or focus shifts over an hour-long session. If the covariance structure drifts slowly over time, a permutation test will correctly reject the null hypothesis of exchangeability, but the cause might be mundane experimental non-stationarity rather than a meaningful functional reconfiguration of the task logic. The paper doesn't quite disentangle these longitudinal effects from their structural claims.

That said, the statistical tool is useful regardless of the specific biological interpretation. Anyone working with high-dimensional longitudinal data or functional PCA will find value here. It is a clean piece of work that addresses a real gap in the FDA toolkit. It definitely deserves a serious referee.

Referee Report

3 major / 3 minor

Summary. This paper presents a two-sample hypothesis test for comparing the eigendecompositions of functional data. The authors address the challenge of determining whether observed differences in trial-to-trial neural firing patterns represent shifts in latent activation mechanisms or are merely artifacts of sampling noise. The proposed test statistic utilizes the covariance matrix of scores derived from projecting data onto the functional principal components of a pooled sample. The authors provide a permutation-based framework to estimate the null distribution, accounting for the estimation error in the pooled basis. The method is extended to paired samples and validated via simulation studies against existing metrics. Finally, the test is applied to a dataset of 157 neural recording trials, where the authors conclude that significant trial-to-trial variation exists in latent patterns.

Significance. The work is significant for the field of functional data analysis (FDA) and neurostatistics. By leveraging the pooled covariance structure to define a common basis, the authors bypass the 'alignment problem' inherent in comparing eigenfunctions estimated from separate, potentially small samples. The inclusion of a paired-sample extension is particularly valuable for longitudinal studies. The use of a permutation test to ensure validity despite the construction of scores from a pooled basis is a rigorous and welcome approach. If the claim regarding neural trial variability holds, it suggests that standard 'trial-averaging' techniques in electrophysiology may be discarding biologically relevant signal.

major comments (3)

[§6.2 Neural Application] The strongest claim of the paper—that variation in latent activation patterns cannot be attributed to sampling noise—is potentially confounded by experimental non-stationarity (drift). In neural recordings, longitudinal drift (e.g., electrode settling, changes in arousal, or learning) is a ubiquitous source of variance. The permutation test assumes exchangeability under the null. If there is a temporal trend in the covariance structure across the 157 trials, the test will correctly reject the null of 'no difference,' but the interpretation of 'latent activation changes' vs. 'background non-stationarity' remains ambiguous. The authors should perform a diagnostic check, such as testing for a correlation between the test statistic T and the temporal distance between trials, to clarify if the detected changes are discrete/structured or merely a function of time.
[§3.1, Eq. (4)] The test statistic T depends heavily on the choice of the truncation parameter K. While the authors mention using the fraction of variance explained (e.g., 95%) in §5, this criterion is optimized for reconstruction, not for hypothesis testing. Differences between groups may reside in higher-order eigenfunctions that explain little total variance but are crucial for distinguishing structures. The manuscript lacks a sensitivity analysis showing how the rejection rate behaves as K varies. A major concern is whether the 'superior power' reported in §5 is robust to the choice of K or if it depends on a fortuitous alignment of the group differences with the top K pooled components.
[§4 Paired Sample Extension] The transition from independent to paired samples needs more formal justification. In the independent case, the permutation of group labels is straightforward. In the paired case, the null hypothesis must specify the invariance (e.g., exchangeability within pairs). The authors should explicitly define the permutation group acting on the data in §4.2 to ensure that the covariance structure being tested is not artificially inflated by the pairing itself.

minor comments (3)

[§2 Background] The notation for the covariance kernel G(s,t) is standard, but the paper would benefit from explicitly stating the assumptions behind the discrete eigendecomposition used later: a finite second moment, which makes the covariance operator trace-class and hence compact, and continuity of G, which yields the uniformly convergent Mercer expansion.
[Figure 2] The power curves in Figure 2 are difficult to distinguish in grayscale. Please use different line styles (dashed, dotted, etc.) in addition to color.
[§3.2] Terminology: 'the eigenfunctions of the pooled covariance' is used interchangeably with 'pooled eigenfunctions.' It would be cleaner to use one term consistently, to avoid confusion with the eigenfunctions of the average covariance.
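For the first minor comment, the standard results in question can be written in the same G(s,t) notation. The symbols $\mathcal{T}$, $\mathcal{G}$, $\lambda_k$, $\phi_k$ and $\xi_k$ are introduced here for illustration and may not match the paper's notation.

```latex
% Assumed setting: X is a random element of L^2(\mathcal{T}), \mathcal{T} compact,
% with E\lVert X \rVert^2 < \infty, mean \mu and covariance kernel G.
G(s,t) = \operatorname{Cov}\{X(s), X(t)\}, \qquad
(\mathcal{G} f)(t) = \int_{\mathcal{T}} G(s,t)\, f(s)\, ds .

% E\lVert X \rVert^2 < \infty makes \mathcal{G} trace-class, hence compact.
% If G is also continuous, Mercer's theorem gives a uniformly convergent expansion
G(s,t) = \sum_{k=1}^{\infty} \lambda_k\, \phi_k(s)\, \phi_k(t), \qquad
\lambda_1 \ge \lambda_2 \ge \cdots \ge 0,

% and the Karhunen--Lo\`eve expansion, truncated at K components, reads
X(t) = \mu(t) + \sum_{k=1}^{\infty} \xi_k\, \phi_k(t)
     \approx \mu(t) + \sum_{k=1}^{K} \xi_k\, \phi_k(t), \qquad
\xi_k = \int_{\mathcal{T}} \{X(t) - \mu(t)\}\, \phi_k(t)\, dt,

% with uncorrelated scores: E(\xi_k) = 0 and \operatorname{Cov}(\xi_j, \xi_k) = \lambda_k\, \delta_{jk}.
```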

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and insightful feedback. The comments regarding experimental non-stationarity in neural data, the sensitivity of the truncation parameter K, and the formalization of the paired-sample test are particularly valuable. We believe addressing these points will significantly improve the clarity and rigor of our work. We have addressed each major comment point-by-point below and intend to incorporate these revisions into the revised manuscript.


Referee: [§6.2 Neural Application] The strongest claim of the paper—that variation in latent activation patterns cannot be attributed to sampling noise—is potentially confounded by experimental non-stationarity (drift)... The authors should perform a diagnostic check, such as testing for a correlation between the test statistic T and the temporal distance between trials.

Authors: We agree that identifying the nature of the detected variation—specifically, distinguishing between stochastic trial-to-trial fluctuations and gradual longitudinal drift—is essential for interpreting our neural results. While the rejection of the null hypothesis confirms the presence of structural differences regardless of their temporal distribution, we acknowledge that 'drift' is a specific and common mechanism in neurophysiology. In the revised manuscript, we will include a post-hoc diagnostic analysis calculating the correlation between the pairwise test statistics and the temporal lag between trials. This will clarify whether the observed variation follows a temporal trend (suggesting non-stationarity or learning) or reflects unstructured variability across trials. We will update §6.2 to discuss these results and provide context for the interpretation of 'latent activation changes'. revision: yes
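The lag diagnostic proposed in this response could be run as a Mantel-type permutation test. Each pairwise statistic shares trials with many others, so the pairs are not independent and an ordinary correlation p-value would be unreliable; permuting trial identities jointly on rows and columns keeps that dependence intact. A minimal sketch, assuming the pairwise statistics are stored in a symmetric trials-by-trials matrix with trials indexed in recording order; `mantel_test` is a hypothetical name, and Pearson correlation on the upper triangle is one choice among several (a rank correlation would be less sensitive to outlying pairs).

```python
import numpy as np


def mantel_test(stat_matrix, lag_matrix, n_perm=999, seed=0):
    """Two-sided permutation test for association between two symmetric trial-by-trial
    matrices, e.g. pairwise test statistics and temporal lags |i - j|."""
    rng = np.random.default_rng(seed)
    n = stat_matrix.shape[0]
    iu = np.triu_indices(n, k=1)  # each unordered trial pair is used once

    def corr(a, b):
        return np.corrcoef(a[iu], b[iu])[0, 1]

    observed = corr(stat_matrix, lag_matrix)
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Relabel trials on rows and columns together so the matrix structure is preserved.
        permuted = stat_matrix[np.ix_(p, p)]
        exceed += abs(corr(permuted, lag_matrix)) >= abs(observed)
    return observed, (exceed + 1) / (n_perm + 1)
```

A strong positive correlation would point towards gradual drift, while a correlation near zero alongside a rejected global null would be more consistent with unstructured trial-to-trial variation.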
Referee: [§3.1, Eq. (4)] The test statistic T depends heavily on the choice of the truncation parameter K... The manuscript lacks a sensitivity analysis showing how the rejection rate behaves as K varies. A major concern is whether the 'superior power' reported in §5 is robust to the choice of K or if it depends on a fortuitous alignment of the group differences with the top K pooled components.

Authors: The referee correctly highlights a critical aspect of FPCA-based testing. While our use of the Fraction of Variance Explained (FVE) follows common practice, it may indeed miss subtle differences in higher-order eigenfunctions or be sensitive to the specific choice of K. To address this, we will add a sensitivity analysis in §5. We will present simulation results showing the test's power and Type I error across a range of K values, moving beyond the 95% threshold. This will demonstrate the robustness of our method's performance and ensure that its power is not merely a result of the signal being concentrated in the first few components. We will also include a discussion in §3.1 regarding the trade-offs involved in choosing K for hypothesis testing versus reconstruction. revision: yes
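The sensitivity analysis promised here starts from how an FVE threshold translates into K. A minimal sketch, assuming the pooled eigenvalues are already estimated and nonnegative; `k_for_fve` is a hypothetical helper, not a function from the paper or any package.

```python
import numpy as np


def k_for_fve(eigenvalues, threshold):
    """Smallest K whose leading eigenvalues explain at least `threshold` of the total variance."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # descending order
    fve = np.cumsum(lam) / lam.sum()  # cumulative fraction of variance explained
    # searchsorted (side="left") returns the first index with fve >= threshold.
    k = int(np.searchsorted(fve, threshold)) + 1
    # Guard against rounding leaving fve[-1] just below 1 for thresholds near 1.
    return min(k, lam.size)
```

For a sensitivity study one would evaluate `k_for_fve(eigenvalues, thr)` for thresholds such as 0.80, 0.90, 0.95 and 0.99, or simply loop over a fixed grid of K, and recompute the empirical rejection rates at each value.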
Referee: [§4 Paired Sample Extension] The transition from independent to paired samples needs more formal justification... The authors should explicitly define the permutation group acting on the data in §4.2 to ensure that the covariance structure being tested is not artificially inflated by the pairing itself.

Authors: We appreciate this suggestion to formalize the paired-sample framework. The reviewer is correct that the permutation strategy must respect the pairing to maintain the validity of the test. In the revised manuscript, we will explicitly define the null hypothesis in terms of within-pair exchangeability. We will also formally define the permutation group used in §4.2, which consists of the $2^n$ possible label swaps within the $n$ pairs. This mathematical formalization will clarify how we account for the dependency between samples and ensure that the test correctly evaluates the difference in eigendecompositions without interference from the pairing-induced covariance. revision: yes
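A minimal sketch of the within-pair relabelling described in this response, assuming pair i is stored as row i of two arrays `x` and `y` (for example, discretised curves on a common grid). `paired_swap_pvalue` and `cov_distance` are hypothetical names, and `cov_distance` is only an illustrative statistic; a pooled-FPCA score statistic could be passed in its place. Instead of enumerating all $2^n$ swaps, the sketch samples them uniformly by flipping one fair coin per pair, the usual Monte Carlo shortcut when n is large.

```python
import numpy as np


def paired_swap_pvalue(x, y, statistic, n_perm=999, seed=0):
    """Permutation p-value under within-pair exchangeability.

    Row i of x and row i of y form a pair. Each relabelling swaps the two members
    of every pair independently with probability 1/2, i.e. it draws uniformly
    from the 2**n within-pair swaps.
    """
    rng = np.random.default_rng(seed)
    observed = statistic(x, y)
    exceed = 0
    for _ in range(n_perm):
        swap = rng.random(x.shape[0]) < 0.5  # one fair coin flip per pair
        xs = np.where(swap[:, None], y, x)
        ys = np.where(swap[:, None], x, y)
        exceed += statistic(xs, ys) >= observed
    return observed, (exceed + 1) / (n_perm + 1)


def cov_distance(a, b):
    """Illustrative statistic (not the paper's): Frobenius distance between covariances."""
    return np.linalg.norm(np.cov(a, rowvar=False) - np.cov(b, rowvar=False))
```

Because members are only ever exchanged within a pair, every relabelled dataset keeps the same pairing, so the dependence between the two samples is carried into the null distribution rather than being mistaken for a difference in eigendecompositions.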

Circularity Check

0 steps flagged

No circularity detected; methodology follows standard permutation-based hypothesis testing.

full rationale

The paper presents a frequentist hypothesis test for the equality of covariance operators in functional data. The derivation of the test statistic and its distribution is self-contained and follows established statistical principles. While the 'latent activation patterns' are interpreted as eigenfunctions (a common interpretative framework in neuroscience), the paper's central claim—that these patterns vary across trials—is an empirical finding resulting from the rejection of a null hypothesis, not a consequence of the test's construction. The potential circularity noted by the reader (using pooled data to define eigenfunctions) is correctly mitigated by the permutation procedure described in Section 2.3, which re-estimates the basis for each permutation, thereby preserving the validity of the null distribution. Self-citations are limited to standard software packages and prior methodological frameworks (e.g., FPCA) and do not serve as load-bearing 'uniqueness' axioms. The skeptic's concern regarding experimental 'drift' addresses the interpretation of the test's result (confounding non-stationarity with discrete state changes) rather than a circularity in the mathematical derivation.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The paper relies on standard functional data analysis assumptions and the well-established permutation test framework.

free parameters (1)

K
The number of functional principal components used to represent the data, which acts as a truncation parameter for the infinite-dimensional space.

axioms (2)

standard math L^2 functional space
The data is assumed to consist of square-integrable functions, a standard requirement for FPCA.
domain assumption Exchangeability under H0
The permutation test relies on the assumption that if the eigendecompositions are identical, the trial labels are exchangeable.

pith-pipeline@v0.9.0 · 6275 in / 1505 out tokens · 22662 ms · 2026-05-08T02:22:34.096276+00:00 · methodology


Reference graph

Works this paper leans on

6 extracted references · 6 canonical work pages

[1]

Anderson, T. W. (1962) An introduction to multivariate statistical analysis. Tech. rep., Wiley New York. Bai, Z. D., Yin, Y. Q. and Krishnaiah, P. R. (1988) On the limiting empirical distribution function of the eigenvalues of a multivariate F matrix. Theory of Probability & Its Applications, 32, 490–500. Benko, M., Härdle, W. and Kneip, A. (2009) Common ...

work page 1962
[2]

and Xia, Y

Cai, T., Liu, W. and Xia, Y. (2013) Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. Journal of the American Statistical Association, 108, 265–277. Cao, M., He, T. and Zhou, W. (2018) Package ‘hdtest’. R package version. Churchland, M. M., Yu, B. M., Cunningham, J. P., Sugrue, L. P., Cohen, M. R., Corrado, G....

work page 2013
[3]

Kashlak, A

Springer Science & Business Media. Kashlak, A. B., Myroshnychenko, S. and Spektor, S. (2022) Analytic permutation testing for functional data ANOVA. Journal of Computational and Graphical Statistics, 0, 1–24. Kraus, D. and Panaretos, V. M. (2012) Dispersion operators and resistant second-order functional data analysis. Biometrika, 99, 813–832. Li, J. and Ch...

work page 2022
[4]

Staicu, A.-M., Li, Y., Crainiceanu, C. M. and Ruppert, D. (2014) Likelihood ratio tests for dependent data with applications to longitudinal and functional data analysis. Scandinavian Journal of Statistics, 41, 932–949. Sugiura, N. and Nagao, H. (1968) Unbiasedness of some test criteria for the equality of one or two covariance matrices. The Annals of Mathem...

work page 2014
[5]

We generate 1000 datasets for each combination of sample sizes $I_1 = I_2 \in \{25, 50, 100, 150, 200\}$ and error variances $\sigma^2_\epsilon \in \{0.25, 0.5, 1\}$

We compare three scenarios: one where the null hypothesis is true by generating two datasets from model $z = 1$, another where the eigenfunctions across both groups are orthogonal using data from $z = 1$ vs. $z = 2$ (Panel A1-4 in Figure 2), and a third scenario where the eigenfunctions across groups are not orthogonal, using data from $z = 1$ vs. $z = 3$ (Panel B1-4 in Figu...

work page 2010
[6]

7.2. Sensitivity Analysis of K on proposed Methodology. We present sensitivity analysis to explore the role that the number of functional principal components (FPCs) plays in the numerical properties of our proposed methodology. Our goal is to compare the methods when fixing $K$, and to examine the impact that $K$ has on the empirical rejection rates. To do thi...

work page 2010
