Two-Sample Hypothesis Testing for Subspace Equality in Network Data
Pith reviewed 2026-06-28 00:08 UTC · model grok-4.3
The pith
A test for subspace equality between two networks has a statistic that, once centered and scaled, converges in distribution to a Gaussian random variable when the average expected degree grows at least logarithmically in the number of vertices.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that the Frobenius norm of the difference of the leading subspace projection matrices from two networks, after appropriate centering and scaling, converges in distribution to a Gaussian random variable whenever the average expected degree grows at least logarithmically in the number of vertices. This convergence holds for general subspace equality testing and in particular for stochastic blockmodels and mixed-membership stochastic blockmodels with identical communities but possibly different edge probabilities. The result is obtained via a one-sample limit theorem on the projection difference between empirical and population eigenvectors, which is presented as potentially of independent interest.
What carries the argument
The test statistic is the Frobenius norm of the difference between the two networks' leading subspace projection matrices. Its analysis rests on a limit theorem for the difference between the empirical and true eigenvector projections.
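A minimal sketch of the raw statistic may help fix ideas. It assumes two symmetric adjacency matrices on the same vertex set and a known subspace dimension r. The function names and the choice to rank eigenvalues by absolute value are assumptions of this illustration, not details taken from the paper, and the centering and scaling that yield the Gaussian limit are not reproduced here.

```python
import numpy as np


def leading_projection(A, r):
    """Orthogonal projection onto the span of the r eigenvectors of the
    symmetric matrix A whose eigenvalues are largest in absolute value."""
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(vals))[-r:]  # assumption: rank by |eigenvalue|
    U = vecs[:, top]
    return U @ U.T


def projection_distance(A1, A2, r):
    """Raw statistic: Frobenius norm of the projection difference
    (no centering or scaling applied)."""
    diff = leading_projection(A1, r) - leading_projection(A2, r)
    return np.linalg.norm(diff, ord="fro")


# Tiny deterministic check: the two leading directions are orthogonal,
# so the distance equals sqrt(2).
A1 = np.diag([5.0, 0.0, 0.0])
A2 = np.diag([0.0, 5.0, 0.0])
d = projection_distance(A1, A2, r=1)
```

Working with projection matrices rather than eigenvectors removes the sign and rotation ambiguity of the eigenvector basis, so the quantity depends only on the subspaces themselves.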
If this is right
- The test applies directly to stochastic blockmodels and mixed-membership stochastic blockmodels that share communities but have different edge probabilities.
- Estimators for the asymptotic mean and variance remain consistent when the signal strength satisfies a stricter condition than the basic logarithmic-degree requirement.
- The test has nontrivial power against local alternatives when the networks are sufficiently dense.
- The one-sample eigenvector-projection limit theorem can be applied to single-network inference problems that involve eigenvector differences.
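The first implication above rests on a linear-algebra fact that can be checked numerically: if two blockmodel probability matrices are built from the same membership matrix Z and full-rank block matrices, both have the column space of Z as their leading subspace. The sketch below assumes a balanced three-community model. The names Z, B1, B2 and all numeric values are illustrative, not taken from the paper.

```python
import numpy as np


def top_projection(P, r):
    """Projection onto the r leading eigenvectors (by |eigenvalue|)."""
    vals, vecs = np.linalg.eigh(P)
    U = vecs[:, np.argsort(np.abs(vals))[-r:]]
    return U @ U.T


n, K = 60, 3
labels = np.repeat(np.arange(K), n // K)
Z = np.eye(K)[labels]  # n x K community membership matrix

# Two different, full-rank block probability matrices (illustrative values).
B1 = np.array([[0.50, 0.10, 0.10],
               [0.10, 0.40, 0.10],
               [0.10, 0.10, 0.30]])
B2 = np.array([[0.20, 0.05, 0.02],
               [0.05, 0.60, 0.10],
               [0.02, 0.10, 0.35]])

P1 = Z @ B1 @ Z.T
P2 = Z @ B2 @ Z.T

# Population-level subspace distance: zero up to rounding error.
gap = np.linalg.norm(top_projection(P1, K) - top_projection(P2, K), ord="fro")

# The edge probabilities themselves are far apart.
edge_gap = np.linalg.norm(P1 - P2, ord="fro")

# Both projections coincide with the projection onto the columns of Z.
PZ = Z @ np.linalg.inv(Z.T @ Z) @ Z.T
z_gap = np.linalg.norm(top_projection(P1, K) - PZ, ord="fro")
```

Here `gap` and `z_gap` vanish numerically while `edge_gap` does not, which is the situation the null hypothesis is meant to cover.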
Where Pith is reading between the lines
- The logarithmic-degree threshold suggests the procedure may remain valid in moderately sparse regimes that lie between the dense and extremely sparse settings studied in earlier network-testing literature.
- The same projection-difference machinery could be adapted to test subspace equality in degree-corrected blockmodels or other latent-space network models without altering the core convergence argument.
- Applied researchers could compare networks observed at different times or under different experimental conditions to detect whether community structure has changed while allowing for shifts in overall edge density.
Load-bearing premise
The average expected degree grows at least logarithmically in the number of vertices.
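To make the premise concrete, the sketch below computes an average expected degree from an edge-probability matrix and compares it with log n along a sequence of network sizes. The definition used here (sum of all entries divided by n, self-loop entries included) and the sparsity factor rho = c log(n) / n are assumptions of this illustration; the paper's exact definition may differ.

```python
import math
import numpy as np


def average_expected_degree(P):
    """Assumed definition: mean row sum of the edge-probability matrix."""
    n = P.shape[0]
    return P.sum() / n


def sbm_probability_matrix(n, B, rho):
    """Balanced blockmodel probability matrix with sparsity factor rho."""
    K = B.shape[0]
    Z = np.eye(K)[np.arange(n) % K]
    return rho * (Z @ B @ Z.T)


B = np.array([[0.9, 0.3],
              [0.3, 0.7]])
ratios = []
for n in (200, 400, 800):
    rho = 4 * math.log(n) / n  # sparsity chosen so the degree scales like log n
    P = sbm_probability_matrix(n, B, rho)
    ratios.append(average_expected_degree(P) / math.log(n))
# Each ratio equals 4 * 0.55 = 2.2 up to rounding: the degree grows
# exactly logarithmically, the boundary case of the premise.
```

A ratio that stays bounded away from zero as n grows corresponds to the regime the premise requires; a ratio that shrinks toward zero would fall outside it.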
What would settle it
Simulations in which the centered and scaled test statistic deviates from a standard normal distribution when the average expected degree grows exactly logarithmically with the number of vertices would falsify the claimed convergence.
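A rough Monte Carlo sketch of such a check is given below. It draws pairs of blockmodel networks that share communities but differ in edge probabilities, with average degree of order log n, and records the raw projection distance. Because the paper's centering and scaling are not reproduced here, the statistic is standardized by its own Monte Carlo mean and standard deviation, which is an assumption of this illustration and a weaker check than using the theoretical quantities. All names and parameter values are hypothetical.

```python
import math
import numpy as np


def sample_adjacency(P, rng):
    """Symmetric, hollow adjacency matrix with independent Bernoulli edges."""
    n = P.shape[0]
    upper = np.triu((rng.random((n, n)) < P).astype(float), k=1)
    return upper + upper.T


def top_projection(A, r):
    """Projection onto the r leading eigenvectors (by |eigenvalue|)."""
    vals, vecs = np.linalg.eigh(A)
    U = vecs[:, np.argsort(np.abs(vals))[-r:]]
    return U @ U.T


def simulate_null(n, B1, B2, c, reps, seed=0):
    """Raw projection distances under the null of a shared subspace."""
    rng = np.random.default_rng(seed)
    K = B1.shape[0]
    Z = np.eye(K)[np.arange(n) % K]
    rho = c * math.log(n) / n  # average degree of order log n
    P1 = rho * (Z @ B1 @ Z.T)
    P2 = rho * (Z @ B2 @ Z.T)
    out = np.empty(reps)
    for b in range(reps):
        A1 = sample_adjacency(P1, rng)
        A2 = sample_adjacency(P2, rng)
        out[b] = np.linalg.norm(top_projection(A1, K) - top_projection(A2, K), "fro")
    return out


K = 2
B1 = np.array([[0.9, 0.3], [0.3, 0.7]])
B2 = np.array([[0.6, 0.1], [0.1, 0.8]])
reps = 100
stats = simulate_null(n=100, B1=B1, B2=B2, c=8.0, reps=reps)

# Empirical standardization (NOT the paper's centering and scaling).
z = (stats - stats.mean()) / stats.std(ddof=1)
skewness = float(np.mean(z ** 3))
excess_kurtosis = float(np.mean(z ** 4) - 3.0)
```

Skewness and excess kurtosis far from zero at large n would hint at non-Gaussian behavior, but a serious check would use the paper's theoretical centering and scaling, larger n, and many more replications.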
Original abstract
In many settings one is often interested in determining whether two networks share some joint structural connectivity patterns such as communities. However, while communities may be shared across networks, edge probabilities may differ significantly. Therefore, in this paper we consider testing a general null hypothesis that two networks have the same underlying subspace, which in particular includes the setting that communities are the same for either stochastic blockmodels or mixed-membership stochastic blockmodels (even if edge probabilities are different). We propose a test statistic based on the Frobenius norm of the difference of the leading subspace projection matrices, and we prove that our test statistic, after appropriate centering and scaling, converges in distribution to a Gaussian random variable as long as the average expected degree grows at least logarithmically in the number of vertices. We then provide estimators for the asymptotic mean and variance and show consistency under a stronger signal condition, and we give the local power of our test when the networks are sufficiently dense. Our theoretical results are based on a limit theorem for the projection difference of empirical and true eigenvectors which can also be viewed as the one-sample version of our test statistic, and this result may be of independent interest. We demonstrate our results through numerical simulations and an application to US Flight data.
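The one-sample quantity mentioned in the abstract, the projection difference between empirical and true eigenvectors, can be illustrated with a short sketch. It assumes a balanced two-community blockmodel and compares one sampled network with its own population matrix at two density levels. The parameter values and helper names are hypothetical, and no centering or scaling is applied.

```python
import numpy as np


def top_projection(M, r):
    """Projection onto the r leading eigenvectors (by |eigenvalue|)."""
    vals, vecs = np.linalg.eigh(M)
    U = vecs[:, np.argsort(np.abs(vals))[-r:]]
    return U @ U.T


def sample_adjacency(P, rng):
    """Symmetric, hollow adjacency matrix with independent Bernoulli edges."""
    n = P.shape[0]
    upper = np.triu((rng.random((n, n)) < P).astype(float), k=1)
    return upper + upper.T


rng = np.random.default_rng(1)
n, K = 200, 2
Z = np.eye(K)[np.arange(n) % K]
B = np.array([[0.9, 0.3],
              [0.3, 0.7]])

one_sample = {}
for c in (2.0, 20.0):
    rho = c * np.log(n) / n          # sparsity factor, degree ~ c log n
    P = rho * (Z @ B @ Z.T)          # population edge probabilities
    A = sample_adjacency(P, rng)     # one observed network
    one_sample[c] = np.linalg.norm(
        top_projection(A, K) - top_projection(P, K), "fro"
    )
```

Each entry of `one_sample` lies between 0 and sqrt(2K), and one would generally expect the denser setting to give the smaller distance, since the empirical eigenvectors concentrate more tightly around the population ones.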
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a two-sample test for equality of leading subspaces between two networks (e.g., stochastic block models or mixed-membership variants sharing communities but with possibly different edge probabilities). The test statistic is the Frobenius norm of the difference between the empirical leading subspace projection matrices. The central theoretical contribution is a limit theorem establishing that, after centering and scaling, this statistic converges in distribution to a Gaussian random variable provided the average expected degree satisfies d_n = Ω(log n). The authors supply consistent estimators for the asymptotic mean and variance under a stronger signal regime, derive the local power of the resulting test, and note that the one-sample projection-difference limit theorem may be of independent interest. Results are illustrated via simulations and a US Flight data example.
Significance. If the stated limit theorem and consistency results hold, the work supplies a practical, asymptotically justified procedure for testing structural similarity across networks without requiring identical edge probabilities. The explicit degree-growth condition and the one-sample projection limit theorem constitute clear technical strengths; the approach is non-parametric with respect to the subspace and directly addresses a common applied setting in network analysis.
minor comments (2)
- [Abstract] The phrase 'stronger signal condition' for the mean and variance estimators is left unspecified; the manuscript should state the precise regime (e.g., a concrete lower bound on d_n relative to n) already in the abstract so that readers can immediately assess applicability.
- [Introduction] The manuscript should include a brief comparison, in the introduction or discussion, of the log n degree condition with existing results on eigenvector perturbation or subspace recovery in the network literature (e.g., results requiring d_n = Ω(log n / log log n) or polynomial growth).
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our work, the accurate summary of the main contributions, and the recommendation for minor revision. We are pleased that the referee highlights the practical utility of the non-parametric test, the explicit degree-growth condition, and the independent interest of the one-sample projection limit theorem.
Circularity Check
No circularity; the central limit theorem is proved directly.
full rationale
The paper states and proves a new limit theorem for the projection difference of empirical and true eigenvectors (the one-sample version of the test statistic) under the explicit condition that the average expected degree grows at least logarithmically. No equation reduces the claimed convergence or the variance estimators to a fitted quantity by construction, no self-citation is invoked as a load-bearing uniqueness result, no ansatz is smuggled in, and no known result is renamed. The derivation chain consists of standard concentration arguments for network eigenvectors followed by explicit centering and scaling, all presented as self-contained results of independent interest. This is the normal case of a theoretical statistics paper whose main claim is an asymptotic statement proved from first principles.
Axiom & Free-Parameter Ledger
axioms (2)
- [standard math] Standard central limit theorems and concentration results for eigenvectors of random adjacency matrices apply under the stated degree growth condition.
- [domain assumption] The two networks are generated from stochastic blockmodels (SBM) or mixed-membership stochastic blockmodels (MMSBM) with an identical leading subspace but possibly different edge probabilities.