Multivariate Inference of Network Moments by Subsampling
Pith reviewed 2026-05-23 21:02 UTC · model grok-4.3
The pith
Node subsampling approximates the joint distribution of multiple network moments under sparse graphon models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Node subsampling provides an asymptotically accurate approximation of the joint distribution of multiple network moments under a general sparse graphon model. The proof rests on a careful characterization of the dependence among the moments and the associated multivariate convergence, which goes beyond existing univariate arguments. This foundation is used to construct the first subsampling-based test valid for two-sample inference on unmatchable networks with unequal edge densities, obtained by combining sparsification with sample splitting.
What carries the argument
Node subsampling applied to the joint distribution of network moments, with sparsification and sample splitting for the two-sample procedure.
If this is right
- Two-sample tests for unmatchable networks become valid even when edge densities differ.
- Joint moment distributions can detect structural differences invisible to separate marginal tests.
- Model selection and goodness-of-fit procedures gain access to multivariate subsampling quantiles.
- Conditional moment distributions become usable for targeted comparisons within the same network.
Where Pith is reading between the lines
- The same subsampling logic might extend to higher-order network summaries beyond the moments treated here.
- If the graphon assumption holds only approximately, the procedure could still serve as a practical diagnostic for network similarity.
- The sample-splitting step suggests a general template for other network inference tasks that require density adjustment.
Load-bearing premise
The networks are generated from a general sparse graphon model that permits characterization of the dependence structure among the moments.
What would settle it
A direct comparison, on networks known to arise from a non-graphon process, between the empirical joint distribution of the moments and the distribution obtained from node subsamples; systematic divergence would falsify the approximation claim.
Figures
read the original abstract
Network moments--rescaled counts of motifs such as stars and triangles--are fundamental summaries of network structure, widely used in goodness-of-fit testing, model selection, and network comparison. While the univariate distribution of a single network moment can be approximated by subsampling, the consistency of subsampling for their {\it joint} distribution has remained unestablished. In this paper, we prove that node subsampling provides an asymptotically accurate approximation of the joint distribution of multiple network moments under a general sparse graphon model. The theoretical analysis requires a careful characterization of the dependence structure among network moments and the corresponding multivariate asymptotic convergence, going substantially beyond existing univariate results. Building on this foundation, we address a practically important open problem: two-sample testing between unmatchable networks with unequal edge densities. We propose a novel subsampling-based procedure that combines {\it sparsification} with a {\it sample-splitting} strategy. This yields the first subsampling-based inferential procedure valid for this setting, to our knowledge. We demonstrate the utility of multivariate subsampling inference through simulation studies and a real data application comparing coexpression networks of core and non-core genes in a study of parallel adaptation in Trinidadian guppies, where joint and conditional moment distributions reveal a structural difference that no marginal test can detect.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proves that node subsampling yields an asymptotically accurate approximation to the joint distribution of multiple network moments (e.g., rescaled motif counts) under a general sparse graphon model, extending prior univariate results via a characterization of the dependence structure among moments. It further develops a subsampling-based two-sample test for unmatchable networks with unequal edge densities that combines sparsification and sample splitting, and illustrates the procedure on simulations and a real-data comparison of coexpression networks.
Significance. If the central proof holds, the result is significant because it supplies the first rigorous multivariate extension of network-moment subsampling, enabling joint and conditional inference that can detect structural differences invisible to marginal tests. The two-sample procedure directly addresses an open practical problem. The manuscript ships an explicit mathematical proof of joint convergence together with reproducible simulation studies and a real-data application; these are concrete strengths.
major comments (1)
- [theoretical analysis (abstract and main proof)] The central claim rests on a 'careful characterization of the dependence structure among network moments and the corresponding multivariate asymptotic convergence' (abstract). Because the provided description does not include the explicit dependence bounds, the form of the limiting covariance, or the rate at which the subsampling error vanishes in the multivariate case, it is not possible to verify that the joint approximation is asymptotically accurate under the stated sparse-graphon conditions.
minor comments (1)
- The phrase 'to our knowledge' for the two-sample procedure should be accompanied by a brief comparison to existing network two-sample methods that do not rely on subsampling.
Simulated Author's Rebuttal
We thank the referee for the constructive comment and positive assessment of the work's significance. We address the concern on the theoretical analysis below.
read point-by-point responses
-
Referee: [theoretical analysis (abstract and main proof)] The central claim rests on a 'careful characterization of the dependence structure among network moments and the corresponding multivariate asymptotic convergence' (abstract). Because the provided description does not include the explicit dependence bounds, the form of the limiting covariance, or the rate at which the subsampling error vanishes in the multivariate case, it is not possible to verify that the joint approximation is asymptotically accurate under the stated sparse-graphon conditions.
Authors: We agree that the abstract and introductory description do not explicitly state the dependence bounds, limiting covariance form, or convergence rate, which limits immediate verification. The full proof in Section 3 provides the multivariate extension: the dependence structure among moments is characterized through the graphon-induced covariances for overlapping subgraphs (detailed in the proof of Theorem 3.1), the limiting covariance matrix appears explicitly as the integral expression in Equation (3.4), and the subsampling approximation error vanishes at rate o(1) under the sparse-graphon assumptions (Proposition 3.5). To make these elements verifiable from the high-level description, we will revise the abstract to reference the limiting covariance and add a short paragraph in the introduction summarizing the key bounds and rates. This is a presentation improvement only. revision: yes
Circularity Check
No significant circularity in derivation chain
full rationale
The paper's central contribution is a theoretical proof establishing asymptotic validity of node subsampling for the joint distribution of multiple network moments under a general sparse graphon model. This rests on a characterization of dependence among moments and multivariate convergence, explicitly extending prior univariate results. No equations or claims reduce by construction to fitted parameters, self-definitions, or load-bearing self-citations; the result is presented as an independent asymptotic analysis rather than a renaming or renormalization of inputs. The two-sample testing procedure is derived from this foundation without circular reduction. The derivation is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Networks follow a general sparse graphon model permitting multivariate asymptotic convergence of network moments
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
we prove that node subsampling provides an asymptotically accurate approximation of the joint distribution of multiple network moments under a general sparse graphon model
-
IndisputableMonolith/Foundation/DimensionForcing.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
The theoretical analysis requires a careful characterization of the dependence structure among network moments and the corresponding multivariate asymptotic convergence
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
J. Agterberg, M. Tang, and C. Priebe. Nonparametric two-sample hypothesis testing for random graphs with negative and repeated eigenvalues. arXiv preprint arXiv:2012.09828, 2020
-
[2]
D. J. Aldous. Representations for partially exchangeable arrays of random variables. Journal of Multivariate Analysis, 11 0 (4): 0 581--598, 1981
work page 1981
-
[3]
A. A. Alyakin, J. Agterberg, H. S. Helm, and C. E. Priebe. Correcting a nonparametric two-sample graph hypothesis test for graphs with different numbers of vertices with applications to connectomics. Applied Network Science, 9 0 (1): 0 1, 2024
work page 2024
-
[4]
A. A. Amini, A. Chen, P. J. Bickel, and E. Levina. Pseudo-likelihood methods for community detection in large sparse networks. The Annals of Statistics, 41 0 (4): 0 2097--2122, 2013
work page 2097
- [5]
-
[6]
A.-L. Barab \'a si. Network science. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 371 0 (1987): 0 20120375, 2013
work page 1987
-
[7]
B. B. Bhattacharya, A. Chatterjee, and S. Janson. Fluctuations of subgraph counts in graphon based random graphs. Combinatorics, Probability and Computing, pages 1--37, 2022
work page 2022
-
[8]
S ubsampling bootstrap of count features of networks
S. Bhattacharyya and P. Bickel. Supplement to “ S ubsampling bootstrap of count features of networks”. The Annals of Statistics, 2015 a
work page 2015
-
[9]
S. Bhattacharyya and P. J. Bickel. Subsampling bootstrap of count features of networks. The Annals of Statistics, 43 0 (6): 0 2384--2411, 2015 b
work page 2015
-
[10]
P. J. Bickel and A. Chen. A nonparametric view of network models and newman--girvan and other modularities. Proceedings of the National Academy of Sciences, 106 0 (50): 0 21068--21073, 2009
work page 2009
-
[11]
P. J. Bickel, A. Chen, and E. Levina. The method of moments and degree distributions for network models. The Annals of Statistics, 39 0 (5): 0 2280--2301, 2011
work page 2011
-
[12]
M. Bloznelis and F. G \"o tze. Orthogonal decomposition of finite population statistics and its applications to distributional asymptotics. The Annals of Statistics, 29 0 (3): 0 899--917, 2001
work page 2001
-
[13]
M. Bloznelis and F. G \"o tze. An E dgeworth expansion for symmetric finite population statistics. The Annals of Probability, 30 0 (3): 0 1238--1265, 2002
work page 2002
-
[14]
B. Bollob \'a s and O. Riordan. Metrics for sparse graphs. arXiv preprint arXiv:0708.1919, 2007
work page internal anchor Pith review Pith/arXiv arXiv 1919
- [15]
-
[16]
T. T. Cai and W. Liu. Large-scale multiple testing of correlations. Journal of the American Statistical Association, 111 0 (513): 0 229--240, 2016
work page 2016
- [17]
-
[18]
S. Chan and E. Airoldi. A consistent histogram estimator for exchangeable graph models. In International Conference on Machine Learning, pages 208--216. PMLR, 2014
work page 2014
-
[19]
S. Chatterjee. Matrix estimation by universal singular value thresholding. The Annals of Statistics, 43 0 (1): 0 177--214, 2015
work page 2015
-
[20]
S. Chatterjee, D. Saha, S. Dan, and B. B. Bhattacharya. Two-sample tests for inhomogeneous random graphs in l\_r norm: Optimality and asymptotics. In International Conference on Artificial Intelligence and Statistics, pages 6903--6911. PMLR, 2023
work page 2023
-
[21]
K. Chen and J. Lei. Network cross-validation for determining the number of communities in network data. Journal of the American Statistical Association, 113 0 (521): 0 241--251, 2018
work page 2018
-
[22]
D. Choi and P. J. Wolfe. Co-clustering separately exchangeable network data1. The Annals of Statistics, 42 0 (1): 0 29--63, 2014
work page 2014
-
[23]
S. A. Cook. The complexity of theorem-proving procedures. In Proceedings of the third annual ACM symposium on Theory of computing, pages 151--158, 1971
work page 1971
- [24]
-
[25]
E. K. Fischer, Y. Song, K. A. Hughes, W. Zhou, and K. L. Hoke. Nonparallel transcriptional divergence during parallel adaptation. Molecular Ecology, 30 0 (6): 0 1516--1530, 2021
work page 2021
-
[26]
Testing Network Structure Using Relations Between Small Subgraph Probabilities
C. Gao and J. Lafferty. Testing network structure using relations between small subgraph probabilities. arXiv preprint arXiv:1704.06742, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[27]
C. Gao, Y. Lu, and H. H. Zhou. Rate-optimal graphon estimation. The Annals of Statistics, pages 2624--2652, 2015
work page 2015
-
[28]
D. Ghoshdastidar and U. Von Luxburg. Practical methods for graph two-sample testing. Advances in Neural Information Processing Systems, 31, 2018
work page 2018
-
[29]
D. Ghoshdastidar, M. Gutzeit, A. Carpentier, and U. von Luxburg. Two-sample tests for large random graphs using network statistics. In Conference on Learning Theory, pages 954--977. PMLR, 2017
work page 2017
- [30]
-
[31]
A. Green and C. R. Shalizi. Bootstrapping exchangeable random graphs. Electronic Journal of Statistics, 16 0 (1): 0 1058--1095, 2022
work page 2022
-
[32]
D. N. Hoover. Relations on probability spaces and arrays of random variables. Preprint, Institute for Advanced Study, Princeton, NJ, 2: 0 275, 1979
work page 1979
- [33]
-
[34]
P. Ji, J. Jin, Z. T. Ke, and W. Li. Co-citation and co-authorship networks of statisticians. Journal of Business & Economic Statistics, 40 0 (2): 0 469--485, 2022
work page 2022
-
[35]
B. Karrer and M. E. Newman. Stochastic blockmodels and community structure in networks. Physical Review E, 83 0 (1): 0 016107, 2011
work page 2011
-
[36]
J. M. Klusowski and Y. Wu. Estimating the number of connected components in a graph via subgraph sampling. Bernoulli, 26 0 (3): 0 1635--1664, 2020
work page 2020
-
[37]
Bootstrapping networks with latent space structure
K. Levin and E. Levina. Bootstrapping networks with latent space structure. arXiv preprint arXiv:1907.10821, 2019
- [38]
-
[39]
T. Li, E. Levina, and J. Zhu. Network cross-validation by edge sampling. Biometrika, 107 0 (2): 0 257--276, 2020
work page 2020
-
[40]
T. Li, L. Lei, S. Bhattacharyya, K. Van den Berge, P. Sarkar, P. J. Bickel, and E. Levina. Hierarchical community detection by recursive partitioning. Journal of the American Statistical Association, 117 0 (538): 0 951--968, 2022
work page 2022
-
[41]
Two-sample Test of Community Memberships of Weighted Stochastic Block Models
Y. Li and H. Li. Two-sample test of community memberships of weighted stochastic block models. arXiv preprint arXiv:1811.12593, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[42]
F. Liu, W. Xu, J. Lu, and D. J. Sutherland. Meta two-sample testing: Learning kernels for testing with limited data. Advances in Neural Information Processing Systems, 34: 0 5848--5860, 2021
work page 2021
-
[43]
L. Lov \'a sz and B. Szegedy. Limits of dense graph sequences. Journal of Combinatorial Theory, Series B, 96 0 (6): 0 933--957, 2006
work page 2006
-
[44]
R. Lunde and P. Sarkar. Subsampling sparse graphons under minimal assumptions. Biometrika, 110 0 (1): 0 15--32, 2023
work page 2023
- [45]
- [46]
- [47]
-
[48]
R. Miao and T. Li. Informative core identification in complex networks. Journal of the Royal Statistical Society Series B: Statistical Methodology, 85 0 (1): 0 108--126, 2023
work page 2023
-
[49]
R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: simple building blocks of complex networks. Science, 298 0 (5594): 0 824--827, 2002
work page 2002
-
[50]
M. Newman. Networks. Oxford University Press, 2018
work page 2018
-
[51]
M. E. Newman. The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences, 98 0 (2): 0 404--409, 2001
work page 2001
-
[52]
S. C. Olhede and P. J. Wolfe. Network histograms and universality of blockmodel approximation. Proceedings of the National Academy of Sciences, 111 0 (41): 0 14722--14727, 2014
work page 2014
-
[53]
P. Ribeiro and F. Silva. G-tries: an efficient data structure for discovering network motifs. In Proceedings of the 2010 ACM symposium on applied computing, pages 1559--1566, 2010
work page 2010
- [54]
- [55]
-
[56]
M. Tang, A. Athreya, D. L. Sussman, V. Lyzinski, Y. Park, and C. E. Priebe. A semiparametric two-sample hypothesis testing problem for random graphs. Journal of Computational and Graphical Statistics, 26 0 (2): 0 344--354, 2017 a
work page 2017
-
[57]
M. Tang, A. Athreya, D. L. Sussman, V. Lyzinski, and C. E. Priebe. A nonparametric two-sample hypothesis testing problem for random graphs. Bernoulli, 23 0 (3): 0 1599 -- 1630, 2017 b
work page 2017
-
[58]
S. Wasserman and K. Faust. Social network analysis: Methods and applications. Cambridge university press, 1994
work page 1994
-
[59]
J. Yang, C. Han, and E. Airoldi. Nonparametric estimation and testing of exchangeable graph models. In Artificial Intelligence and Statistics, pages 1060--1067. PMLR, 2014
work page 2014
-
[60]
S. J. Young and E. R. Scheinerman. Random dot product graph models for social networks. In International Workshop on Algorithms and Models for the Web-Graph, pages 138--149. Springer, 2007
work page 2007
-
[61]
M. Yuan and Q. Wen. A practical two-sample test for weighted random graphs. Journal of Applied Statistics, 50 0 (3): 0 495--511, 2023
work page 2023
-
[62]
M. Yuan, R. Liu, Y. Feng, and Z. Shang. Testing community structure for hypergraphs. The Annals of Statistics, 50 0 (1): 0 147--169, 2022
work page 2022
-
[63]
Y. Zhang and D. Xia. Edgeworth expansions for network moments. The Annals of Statistics, 50 0 (2): 0 726--753, 2022
work page 2022
- [64]
-
[65]
L. Zhao and X. Chen. Normal approximation for finite-population u-statistics. Acta mathematicae applicatae Sinica, 6 0 (3): 0 263--272, 1990
work page 1990
-
[66]
Y. Zhao. Graph Theory and Additive Combinatorics: Exploring Structure and Randomness. Cambridge University Press, 2023
work page 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.