pith. sign in

arxiv: 2409.01599 · v2 · submitted 2024-09-03 · 📊 stat.ME · math.ST· stat.TH

Multivariate Inference of Network Moments by Subsampling

Pith reviewed 2026-05-23 21:02 UTC · model grok-4.3

classification 📊 stat.ME math.STstat.TH
keywords network momentsnode subsamplinggraphon modeltwo-sample testingmultivariate inferencemotif countssparse networkssample splitting
0
0 comments X

The pith

Node subsampling approximates the joint distribution of multiple network moments under sparse graphon models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that node subsampling yields an asymptotically accurate approximation to the joint distribution of several network moments, such as rescaled counts of stars and triangles. This extends prior univariate results by characterizing the dependence structure among the moments under a general sparse graphon model. The result supports a new subsampling procedure for two-sample testing between networks that cannot be matched and may have unequal edge densities. Such tests matter because network moments serve as basic summaries for model checking and comparison, and marginal tests alone miss joint structure visible only in the multivariate case. The approach is illustrated on simulated data and on coexpression networks from a guppy adaptation study.

Core claim

Node subsampling provides an asymptotically accurate approximation of the joint distribution of multiple network moments under a general sparse graphon model. The proof rests on a careful characterization of the dependence among the moments and the associated multivariate convergence, which goes beyond existing univariate arguments. This foundation is used to construct the first subsampling-based test valid for two-sample inference on unmatchable networks with unequal edge densities, obtained by combining sparsification with sample splitting.

What carries the argument

Node subsampling applied to the joint distribution of network moments, with sparsification and sample splitting for the two-sample procedure.

If this is right

  • Two-sample tests for unmatchable networks become valid even when edge densities differ.
  • Joint moment distributions can detect structural differences invisible to separate marginal tests.
  • Model selection and goodness-of-fit procedures gain access to multivariate subsampling quantiles.
  • Conditional moment distributions become usable for targeted comparisons within the same network.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same subsampling logic might extend to higher-order network summaries beyond the moments treated here.
  • If the graphon assumption holds only approximately, the procedure could still serve as a practical diagnostic for network similarity.
  • The sample-splitting step suggests a general template for other network inference tasks that require density adjustment.

Load-bearing premise

The networks are generated from a general sparse graphon model that permits characterization of the dependence structure among the moments.

What would settle it

A direct comparison, on networks known to arise from a non-graphon process, between the empirical joint distribution of the moments and the distribution obtained from node subsamples; systematic divergence would falsify the approximation claim.

Figures

Figures reproduced from arXiv: 2409.01599 by Chen-Wei Hua, Mingyu Qi, Tianxi Li, Wen Zhou.

Figure 1
Figure 1. Figure 1: Empirical approximation errors of the cumulative distribution functions under the sparsity level [PITH_FULL_IMAGE:figures/full_fig_p007_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Empirical approximation errors of the cumulative distribution functions under the sparsity level [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: The blue points in panel (a) and histogram in panel (b) display the subsampling distributions of network moments from the coexpression [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Comparison of network moments between collaboration networks in statistics and high-energy physics: the blue points and bars represent [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Joint distributions of network moments for [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Empirical approximation errors of the cumulative distribution functions under [PITH_FULL_IMAGE:figures/full_fig_p067_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Empirical approximation errors of the cumulative distribution functions under [PITH_FULL_IMAGE:figures/full_fig_p067_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Empirical approximation errors of the cumulative distribution functions under [PITH_FULL_IMAGE:figures/full_fig_p067_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Comparison of network moments between two genetic networks: the blue points and bars illustrate the subsampling distributions from [PITH_FULL_IMAGE:figures/full_fig_p068_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Comparison of network moments between statistics network and high energy physics network: the blue points and bars illustrate the [PITH_FULL_IMAGE:figures/full_fig_p068_10.png] view at source ↗
read the original abstract

Network moments--rescaled counts of motifs such as stars and triangles--are fundamental summaries of network structure, widely used in goodness-of-fit testing, model selection, and network comparison. While the univariate distribution of a single network moment can be approximated by subsampling, the consistency of subsampling for their {\it joint} distribution has remained unestablished. In this paper, we prove that node subsampling provides an asymptotically accurate approximation of the joint distribution of multiple network moments under a general sparse graphon model. The theoretical analysis requires a careful characterization of the dependence structure among network moments and the corresponding multivariate asymptotic convergence, going substantially beyond existing univariate results. Building on this foundation, we address a practically important open problem: two-sample testing between unmatchable networks with unequal edge densities. We propose a novel subsampling-based procedure that combines {\it sparsification} with a {\it sample-splitting} strategy. This yields the first subsampling-based inferential procedure valid for this setting, to our knowledge. We demonstrate the utility of multivariate subsampling inference through simulation studies and a real data application comparing coexpression networks of core and non-core genes in a study of parallel adaptation in Trinidadian guppies, where joint and conditional moment distributions reveal a structural difference that no marginal test can detect.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript proves that node subsampling yields an asymptotically accurate approximation to the joint distribution of multiple network moments (e.g., rescaled motif counts) under a general sparse graphon model, extending prior univariate results via a characterization of the dependence structure among moments. It further develops a subsampling-based two-sample test for unmatchable networks with unequal edge densities that combines sparsification and sample splitting, and illustrates the procedure on simulations and a real-data comparison of coexpression networks.

Significance. If the central proof holds, the result is significant because it supplies the first rigorous multivariate extension of network-moment subsampling, enabling joint and conditional inference that can detect structural differences invisible to marginal tests. The two-sample procedure directly addresses an open practical problem. The manuscript ships an explicit mathematical proof of joint convergence together with reproducible simulation studies and a real-data application; these are concrete strengths.

major comments (1)
  1. [theoretical analysis (abstract and main proof)] The central claim rests on a 'careful characterization of the dependence structure among network moments and the corresponding multivariate asymptotic convergence' (abstract). Because the provided description does not include the explicit dependence bounds, the form of the limiting covariance, or the rate at which the subsampling error vanishes in the multivariate case, it is not possible to verify that the joint approximation is asymptotically accurate under the stated sparse-graphon conditions.
minor comments (1)
  1. The phrase 'to our knowledge' for the two-sample procedure should be accompanied by a brief comparison to existing network two-sample methods that do not rely on subsampling.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive comment and positive assessment of the work's significance. We address the concern on the theoretical analysis below.

read point-by-point responses
  1. Referee: [theoretical analysis (abstract and main proof)] The central claim rests on a 'careful characterization of the dependence structure among network moments and the corresponding multivariate asymptotic convergence' (abstract). Because the provided description does not include the explicit dependence bounds, the form of the limiting covariance, or the rate at which the subsampling error vanishes in the multivariate case, it is not possible to verify that the joint approximation is asymptotically accurate under the stated sparse-graphon conditions.

    Authors: We agree that the abstract and introductory description do not explicitly state the dependence bounds, limiting covariance form, or convergence rate, which limits immediate verification. The full proof in Section 3 provides the multivariate extension: the dependence structure among moments is characterized through the graphon-induced covariances for overlapping subgraphs (detailed in the proof of Theorem 3.1), the limiting covariance matrix appears explicitly as the integral expression in Equation (3.4), and the subsampling approximation error vanishes at rate o(1) under the sparse-graphon assumptions (Proposition 3.5). To make these elements verifiable from the high-level description, we will revise the abstract to reference the limiting covariance and add a short paragraph in the introduction summarizing the key bounds and rates. This is a presentation improvement only. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper's central contribution is a theoretical proof establishing asymptotic validity of node subsampling for the joint distribution of multiple network moments under a general sparse graphon model. This rests on a characterization of dependence among moments and multivariate convergence, explicitly extending prior univariate results. No equations or claims reduce by construction to fitted parameters, self-definitions, or load-bearing self-citations; the result is presented as an independent asymptotic analysis rather than a renaming or renormalization of inputs. The two-sample testing procedure is derived from this foundation without circular reduction. The derivation is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption of a general sparse graphon model that allows characterization of moment dependence; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption Networks follow a general sparse graphon model permitting multivariate asymptotic convergence of network moments
    Invoked as the setting for the proof of joint distribution approximation (abstract).

pith-pipeline@v0.9.0 · 5760 in / 1279 out tokens · 30959 ms · 2026-05-23T21:02:20.639745+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages · 3 internal anchors

  1. [1]

    Agterberg, M

    J. Agterberg, M. Tang, and C. Priebe. Nonparametric two-sample hypothesis testing for random graphs with negative and repeated eigenvalues. arXiv preprint arXiv:2012.09828, 2020

  2. [2]

    D. J. Aldous. Representations for partially exchangeable arrays of random variables. Journal of Multivariate Analysis, 11 0 (4): 0 581--598, 1981

  3. [3]

    A. A. Alyakin, J. Agterberg, H. S. Helm, and C. E. Priebe. Correcting a nonparametric two-sample graph hypothesis test for graphs with different numbers of vertices with applications to connectomics. Applied Network Science, 9 0 (1): 0 1, 2024

  4. [4]

    A. A. Amini, A. Chen, P. J. Bickel, and E. Levina. Pseudo-likelihood methods for community detection in large sparse networks. The Annals of Statistics, 41 0 (4): 0 2097--2122, 2013

  5. [5]

    Amini, F

    O. Amini, F. V. Fomin, and S. Saurabh. Counting subgraphs via homomorphisms. SIAM Journal on Discrete Mathematics, 26 0 (2): 0 695--717, 2012

  6. [6]

    Barab \'a si

    A.-L. Barab \'a si. Network science. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 371 0 (1987): 0 20120375, 2013

  7. [7]

    B. B. Bhattacharya, A. Chatterjee, and S. Janson. Fluctuations of subgraph counts in graphon based random graphs. Combinatorics, Probability and Computing, pages 1--37, 2022

  8. [8]

    S ubsampling bootstrap of count features of networks

    S. Bhattacharyya and P. Bickel. Supplement to “ S ubsampling bootstrap of count features of networks”. The Annals of Statistics, 2015 a

  9. [9]

    Bhattacharyya and P

    S. Bhattacharyya and P. J. Bickel. Subsampling bootstrap of count features of networks. The Annals of Statistics, 43 0 (6): 0 2384--2411, 2015 b

  10. [10]

    P. J. Bickel and A. Chen. A nonparametric view of network models and newman--girvan and other modularities. Proceedings of the National Academy of Sciences, 106 0 (50): 0 21068--21073, 2009

  11. [11]

    P. J. Bickel, A. Chen, and E. Levina. The method of moments and degree distributions for network models. The Annals of Statistics, 39 0 (5): 0 2280--2301, 2011

  12. [12]

    Bloznelis and F

    M. Bloznelis and F. G \"o tze. Orthogonal decomposition of finite population statistics and its applications to distributional asymptotics. The Annals of Statistics, 29 0 (3): 0 899--917, 2001

  13. [13]

    Bloznelis and F

    M. Bloznelis and F. G \"o tze. An E dgeworth expansion for symmetric finite population statistics. The Annals of Probability, 30 0 (3): 0 1238--1265, 2002

  14. [14]

    Metrics for sparse graphs

    B. Bollob \'a s and O. Riordan. Metrics for sparse graphs. arXiv preprint arXiv:0708.1919, 2007

  15. [15]

    Borgs, J

    C. Borgs, J. Chayes, and L. Lov \'a sz. Moments of two-variable functions and the uniqueness of graph limits. Geometric and functional analysis, 19: 0 1597--1619, 2010

  16. [16]

    T. T. Cai and W. Liu. Large-scale multiple testing of correlations. Journal of the American Statistical Association, 111 0 (513): 0 229--240, 2016

  17. [17]

    Casella and R

    G. Casella and R. Berger. Statistical I nference . CRC Press, 2024

  18. [18]

    Chan and E

    S. Chan and E. Airoldi. A consistent histogram estimator for exchangeable graph models. In International Conference on Machine Learning, pages 208--216. PMLR, 2014

  19. [19]

    Chatterjee

    S. Chatterjee. Matrix estimation by universal singular value thresholding. The Annals of Statistics, 43 0 (1): 0 177--214, 2015

  20. [20]

    Chatterjee, D

    S. Chatterjee, D. Saha, S. Dan, and B. B. Bhattacharya. Two-sample tests for inhomogeneous random graphs in l\_r norm: Optimality and asymptotics. In International Conference on Artificial Intelligence and Statistics, pages 6903--6911. PMLR, 2023

  21. [21]

    Chen and J

    K. Chen and J. Lei. Network cross-validation for determining the number of communities in network data. Journal of the American Statistical Association, 113 0 (521): 0 241--251, 2018

  22. [22]

    Choi and P

    D. Choi and P. J. Wolfe. Co-clustering separately exchangeable network data1. The Annals of Statistics, 42 0 (1): 0 29--63, 2014

  23. [23]

    S. A. Cook. The complexity of theorem-proving procedures. In Proceedings of the third annual ACM symposium on Theory of computing, pages 151--158, 1971

  24. [24]

    Du and M

    X. Du and M. Tang. Hypothesis testing for equality of latent positions in random graphs. Bernoulli, 29 0 (4): 0 3221--3254, 2023

  25. [25]

    E. K. Fischer, Y. Song, K. A. Hughes, W. Zhou, and K. L. Hoke. Nonparallel transcriptional divergence during parallel adaptation. Molecular Ecology, 30 0 (6): 0 1516--1530, 2021

  26. [26]

    Testing Network Structure Using Relations Between Small Subgraph Probabilities

    C. Gao and J. Lafferty. Testing network structure using relations between small subgraph probabilities. arXiv preprint arXiv:1704.06742, 2017

  27. [27]

    C. Gao, Y. Lu, and H. H. Zhou. Rate-optimal graphon estimation. The Annals of Statistics, pages 2624--2652, 2015

  28. [28]

    Ghoshdastidar and U

    D. Ghoshdastidar and U. Von Luxburg. Practical methods for graph two-sample testing. Advances in Neural Information Processing Systems, 31, 2018

  29. [29]

    Ghoshdastidar, M

    D. Ghoshdastidar, M. Gutzeit, A. Carpentier, and U. von Luxburg. Two-sample tests for large random graphs using network statistics. In Conference on Learning Theory, pages 954--977. PMLR, 2017

  30. [30]

    Gonen, D

    M. Gonen, D. Ron, and Y. Shavitt. Counting stars and other small subgraphs in sublinear-time. SIAM Journal on Discrete Mathematics, 25 0 (3): 0 1365--1411, 2011

  31. [31]

    Green and C

    A. Green and C. R. Shalizi. Bootstrapping exchangeable random graphs. Electronic Journal of Statistics, 16 0 (1): 0 1058--1095, 2022

  32. [32]

    D. N. Hoover. Relations on probability spaces and arrays of random variables. Preprint, Institute for Advanced Study, Princeton, NJ, 2: 0 275, 1979

  33. [33]

    Ji and J

    P. Ji and J. Jin. Coauthorship and citation networks for statisticians. The Annals of Applied Statistics, pages 1779--1812, 2016

  34. [34]

    P. Ji, J. Jin, Z. T. Ke, and W. Li. Co-citation and co-authorship networks of statisticians. Journal of Business & Economic Statistics, 40 0 (2): 0 469--485, 2022

  35. [35]

    Karrer and M

    B. Karrer and M. E. Newman. Stochastic blockmodels and community structure in networks. Physical Review E, 83 0 (1): 0 016107, 2011

  36. [36]

    J. M. Klusowski and Y. Wu. Estimating the number of connected components in a graph via subgraph sampling. Bernoulli, 26 0 (3): 0 1635--1664, 2020

  37. [37]

    Bootstrapping networks with latent space structure

    K. Levin and E. Levina. Bootstrapping networks with latent space structure. arXiv preprint arXiv:1907.10821, 2019

  38. [38]

    Li and C

    T. Li and C. M. Le. Network estimation by mixing: Adaptivity and more. Journal of the American Statistical Association, pages 1--16, 2023

  39. [39]

    T. Li, E. Levina, and J. Zhu. Network cross-validation by edge sampling. Biometrika, 107 0 (2): 0 257--276, 2020

  40. [40]

    T. Li, L. Lei, S. Bhattacharyya, K. Van den Berge, P. Sarkar, P. J. Bickel, and E. Levina. Hierarchical community detection by recursive partitioning. Journal of the American Statistical Association, 117 0 (538): 0 951--968, 2022

  41. [41]

    Two-sample Test of Community Memberships of Weighted Stochastic Block Models

    Y. Li and H. Li. Two-sample test of community memberships of weighted stochastic block models. arXiv preprint arXiv:1811.12593, 2018

  42. [42]

    F. Liu, W. Xu, J. Lu, and D. J. Sutherland. Meta two-sample testing: Learning kernels for testing with limited data. Advances in Neural Information Processing Systems, 34: 0 5848--5860, 2021

  43. [43]

    Lov \'a sz and B

    L. Lov \'a sz and B. Szegedy. Limits of dense graph sequences. Journal of Combinatorial Theory, Series B, 96 0 (6): 0 933--957, 2006

  44. [44]

    Lunde and P

    R. Lunde and P. Sarkar. Subsampling sparse graphons under minimal assumptions. Biometrika, 110 0 (1): 0 15--32, 2023

  45. [45]

    Lunde, E

    R. Lunde, E. Levina, and J. Zhu. Conformal prediction for network-assisted regression. arXiv preprint arXiv:2302.10095, 2023

  46. [46]

    C. Mao, Y. Wu, J. Xu, and S. H. Yu. Testing network correlation efficiently via counting trees. arXiv preprint arXiv:2110.11816, 2021

  47. [47]

    Maugis, S

    P.-A. Maugis, S. Olhede, C. Priebe, and P. Wolfe. Testing for equivalence of network distribution using subgraph counts. Journal of Computational and Graphical Statistics, 29 0 (3): 0 455--465, 2020

  48. [48]

    Miao and T

    R. Miao and T. Li. Informative core identification in complex networks. Journal of the Royal Statistical Society Series B: Statistical Methodology, 85 0 (1): 0 108--126, 2023

  49. [49]

    R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: simple building blocks of complex networks. Science, 298 0 (5594): 0 824--827, 2002

  50. [50]

    M. Newman. Networks. Oxford University Press, 2018

  51. [51]

    M. E. Newman. The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences, 98 0 (2): 0 404--409, 2001

  52. [52]

    S. C. Olhede and P. J. Wolfe. Network histograms and universality of blockmodel approximation. Proceedings of the National Academy of Sciences, 111 0 (41): 0 14722--14727, 2014

  53. [53]

    Ribeiro and F

    P. Ribeiro and F. Silva. G-tries: an efficient data structure for discovering network motifs. In Proceedings of the 2010 ACM symposium on applied computing, pages 1559--1566, 2010

  54. [54]

    Rodriguez

    L. Rodriguez. Automorphism groups of simple graphs, 2014

  55. [55]

    M. Shao, D. Xia, Y. Zhang, Q. Wu, and S. Chen. Higher-order accurate two-sample network inference and network hashing. arXiv preprint arXiv:2208.07573, 2022

  56. [56]

    M. Tang, A. Athreya, D. L. Sussman, V. Lyzinski, Y. Park, and C. E. Priebe. A semiparametric two-sample hypothesis testing problem for random graphs. Journal of Computational and Graphical Statistics, 26 0 (2): 0 344--354, 2017 a

  57. [57]

    M. Tang, A. Athreya, D. L. Sussman, V. Lyzinski, and C. E. Priebe. A nonparametric two-sample hypothesis testing problem for random graphs. Bernoulli, 23 0 (3): 0 1599 -- 1630, 2017 b

  58. [58]

    Wasserman and K

    S. Wasserman and K. Faust. Social network analysis: Methods and applications. Cambridge university press, 1994

  59. [59]

    J. Yang, C. Han, and E. Airoldi. Nonparametric estimation and testing of exchangeable graph models. In Artificial Intelligence and Statistics, pages 1060--1067. PMLR, 2014

  60. [60]

    S. J. Young and E. R. Scheinerman. Random dot product graph models for social networks. In International Workshop on Algorithms and Models for the Web-Graph, pages 138--149. Springer, 2007

  61. [61]

    Yuan and Q

    M. Yuan and Q. Wen. A practical two-sample test for weighted random graphs. Journal of Applied Statistics, 50 0 (3): 0 495--511, 2023

  62. [62]

    M. Yuan, R. Liu, Y. Feng, and Z. Shang. Testing community structure for hypergraphs. The Annals of Statistics, 50 0 (1): 0 147--169, 2022

  63. [63]

    Zhang and D

    Y. Zhang and D. Xia. Edgeworth expansions for network moments. The Annals of Statistics, 50 0 (2): 0 726--753, 2022

  64. [64]

    Zhang, E

    Y. Zhang, E. Levina, and J. Zhu. Estimating network edge probabilities by neighbourhood smoothing. Biometrika, 104 0 (4): 0 771--783, 2017

  65. [65]

    Zhao and X

    L. Zhao and X. Chen. Normal approximation for finite-population u-statistics. Acta mathematicae applicatae Sinica, 6 0 (3): 0 263--272, 1990

  66. [66]

    Y. Zhao. Graph Theory and Additive Combinatorics: Exploring Structure and Randomness. Cambridge University Press, 2023