
arxiv: 2606.21466 · v1 · pith:QJGOK7CWnew · submitted 2026-06-19 · 📊 stat.ME

Likelihood Inference for Latent Network Models under Snowball Sampling

Pith reviewed 2026-06-26 13:29 UTC · model grok-4.3

classification 📊 stat.ME
keywords snowball sampling · latent space models · network likelihood · stochastic EM · continuous latent space · co-inventor networks · sampling bias

The pith

The exact likelihood for continuous latent space models under snowball sampling reduces to closed form via conditional edge independence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives the exact likelihood of a multi-wave snowball sample for continuous latent space (CLS) models, in which edges form independently given latent vertex-level quantities. This independence reduces the marginalization over all unobserved network configurations to a closed-form expression that holds for the entire class of such models. The authors implement the result through a stochastic Expectation-Maximization algorithm for the Euclidean latent distance model. Applied to multiple snowball samples drawn from a large German semiconductor co-inventor network, the comparison shows that the naive analysis, which ignores the sampling mechanism, severely underestimates latent space variance, generates networks with nearly twice the observed edge count, and attains a spectral goodness-of-fit nine times worse than the corrected model.

Core claim

The authors establish that conditional edge independence in continuous latent space (CLS) models reduces the marginal likelihood of a multi-wave snowball sample to a closed-form expression that is portable across the CLS class. They demonstrate this with a stochastic EM algorithm for the Euclidean distance model: on patent co-inventor data, naive inference severely underestimates latent variance and degrades goodness-of-fit.

What carries the argument

The closed-form marginal likelihood for multi-wave snowball samples in CLS models, obtained by factoring the unobserved network configurations under conditional edge independence given latent vertex quantities.
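
The factorization can be sketched in a few lines. This is an illustrative reading, not the paper's own notation or its final expression: the symbols $p_{ij}$, $\mathcal{O}$ (dyads observed through the sample) and $\mathcal{U}$ (dyads left unobserved) are assumptions introduced here, the latent quantities $z$ are held fixed, and it is assumed that every completion of the unobserved dyads is consistent with the observed waves. The treatment of the latent quantities of unsampled vertices is left out.

```latex
\begin{align*}
P(y \mid z, \theta)
  &= \prod_{i<j} p_{ij}^{\,y_{ij}} (1 - p_{ij})^{1 - y_{ij}},
  \qquad p_{ij} = P(y_{ij} = 1 \mid z_i, z_j, \theta), \\
\sum_{y_{\mathcal{U}}} P(y_{\mathcal{O}}, y_{\mathcal{U}} \mid z, \theta)
  &= \prod_{(i,j) \in \mathcal{O}} p_{ij}^{\,y_{ij}} (1 - p_{ij})^{1 - y_{ij}}
     \prod_{(i,j) \in \mathcal{U}} \sum_{y_{ij} \in \{0,1\}} p_{ij}^{\,y_{ij}} (1 - p_{ij})^{1 - y_{ij}} \\
  &= \prod_{(i,j) \in \mathcal{O}} p_{ij}^{\,y_{ij}} (1 - p_{ij})^{1 - y_{ij}} .
\end{align*}
```

Each unobserved dyad contributes a factor of $p_{ij} + (1 - p_{ij}) = 1$, so the sum over $2^{|\mathcal{U}|}$ configurations collapses to a product over observed dyads only.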

If this is right

  • Parameter estimates for latent space variance and covariate effects are no longer systematically distorted by the sampling mechanism.
  • The method extends without further derivation to any model inside the continuous latent space class.
  • Spectral goodness-of-fit on real networks improves substantially once the sampling mechanism is incorporated.
  • Multiple independent snowball samples can be drawn from the same large population and analyzed jointly.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same marginalization trick may apply to other sampling designs that preserve conditional edge independence.
  • Quantitative interpretations of how covariates shape network structure become reliable only after this correction.
  • The framework suggests checking whether real networks approximately satisfy the conditional independence assumption before applying the closed form.

Load-bearing premise

The observed data come from multi-wave snowball sampling and the network edges form independently conditional on latent vertex-level quantities.

What would settle it

A simulation that generates networks from a CLS model, draws snowball samples, computes both the closed-form likelihood and a brute-force sum over all unobserved edge configurations consistent with the sample, then checks whether the two agree up to floating-point error.
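
A minimal sketch of such a check, under stated assumptions, might look as follows. It conditions on fixed latent positions (so it tests only the edge-marginalization step, not the handling of latents), uses a logistic Euclidean distance link with a hypothetical intercept `alpha`, and assumes that vertices reached in the final wave are named but not interviewed. All function names are illustrative and not taken from the paper.

```python
import itertools
import math
import random


def edge_prob(zi, zj, alpha):
    """Assumed link: logistic in (alpha - Euclidean distance)."""
    return 1.0 / (1.0 + math.exp(-(alpha - math.dist(zi, zj))))


def snowball(edges, n, seeds, n_waves):
    """Interview seeds, then each new wave; return (interviewed set, observed dyads).

    `edges` is a set of (i, j) tuples with i < j.
    """
    interviewed, frontier = set(), set(seeds)
    for _ in range(n_waves):
        interviewed |= frontier
        named = {j for i in frontier for j in range(n)
                 if j != i and (min(i, j), max(i, j)) in edges}
        frontier = named - interviewed
    # Every dyad touching an interviewed vertex is observed (edge or non-edge).
    observed = {(i, j): (i, j) in edges
                for i, j in itertools.combinations(range(n), 2)
                if i in interviewed or j in interviewed}
    return frozenset(interviewed), observed


def compare(n=6, alpha=1.0, seeds=(0,), n_waves=2, rng_seed=1):
    """Return (closed-form product, brute-force sum) for one simulated sample."""
    rng = random.Random(rng_seed)
    z = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    dyads = list(itertools.combinations(range(n), 2))
    probs = {d: edge_prob(z[d[0]], z[d[1]], alpha) for d in dyads}

    # Draw one "true" network and record what the snowball sample reveals.
    true_edges = {d for d in dyads if rng.random() < probs[d]}
    target = snowball(true_edges, n, seeds, n_waves)

    # Closed form: product over observed dyads only.
    closed = 1.0
    for d, present in target[1].items():
        closed *= probs[d] if present else 1.0 - probs[d]

    # Brute force: sum P(graph) over every graph that yields the same sample.
    brute = 0.0
    for mask in itertools.product((False, True), repeat=len(dyads)):
        edges = {d for d, on in zip(dyads, mask) if on}
        if snowball(edges, n, seeds, n_waves) == target:
            weight = 1.0
            for d in dyads:
                weight *= probs[d] if d in edges else 1.0 - probs[d]
            brute += weight
    return closed, brute


if __name__ == "__main__":
    closed, brute = compare()
    print(closed, brute, math.isclose(closed, brute, rel_tol=1e-9))
```

With six vertices the brute-force loop visits all 2^15 graphs, which is still fast; the enumeration grows exponentially, so this only serves as a small-scale sanity check.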

Figures

Figures reproduced from arXiv: 2606.21466 by Göran Kauermann, Nurzhan Sapargali, Sergio Buttazzo.

Figure 1. Graphical illustration of a 2-wave snowball sample.
Figure 2. Boxplots of the naive and snowball-corrected estimators of …
Figure 3. Simulation results for the distance model under 2-wave snowball sampling. Each panel corresponds to one of the …
Figure 4. Pairs of estimates $(\hat{\alpha}^{(b)}, \hat{\psi}^{(b)})$ obtained by applying the snowball-corrected (left) and naive (right) models to each of the 500 snowball samples, with $\hat{\psi}$ on a log scale. Top row: estimates colour-coded by total number of vertices sampled; bottom row: colour-coded by number of vertices sampled per wave. The green square marks the KDE mode used as the aggregate estimate.
Figure 5. Goodness-of-fit diagnostics for the degree distribution (top row) and edgewise shared partner (ESP) distribution …
Figure 6. Ribbon plots of the ranks of eigenvalues of the simulated Laplacians under the snowball-corrected and naive models …
Figure 7. Latent position estimates and implied covariate effects under the snowball-corrected and naive models. Mean edge …
Figure 8. Empirical cumulative distribution functions of the wave sizes and edge density under direct 3-wave snowball sampling …
Figure 9. A 2-wave snowball sample originating from vertex 0 (…
Figure 10. Degree distribution of the patent co-inventorship …
Figure 11. Co-inventorship of patent inventors in Germany over 1997-2012 in the area of semiconductors.
Original abstract

Snowball sampling is a widely used design for collecting network data from large or hard-to-reach populations, yet naive inference that ignores the sampling mechanism produces systematically biased parameter estimates. We derive the exact likelihood of a multi-wave snowball sample for the class of continuous latent space (CLS) models, in which edges form independently conditional on latent vertex-level quantities, and show that conditional edge independence reduces the marginalization over unobserved network configurations to a closed-form expression portable across the entire CLS class. We develop a stochastic Expectation-Maximization algorithm for the Euclidean latent distance model as a concrete implementation, and apply the framework to the large-scale co-inventor network of German semiconductor patent applicants by drawing multiple snowball samples. We find that the naive procedure severely underestimates latent space variance, produces networks with nearly twice the observed edge count, and achieves a spectral goodness-of-fit nine times worse than the corrected model, which directly affects the quantitative interpretation of covariate effects.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper derives the exact likelihood for multi-wave snowball samples drawn from continuous latent space (CLS) network models. It shows that conditional edge independence (given vertex latents) reduces the marginalization over unobserved edge configurations consistent with the sampling design to a closed-form product expression that holds across the CLS class. A stochastic EM algorithm is developed for the Euclidean latent distance model as a concrete case, and the method is applied to multiple snowball samples from the German semiconductor co-inventor network, where the naive (sampling-ignoring) estimator is shown to underestimate latent space variance, inflate edge counts, and produce substantially worse spectral goodness-of-fit.

Significance. If the derivation is exact, the result supplies a portable likelihood for an important sampling design in a broad model class, directly correcting a known source of bias in network data. The empirical demonstration quantifies the practical consequences for parameter interpretation and model assessment. Credit is due for the factorization argument that avoids post-hoc adjustments and for the reproducible application to real patent data.

minor comments (2)
  1. [Abstract] The abstract states that the marginalization reduces to a closed-form expression, but the provided text supplies no explicit steps or verification; the full manuscript should include the product-form derivation (referencing the conditional independence assumption) with a short proof sketch or reference to the relevant model equations.
  2. [Application] In the application section, the reported factor-of-nine improvement in spectral goodness-of-fit and the doubling of edge count under the naive procedure would be strengthened by stating the precise definition of the spectral metric and the number of Monte Carlo replications used for the stochastic EM.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, recognition of the factorization argument, and recommendation of minor revision. No major comments were raised in the report.

Circularity Check

0 steps flagged

Derivation is self-contained from stated model assumptions

full rationale

The paper states that edges form independently conditional on latent vertex quantities and derives the exact likelihood under multi-wave snowball sampling by showing that this independence reduces the sum over unobserved edge configurations to a closed-form product. This follows directly from the joint probability factoring as a product of individual edge probabilities, with the sampling design only constraining which configurations are consistent with observed recruitment waves; no parameters are fitted to a subset and then renamed as a prediction, no self-citation supplies a load-bearing uniqueness result, and no ansatz is smuggled in. The stochastic EM is presented only as a concrete implementation for one member of the CLS class, leaving the general marginalization claim independent of any fitted values or prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The contribution rests on the domain assumption of conditional edge independence within the CLS class; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: Edges form independently conditional on latent vertex-level quantities.
    Defining property of the continuous latent space (CLS) model class stated in the abstract.


