Community Detection on a Randomly Growing Network
Pith reviewed 2026-06-27 21:00 UTC · model grok-4.3
The pith
It is impossible to recover every node's community label in this growing network model, but algorithms can recover labels for central nodes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In the model consisting of K preferential attachment trees joined by Erdős–Rényi edges, it is impossible for any algorithm to consistently recover the community label of all nodes from the final graph. Algorithms nevertheless exist that recover the labels of subsets of central nodes, using notions of centrality such as arrival time or degree, by first classifying high-degree nodes and then extending the assignments to the remaining vertices.
What carries the argument
A two-stage classification procedure that first labels high-degree nodes and then propagates labels to other vertices.
If this is right
- Central nodes defined by degree or arrival time admit consistent community recovery even though global recovery is impossible.
- The two-stage procedure succeeds on both synthetic networks drawn from the model and on a real coauthorship network.
- The model produces power-law degree distributions and hubs that standard stochastic block models do not generate.
- Recovery of central nodes requires no knowledge of the underlying growth process or node arrival times.
Where Pith is reading between the lines
- Focusing recovery efforts on hubs may suffice for many practical tasks even when the full partition cannot be obtained.
- The impossibility result may apply to other models that combine preferential attachment with community structure.
- Alternative notions of centrality beyond degree and arrival time could be tested within the same two-stage framework.
Load-bearing premise
The observed graph is generated exactly by the described Markovian process of K preferential attachment trees connected by Erdős–Rényi edges, with no additional information about the growth process available.
What would settle it
A sequence of graphs generated from the model on which some algorithm recovers the correct community for every node with probability tending to one as the number of nodes tends to infinity would falsify the impossibility result.
Figures
read the original abstract
We study community detection on Markovian random networks outside of the Stochastic Block Model (SBM) framework. Specifically, we consider a random network growth process which generates $K$ separate preferential attachment trees and connects them with Erd\H{o}s--R\'enyi edges, so that each tree represents a community and each node inherits the label of the tree to which it belongs. This model is able to produce many features of real world networks that are improbable under SBM, such as power law degree distribution and the existence of chains and hubs. Given only the final graph, without any knowledge of the growth process, we seek to recover the unobserved community membership of the nodes. We first prove that it is impossible for any algorithm to consistently recover the community label of all the nodes. However, we design algorithms which are provably able to recover the community labels of subsets of central nodes, for several different notions of node centrality such as arrival time or degree. Our procedure consists of two stages where, in the first stage, we classify high degree nodes and then, in the second stage, extend the community assignments to the remaining vertices. Numerical experiments and a real data application on a coauthorship network demonstrate the effectiveness of our proposed approach.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies community detection on a network formed by K preferential attachment trees joined by Erdős–Rényi edges, with community labels inherited from the originating trees. It proves that consistent recovery of all node labels is impossible from the final graph alone, due to late-arriving low-degree nodes having indistinguishable statistics. It then gives two-stage algorithms that provably recover labels for central subsets (by degree or arrival time), first classifying high-degree nodes and then extending assignments, with supporting numerical experiments and a coauthorship network application.
Significance. If the impossibility and recovery results hold, the work meaningfully extends community detection beyond the SBM to a model that naturally produces power-law degrees, hubs, and chains. The precise scoping of the impossibility result (full recovery impossible, central subsets recoverable) and the constructive two-stage procedure that exploits the generative structure are strengths. The paper supplies theoretical guarantees together with empirical validation on both synthetic and real data.
major comments (2)
- [§4] §4 (impossibility theorem): the argument that late nodes are information-theoretically indistinguishable relies on the Markovian growth process; the manuscript should explicitly state whether the proof assumes known or unknown attachment parameters and how the total-variation distance between label-conditional distributions is bounded.
- [Theorem 3] Theorem 3 (recovery of high-degree nodes): the success probability bound appears to depend on the minimum degree threshold au; the manuscript should clarify whether au is chosen adaptively from the observed degree sequence or requires knowledge of the preferential-attachment exponent.
minor comments (3)
- [Abstract] Abstract: the phrase 'Erd h{o}s--Rényi edges' contains a typographical error in the LaTeX rendering of the umlaut.
- [Section 5] Section 5 (experiments): the description of the two-stage procedure would benefit from a pseudocode listing that distinguishes the high-degree classification step from the extension step.
- [Section 6] The real-data application on the coauthorship network reports community sizes but does not include a quantitative comparison against a baseline such as spectral clustering on the same graph.
Simulated Author's Rebuttal
We thank the referee for the careful reading, positive assessment, and constructive suggestions. We address each major comment below and will incorporate the requested clarifications into the revised manuscript.
read point-by-point responses
-
Referee: [§4] §4 (impossibility theorem): the argument that late nodes are information-theoretically indistinguishable relies on the Markovian growth process; the manuscript should explicitly state whether the proof assumes known or unknown attachment parameters and how the total-variation distance between label-conditional distributions is bounded.
Authors: We thank the referee for highlighting this point. The impossibility argument in Section 4 is derived under the assumption of unknown attachment parameters (consistent with the problem statement that only the final graph is observed). We will add an explicit statement to this effect at the start of Section 4. The total-variation bound is obtained by coupling two label-conditional growth processes and exploiting the fact that, after a sufficiently late arrival time, the conditional degree distributions become arbitrarily close in total variation; we will include a short paragraph outlining this coupling argument in the revision. revision: yes
-
Referee: [Theorem 3] Theorem 3 (recovery of high-degree nodes): the success probability bound appears to depend on the minimum degree threshold τ; the manuscript should clarify whether τ is chosen adaptively from the observed degree sequence or requires knowledge of the preferential-attachment exponent.
Authors: We agree that the dependence of the success probability on τ requires clarification. In the theoretical statement of Theorem 3, τ is expressed in terms of model parameters that include the preferential-attachment exponent. However, the two-stage algorithm itself selects τ in a data-driven manner from the observed degree sequence (e.g., via a high quantile of the empirical degree distribution). We will revise the manuscript to distinguish the theoretical threshold from the adaptive implementation and to note that knowledge of the exponent is not required for the practical choice of τ. revision: yes
Circularity Check
No significant circularity; derivation is self-contained from the generative model
full rationale
The paper defines an explicit generative process (K preferential attachment trees joined by ER edges, with labels inherited from originating trees) and derives both the impossibility result (late nodes inherit statistically indistinguishable features under the Markovian growth) and the positive recovery algorithms (two-stage high-degree classification followed by extension) directly from that process and the final observed graph. No fitted parameters are relabeled as predictions, no self-citations are load-bearing for the central claims, and no ansatz or uniqueness theorem is imported from prior author work. The theorems are scoped exactly to the stated model with no additional information, making the derivation independent of its own outputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The network is generated by K separate preferential attachment trees connected with Erdős–Rényi edges, with each node inheriting the label of its tree.
Reference graph
Works this paper leans on
-
[1]
The Annals of Statistics , volume=
LIKELIHOOD-BASED MODEL SELECTION FOR STOCHASTIC BLOCK MODELS1 , author=. The Annals of Statistics , volume=
-
[2]
Journal of the American Statistical Association , volume=
Optimal estimation of the number of network communities , author=. Journal of the American Statistical Association , volume=. 2023 , publisher=
2023
-
[3]
Journal of machine learning research , volume=
Determining the number of communities in degree-corrected stochastic block models , author=. Journal of machine learning research , volume=
-
[4]
Electronic Journal of Statistics , volume=
Estimating the number of communities by spectral methods , author=. Electronic Journal of Statistics , volume=. 2022 , publisher=
2022
-
[5]
Proceedings of the American Mathematical society , volume=
On the shortest spanning subtree of a graph and the traveling salesman problem , author=. Proceedings of the American Mathematical society , volume=. 1956 , publisher=
1956
-
[6]
Contat et al
Eve, Adam and the preferential attachment tree: A. Contat et al. , author=. Probability Theory and Related Fields , volume=. 2024 , publisher=
2024
-
[7]
IEEE Transactions on Control of Network Systems , volume=
Botnet detection based on anomaly and community detection , author=. IEEE Transactions on Control of Network Systems , volume=. 2016 , publisher=
2016
-
[8]
stat , volume=
Deep graph infomax , author=. stat , volume=
-
[9]
Transactions of the International Society for Music Information Retrieval , volume=
Unveiling the hierarchical structure of music by multi-resolution community detection , author=. Transactions of the International Society for Music Information Retrieval , volume=. 2020 , publisher=
2020
-
[10]
Bioinformatics , volume=
Graph-based consensus clustering for class discovery from gene expression data , author=. Bioinformatics , volume=. 2007 , publisher=
2007
-
[11]
Scientometrics , volume=
Italian sociologists: A community of disconnected groups , author=. Scientometrics , volume=. 2020 , publisher=
2020
-
[12]
Scientific reports , volume=
From Louvain to Leiden: guaranteeing well-connected communities , author=. Scientific reports , volume=. 2019 , publisher=
2019
-
[13]
Proceedings of the National Academy of Sciences , volume=
Spectral redemption in clustering sparse networks , author=. Proceedings of the National Academy of Sciences , volume=. 2013 , publisher=
2013
-
[14]
IEEE Transactions on information theory , volume=
Rumors in a network: Who's the culprit? , author=. IEEE Transactions on information theory , volume=. 2011 , publisher=
2011
-
[15]
2019 , publisher=
Probability: theory and examples , author=. 2019 , publisher=
2019
-
[16]
IEEE Transactions on Emerging Topics in Computational Intelligence , volume=
Wavefront-based multiple rumor sources identification by multi-task learning , author=. IEEE Transactions on Emerging Topics in Computational Intelligence , volume=. 2022 , publisher=
2022
-
[17]
Journal of biomedical informatics , volume=
Propagation source identification of infectious diseases with graph convolutional networks , author=. Journal of biomedical informatics , volume=. 2021 , publisher=
2021
-
[18]
The Annals of Statistics , volume=
Network representation using graph root distributions , author=. The Annals of Statistics , volume=. 2021 , publisher=
2021
-
[19]
Journal of Business & Economic Statistics , volume=
Co-citation and co-authorship networks of statisticians , author=. Journal of Business & Economic Statistics , volume=. 2022 , publisher=
2022
-
[20]
2024 , publisher=
Random graphs and complex networks , author=. 2024 , publisher=
2024
-
[21]
Electronic Journal of Probability , volume=
Degree centrality and root finding in growing random networks , author=. Electronic Journal of Probability , volume=. 2023 , publisher=
2023
-
[22]
The Annals of Applied Probability , volume=
Root finding algorithms and persistence of Jordan centrality in growing random trees , author=. The Annals of Applied Probability , volume=. 2022 , publisher=
2022
-
[23]
2010 , publisher=
Random graph dynamics , author=. 2010 , publisher=
2010
-
[24]
science , volume=
Emergence of scaling in random networks , author=. science , volume=. 1999 , publisher=
1999
-
[25]
Journal of the American Statistical Association , volume=
Hierarchical community detection by recursive partitioning , author=. Journal of the American Statistical Association , volume=. 2022 , publisher=
2022
-
[26]
ACM Transactions on Knowledge Discovery from Data , volume=
Distributed Pseudo-Likelihood Method for Community Detection in Large-Scale Networks , author=. ACM Transactions on Knowledge Discovery from Data , volume=. 2024 , publisher=
2024
-
[27]
Nature protocols , volume=
Integration of biological networks and gene expression data using Cytoscape , author=. Nature protocols , volume=. 2007 , publisher=
2007
-
[28]
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining , pages=
Reconstructing graph diffusion history from a single snapshot , author=. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining , pages=
-
[29]
Bernoulli , volume=
Inference in balanced community modulated recursive trees , author=. Bernoulli , volume=. 2025 , publisher=
2025
-
[30]
IEEE Transactions on Information Theory , volume=
Community recovery in a preferential attachment graph , author=. IEEE Transactions on Information Theory , volume=. 2019 , publisher=
2019
-
[31]
arXiv preprint arXiv:2303.05909 , year=
A pseudo-likelihood approach to community detection in weighted networks , author=. arXiv preprint arXiv:2303.05909 , year=
-
[32]
Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=
Statistical clustering of temporal networks through a dynamic stochastic block model , author=. Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=. 2017 , publisher=
2017
-
[33]
Journal of Complex Networks , volume=
Learning latent block structure in weighted networks , author=. Journal of Complex Networks , volume=. 2015 , publisher=
2015
-
[34]
Bayesian Analysis , volume=
Hierarchical stochastic block model for community detection in multiplex networks , author=. Bayesian Analysis , volume=. 2024 , publisher=
2024
-
[35]
Journal of the American Statistical Association , volume=
Community detection in general hypergraph via graph embedding , author=. Journal of the American Statistical Association , volume=. 2023 , publisher=
2023
-
[36]
Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining , pages=
node2vec: Scalable feature learning for networks , author=. Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining , pages=
-
[37]
Bioinformatics , volume=
Prediction of potential disease-associated microRNAs using structural perturbation method , author=. Bioinformatics , volume=. 2018 , publisher=
2018
-
[38]
Journal of the american statistical association , volume=
Goodness of fit of social network models , author=. Journal of the american statistical association , volume=. 2008 , publisher=
2008
-
[39]
PLoS One , volume=
Social network cohesion in school classes promotes prosocial behavior , author=. PLoS One , volume=. 2018 , publisher=
2018
-
[40]
Physical Review E , volume=
Community detection, link prediction, and layer interdependence in multilayer networks , author=. Physical Review E , volume=. 2017 , publisher=
2017
-
[41]
IEEE Transactions on Knowledge and Data Engineering , volume=
A survey of community detection approaches: From statistical modeling to deep learning , author=. IEEE Transactions on Knowledge and Data Engineering , volume=. 2021 , publisher=
2021
-
[42]
arXiv preprint arXiv:2511.08867 , year=
Conformal Prediction for Multi-Source Detection on a Network , author=. arXiv preprint arXiv:2511.08867 , year=
-
[43]
arXiv preprint arXiv:2510.05102 , year=
TopInG: Topologically Interpretable Graph Learning via Persistent Rationale Filtration , author=. arXiv preprint arXiv:2510.05102 , year=
-
[44]
Social networks , volume=
Stochastic blockmodels: First steps , author=. Social networks , volume=. 1983 , publisher=
1983
-
[45]
arXiv preprint arXiv:2508.04730 , year=
Statistical inference for core-periphery structures , author=. arXiv preprint arXiv:2508.04730 , year=
-
[46]
Physical Review E , volume=
Identification of core-periphery structure in networks , author=. Physical Review E , volume=. 2015 , publisher=
2015
-
[47]
Electronic Journal of Statistics , volume=
Sparse networks with core-periphery structure , author=. Electronic Journal of Statistics , volume=
-
[48]
Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=
Informative core identification in complex networks , author=. Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=. 2023 , publisher=
2023
-
[49]
Electronic Journal of Probability , volume=
Geometry of weighted recursive and affine preferential attachment trees , author=. Electronic Journal of Probability , volume=. 2021 , publisher=
2021
-
[50]
Random Structures & Algorithms , volume=
Random trees and general branching processes , author=. Random Structures & Algorithms , volume=. 2007 , publisher=
2007
-
[51]
Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=
Inference on the history of a randomly growing tree , author=. Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=. 2021 , publisher=
2021
-
[52]
Advances in Applied Probability , volume=
Joint degree distributions of preferential attachment random graphs , author=. Advances in Applied Probability , volume=. 2017 , publisher=
2017
-
[53]
Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=
Root and community inference on the latent growth process of a network , author=. Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=. 2024 , publisher=
2024
-
[54]
The Annals of Statistics , pages=
FAST COMMUNITY DETECTION BY SCORE , author=. The Annals of Statistics , pages=. 2015 , publisher=
2015
-
[55]
The Annals of Applied Statistics , pages=
COAUTHORSHIP AND CITATION NETWORKS FOR STATISTICIANS , author=. The Annals of Applied Statistics , pages=. 2016 , publisher=
2016
-
[56]
The Annals of Statistics , volume=
Optimal rates for community estimation in the weighted stochastic block model , author=. The Annals of Statistics , volume=. 2020 , publisher=
2020
-
[57]
Journal of Machine Learning Research , volume=
Community detection and stochastic block models: recent developments , author=. Journal of Machine Learning Research , volume=
-
[58]
International Conference on Machine Learning , pages=
Diffusion source identification on networks with statistical confidence , author=. International Conference on Machine Learning , pages=. 2021 , organization=
2021
-
[59]
Random Structures & Algorithms , volume=
Leaf stripping on uniform attachment trees , author=. Random Structures & Algorithms , volume=. 2025 , publisher=
2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.