Query-Limited Community Recovery in Stochastic Block Models
Pith reviewed 2026-06-28 13:02 UTC · model grok-4.3
The pith
Adaptive querying achieves exact community recovery in stochastic block models with n+o(n) queries where uniform querying requires more.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a two-stage adaptive strategy for querying a noisy neighborhood oracle succeeds with n+o(n) queries for exact recovery in regimes where balanced uniform querying requires mn queries for m>1, and that with an additional subsampled graph, adaptive querying with sublinear budget achieves exact recovery by targeting uncertain vertices while uniform querying does not improve over the subsampled graph.
What carries the argument
The noisy neighborhood oracle, which reveals each true neighbor independently with fixed probability but never returns non-neighbors, and the two-stage adaptive querying strategy that first uses initial queries to identify uncertain vertices and then resolves them.
If this is right
- Exact recovery is possible with asymptotically linear but fewer queries than the uniform benchmark.
- With a subsampled graph, sublinear adaptive queries suffice for exact recovery.
- Non-adaptive balanced uniform querying reduces to an attenuated SBM subject to the Abbe-Bandeira-Hall threshold.
- Adaptivity improves upon the information-theoretic limits set by non-adaptive methods.
Where Pith is reading between the lines
- Similar adaptivity gaps may exist in other network inference problems with query limits.
- Extensions to multi-community models or different oracle noise models could be tested.
- The results suggest designing sequential algorithms for data acquisition in large networks.
Load-bearing premise
The SBM parameters lie in the regime where the Abbe-Bandeira-Hall threshold is the information-theoretic limit and the noisy oracle never returns false positives.
What would settle it
Observing that the adaptive strategy fails to achieve exact recovery when the oracle occasionally returns false positives or when the SBM parameters are outside the Abbe-Bandeira-Hall relevant regime.
read the original abstract
We study exact community recovery in the two-community stochastic block model on $n$ vertices under limited and noisy access to network data. The learner may query a noisy neighborhood oracle that reveals each true neighbor of a queried vertex independently with fixed probability and never returns non-neighbors, subject to a finite query budget. We consider both oracle-only access and a combined model where the learner also observes a single subsampled copy of the underlying graph. For oracle-only access, balanced uniform querying gives a sharp non-adaptive benchmark: when each vertex is queried the same integer number of times, the observations reduce to an SBM with attenuated edge probabilities and the Abbe-Bandeira-Hall exact-recovery threshold applies. We show that this benchmark is not adaptively optimal: a two-stage adaptive strategy succeeds with $n+o(n)$ queries in a regime where balanced uniform querying requires $m n$ queries for some $m>1$. With an additional subsampled graph, we prove a sublinear-query adaptivity gap: balanced data-independent uniform querying with a sublinear budget does not improve over the subsampled graph alone, whereas adaptive querying can target a small set of uncertain vertices and achieve exact recovery. Thus adaptive data acquisition can strictly improve the information-theoretic limits of exact recovery.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to establish information-theoretic results on exact community recovery in the two-community stochastic block model with access limited to a noisy neighborhood oracle (true neighbors revealed independently with probability p, no false positives) and optionally a subsampled graph. For oracle-only access, it shows that balanced uniform querying reduces to an attenuated SBM to which the Abbe-Bandeira-Hall exact recovery threshold applies, serving as a non-adaptive benchmark. It then presents a two-stage adaptive strategy that achieves exact recovery with n+o(n) queries in a regime where uniform querying requires mn queries for m>1. With the subsampled graph, it proves that adaptive querying with sublinear budget can achieve exact recovery by targeting uncertain vertices, while non-adaptive uniform querying with sublinear budget does not improve over the subsampled graph.
Significance. If the derivations hold, the results are significant as they demonstrate strict adaptivity gaps in the query-limited setting for SBM exact recovery. The explicit construction of a two-stage strategy outperforming uniform querying, and the sublinear query gap when combined with subsampled observations, provide concrete evidence that adaptive data acquisition can improve upon information-theoretic limits of non-adaptive methods. The reduction to known ABH thresholds is a positive aspect, grounding the new results in established theory.
major comments (4)
- [Model and Abstract] The analysis assumes the oracle returns only true neighbors with independent probability p and never false positives. This one-sided noise is load-bearing for the reduction to the standard attenuated SBM and the application of the ABH threshold; the manuscript should explicitly state in the model section whether small false-positive rates would preserve the adaptivity gaps or require new analysis.
- [§3 (Uniform Querying Benchmark)] The claim that balanced uniform querying reduces to an SBM with attenuated edge probabilities to which the ABH threshold applies is central. The manuscript needs to derive the exact attenuated parameters (e.g., the effective edge probability) and confirm that the exact-recovery condition is precisely the ABH criterion without post-hoc adjustments.
- [§4 (Adaptive Two-Stage Strategy)] The two-stage adaptive strategy achieving exact recovery with n+o(n) queries depends on the first stage producing a graph whose community structure allows reliable identification of uncertain vertices for the second stage. A detailed bound on the probability that the first-stage graph falls outside the ABH regime or misidentifies vertices is required to ensure the total query count remains n+o(n).
- [§5 (Subsampled Graph Model)] The sublinear-query adaptivity gap relies on adaptive querying targeting a small set of uncertain vertices after observing the subsampled graph. The analysis should specify how the subsampled graph is generated (e.g., sampling probability) and verify that the combined observations allow crossing the ABH threshold only with adaptive targeting.
minor comments (3)
- [Notation] Ensure consistent use of parameters like p for the oracle probability and the SBM edge probabilities a/n, b/n throughout the paper.
- [References] Include a clear citation to the Abbe-Bandeira-Hall paper when first mentioning the threshold.
- [Proofs] The proofs of the reductions and thresholds could benefit from more explicit steps linking to the ABH result to aid readability.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment point by point below.
read point-by-point responses
-
Referee: [Model and Abstract] The analysis assumes the oracle returns only true neighbors with independent probability p and never returns non-neighbors. This one-sided noise is load-bearing for the reduction to the standard attenuated SBM and the application of the ABH threshold; the manuscript should explicitly state in the model section whether small false-positive rates would preserve the adaptivity gaps or require new analysis.
Authors: The model section explicitly defines the oracle as returning true neighbors independently with probability p and no false positives. This one-sided noise enables the direct reduction to an attenuated SBM. We agree that small false-positive rates would introduce two-sided noise, changing the model and likely requiring new analysis for the thresholds. We will add a discussion in the model section clarifying that the results are for one-sided noise and that two-sided noise would necessitate separate analysis, though the adaptivity gaps may hold in a qualitative sense. revision: partial
-
Referee: [§3 (Uniform Querying Benchmark)] The claim that balanced uniform querying reduces to an SBM with attenuated edge probabilities to which the ABH threshold applies is central. The manuscript needs to derive the exact attenuated parameters (e.g., the effective edge probability) and confirm that the exact-recovery condition is precisely the ABH criterion without post-hoc adjustments.
Authors: Section 3 derives the reduction by showing that the effective connection probability becomes 1 - (1-p)^m for m queries per vertex, leading to attenuated parameters a' = a [1 - (1-p)^m] and b' = b [1 - (1-p)^m]. The exact recovery condition is then exactly the ABH threshold applied to (a',b'). We will expand this derivation with the explicit formulas and confirm no post-hoc adjustments are needed. revision: yes
-
Referee: [§4 (Adaptive Two-Stage Strategy)] The two-stage adaptive strategy achieving exact recovery with n+o(n) queries depends on the first stage producing a graph whose community structure allows reliable identification of uncertain vertices for the second stage. A detailed bound on the probability that the first-stage graph falls outside the ABH regime or misidentifies vertices is required to ensure the total query count remains n+o(n).
Authors: The first stage uses a number of queries that places it above the ABH threshold with high probability, allowing identification of a small set of uncertain vertices. We will add a detailed probabilistic bound using concentration inequalities to show that the probability of the first stage failing to produce a reliable coarse recovery is o(1/n), ensuring the total queries remain n + o(n) with high probability. revision: yes
-
Referee: [§5 (Subsampled Graph Model)] The sublinear-query adaptivity gap relies on adaptive querying targeting a small set of uncertain vertices after observing the subsampled graph. The analysis should specify how the subsampled graph is generated (e.g., sampling probability) and verify that the combined observations allow crossing the ABH threshold only with adaptive targeting.
Authors: The subsampled graph is generated by retaining each edge independently with probability q, as stated in the model. We will explicitly restate the generation process and provide the calculation showing that the effective parameters from the subsampled graph plus adaptive queries on o(n) vertices cross the ABH threshold, whereas uniform querying on sublinear budget does not improve the parameters sufficiently. revision: yes
Circularity Check
No circularity; central claims apply external ABH threshold to explicitly constructed attenuated SBM
full rationale
The paper explicitly constructs an attenuated SBM from uniform querying (via independent per-edge revelation with probability p and no false positives) and invokes the known Abbe-Bandeira-Hall exact-recovery threshold as an external benchmark. The adaptive two-stage strategy and sublinear-query gap are then shown relative to this benchmark. No parameters are fitted inside the paper and then renamed as predictions; no self-citations are load-bearing for the thresholds; the oracle model is stated as an assumption rather than derived; and the derivation does not reduce any claimed result to itself by definition or renaming. The analysis is therefore self-contained against an independent external result.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The underlying graph is generated from the two-community stochastic block model with fixed intra- and inter-community probabilities.
- domain assumption The noisy neighborhood oracle returns each true neighbor independently with fixed probability p and never returns non-neighbors.
Reference graph
Works this paper leans on
-
[1]
Community detection and stochastic block models: recent developments.Journal of Machine Learning Research, 18(177):1–86, 2018
Emmanuel Abbe. Community detection and stochastic block models: recent developments.Journal of Machine Learning Research, 18(177):1–86, 2018
2018
-
[2]
Recovering communities in the general stochastic block model without knowing the parameters.Advances in neural information processing systems, 28, 2015
Emmanuel Abbe and Colin Sandon. Recovering communities in the general stochastic block model without knowing the parameters.Advances in neural information processing systems, 28, 2015
2015
-
[3]
Bandeira, and Georgina Hall
Emmanuel Abbe, Afonso S. Bandeira, and Georgina Hall. Exact recovery in the stochastic block model. IEEE Transactions on Information Theory, 62(1):471–487, 2016
2016
-
[4]
Entrywise eigenvector analysis of random matrices with low expected rank.Annals of Statistics, 48(3):1452, 2020
Emmanuel Abbe, Jianqing Fan, Kaizheng Wang, and Yiqiao Zhong. Entrywise eigenvector analysis of random matrices with low expected rank.Annals of Statistics, 48(3):1452, 2020
2020
-
[5]
Information-theoretic thresholds for community detection in sparse networks
Jess Banks, Cristopher Moore, Joe Neeman, and Praneeth Netrapalli. Information-theoretic thresholds for community detection in sparse networks. InConference on Learning Theory, pages 383–416. PMLR, 2016
2016
-
[6]
Active learning with clustering
Zal´ an Bod´ o, Zsolt Minier, and Lehel Csat´ o. Active learning with clustering. InActive Learning and Experimental Design workshop In conjunction with AISTATS 2010, pages 127–139. JMLR Workshop and Conference Proceedings, 2011
2010
-
[7]
Clustering attributed graphs: models, measures and methods.Network Science, 3(3):408–444, 2015
C´ ecile Bothorel, Juan David Cruz, Matteo Magnani, and Barbora Micenkova. Clustering attributed graphs: models, measures and methods.Network Science, 3(3):408–444, 2015
2015
-
[8]
Strong large deviation and local limit theorems
Narasinga Rao Chaganty and Jayaram Sethuraman. Strong large deviation and local limit theorems. The Annals of Probability, pages 1671–1690, 1993
1993
-
[9]
Efficient graph matching for correlated stochastic block models
Shuwen Chai and Mikl´ os Z R´ acz. Efficient graph matching for correlated stochastic block models. Advances in Neural Information Processing Systems, 37:116388–116461, 2024
2024
-
[10]
A computational transition for detecting correlated stochastic block models by low-degree polynomials.The Annals of Statistics, 54(1):226–251, 2026
Guanyi Chen, Jian Ding, Shuyang Gong, and Zhangsong Li. A computational transition for detecting correlated stochastic block models by low-degree polynomials.The Annals of Statistics, 54(1):226–251, 2026
2026
-
[11]
Stochastic block model and community detection in sparse graphs: A spectral algorithm with optimal rate of recovery
Peter Chin, Anup Rao, and Van Vu. Stochastic block model and community detection in sparse graphs: A spectral algorithm with optimal rate of recovery. InConference on Learning Theory, pages 391–423. PMLR, 2015
2015
-
[12]
Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborov´ a. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications.Physical Review E—Statistical, Nonlinear, and Soft Matter Physics, 84(6):066106, 2011
2011
-
[13]
Strong consistency, graph Laplacians, and the stochastic block model.Journal of Machine Learning Research, 22(117):1–44, 2021
Shaofeng Deng, Shuyang Ling, and Thomas Strohmer. Strong consistency, graph Laplacians, and the stochastic block model.Journal of Machine Learning Research, 22(117):1–44, 2021
2021
-
[14]
Low-degree evidence for computational transition of recovery rate in stochastic block model
Jingqiu Ding, Yiding Hua, Lucas Slot, and David Steurer. Low-degree evidence for computational transition of recovery rate in stochastic block model. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. 11
2025
-
[15]
Active learning for community detection in stochastic block models
Akshay Gadde, Eyal En Gad, Salman Avestimehr, and Antonio Ortega. Active learning for community detection in stochastic block models. In2016 IEEE International Symposium on Information Theory (ISIT), pages 1889–1893. IEEE, 2016
2016
-
[16]
Community detection in the hypergraph SBM: Exact recovery given the similarity matrix
Julia Gaudio and Nirmit Joshi. Community detection in the hypergraph SBM: Exact recovery given the similarity matrix. InThe Thirty Sixth Annual Conference on Learning Theory, pages 469–510. PMLR, 2023
2023
-
[17]
Racz, and Anirudh Sridhar
Julia Gaudio, Miklos Z. Racz, and Anirudh Sridhar. Exact community recovery in correlated stochastic block models. InProceedings of Thirty Fifth Conference on Learning Theory, volume 178 ofProceedings of Machine Learning Research, pages 2183–2241, 2022
2022
-
[18]
Modeling social networks from sampled data.The Annals of Applied Statistics, 4(1):5, 2010
Mark S Handcock and Krista J Gile. Modeling social networks from sampled data.The Annals of Applied Statistics, 4(1):5, 2010
2010
-
[19]
Stochastic blockmodels: First steps.Social networks, 5(2):109–137, 1983
Paul W Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. Stochastic blockmodels: First steps.Social networks, 5(2):109–137, 1983
1983
-
[20]
A survey of community detection approaches: From statistical modeling to deep learning.IEEE Trans- actions on Knowledge and Data Engineering, 35(2):1149–1170, 2021
Di Jin, Zhizhi Yu, Pengfei Jiao, Shirui Pan, Dongxiao He, Jia Wu, Philip S Yu, and Weixiong Zhang. A survey of community detection approaches: From statistical modeling to deep learning.IEEE Trans- actions on Knowledge and Data Engineering, 35(2):1149–1170, 2021
2021
-
[21]
Stochastic blockmodels and community structure in networks
Brian Karrer and Mark EJ Newman. Stochastic blockmodels and community structure in networks. Physical Review E—Statistical, Nonlinear, and Soft Matter Physics, 83(1):016107, 2011
2011
-
[22]
Information-theoretic limits for community detection in network models
Chuyang Ke and Jean Honorio. Information-theoretic limits for community detection in network models. Advances in Neural Information Processing Systems, 31, 2018
2018
-
[23]
Active community detection with maximal expected model change
Dan Kushnir and Benjamin Mirabelli. Active community detection with maximal expected model change. InInternational Conference on Artificial Intelligence and Statistics, pages 724–734. PMLR, 2020
2020
-
[24]
Understanding the limita- tions of network online learning.Applied Network Science, 5(1):60, 2020
Timothy LaRock, Timothy Sakharov, Sahely Bhadra, and Tina Eliassi-Rad. Understanding the limita- tions of network online learning.Applied Network Science, 5(1):60, 2020
2020
-
[25]
Consistency thresholds for the planted bisection model
Elchanan Mossel, Joe Neeman, and Allan Sly. Consistency thresholds for the planted bisection model. In Proceedings of the forty-seventh annual ACM symposium on Theory of Computing, pages 69–75, 2015
2015
-
[26]
Network two-sample test for block models.arXiv preprint arXiv:2406.06014, 2024
Chung Kyong Nguen, Oscar Hernan Madrid Padilla, and Arash A Amini. Network two-sample test for block models.arXiv preprint arXiv:2406.06014, 2024
-
[27]
Hierarchical block structures and high-resolution model selection in large networks
Tiago P Peixoto. Hierarchical block structures and high-resolution model selection in large networks. Physical Review X, 4(1):011047, 2014
2014
-
[28]
Correlated stochastic block models: Exact graph matching with ap- plications to recovering communities
Miklos Racz and Anirudh Sridhar. Correlated stochastic block models: Exact graph matching with ap- plications to recovering communities. InAdvances in Neural Information Processing Systems, volume 34, 2021
2021
-
[29]
R´ acz and Jifan Zhang
Mikl´ os Z. R´ acz and Jifan Zhang. Harnessing multiple correlated networks for exact community recovery. InAdvances in Neural Information Processing Systems, volume 37. Curran Associates, Inc., 2024
2024
-
[30]
Network sampling coverage ii: The effect of non-random missing data on network measurement.Social networks, 48:78–99, 2017
Jeffrey A Smith, James Moody, and Jonathan H Morgan. Network sampling coverage ii: The effect of non-random missing data on network measurement.Social networks, 48:78–99, 2017
2017
-
[31]
Robust offline active learning on graphs.Advances in Neural Information Processing Systems, 37:58955–58983, 2024
Yuanchen Wu and Yubai Yuan. Robust offline active learning on graphs.Advances in Neural Information Processing Systems, 37:58955–58983, 2024. 12
2024
-
[32]
Community detection in bipartite networks with stochastic block models.Physical Review E, 102(3):032309, 2020
Tzu-Chi Yen and Daniel B Larremore. Community detection in bipartite networks with stochastic block models.Physical Review E, 102(3):032309, 2020
2020
-
[33]
Community detection via random and adaptive sampling
Se-Young Yun and Alexandre Proutiere. Community detection via random and adaptive sampling. In Conference on Learning Theory, pages 138–175. PMLR, 2014
2014
-
[34]
Fundamental limits of spectral clustering in stochastic block models.IEEE Transactions on Information Theory, 70(10):7320–7348, 2024
Anderson Ye Zhang. Fundamental limits of spectral clustering in stochastic block models.IEEE Transactions on Information Theory, 70(10):7320–7348, 2024. 13 A Omitted proofs from Section 3 For the sharp homogeneous-SBM benchmark, we impose the divisibility conventionk/n∈N 0. Writing m∶=k/n, the uniform policy queries every vertex exactlymtimes. Hence an ex...
2024
-
[35]
Every data-independent uniform querying strategy fails to achieve exact recovery w.h.p
-
[36]
Proof.By Lemma 6, whenk=o(n), uniform non-adaptive querying does not improve the exact-recovery threshold beyond that of the subsampled graphG sub alone
There exists a two-stage adaptive querying strategy that usesG sub to target queries and achieves exact recovery with high probability. Proof.By Lemma 6, whenk=o(n), uniform non-adaptive querying does not improve the exact-recovery threshold beyond that of the subsampled graphG sub alone. Hence, for parameter choices satisfyings( √α−√β)2 <2, any uniform q...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.