Graph inference with clustering and false discovery rate control
Pith reviewed 2026-05-24 16:42 UTC · model grok-4.3
The pith
In a noisy stochastic block model, reliable node clustering enables FDR control for edge inference together with near-optimal true discovery rate as graph size grows.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Provided that the VEM algorithm provides reliable parameter estimates and clustering, the procedure controls the FDR while satisfying an optimal TDR property, up to remainder terms that become small when the size of the graph grows.
What carries the argument
The noisy stochastic block model together with the two-stage procedure that first runs variational EM for clustering and then applies an FDR-controlling threshold to edge p-values derived from the fitted model.
If this is right
- The method controls the false discovery rate at the nominal level while attaining the highest achievable true discovery rate, up to vanishing remainders.
- It outperforms classical FDR-controlling procedures that treat edges as independent and ignore the block structure.
- The FDR and TDR properties remain essentially valid in simulations even when the data-generating process differs from the assumed noisy stochastic block model.
Where Pith is reading between the lines
- The same two-stage logic could be applied to other consistent community-detection algorithms provided they produce accurate cluster labels.
- The approach may improve edge recovery in any domain where nodes naturally fall into groups that govern connection probabilities, such as protein-interaction or social-contact networks.
- Because the remainder terms are shown to vanish only asymptotically, finite-sample behavior on moderately sized graphs would still need separate verification.
Load-bearing premise
The variational expectation-maximization algorithm must deliver reliable estimates of the model parameters and the node clusters.
What would settle it
A sequence of graphs of increasing size on which the variational EM step recovers the clusters accurately yet the realized proportion of false edges among declared edges stays above the nominal FDR level by an amount that does not vanish.
read the original abstract
In this paper, a noisy version of the stochastic block model (NSBM) is introduced and we investigate the three following statistical inferences in this model: estimation of the model parameters, clustering of the nodes and identification of the underlying graph. While the two first inferences are done by using a variational expectation-maximization (VEM) algorithm, the graph inference is done by controlling the false discovery rate (FDR), that is, the average proportion of errors among the edges declared significant, and by maximizing the true discovery rate (TDR), that is, the average proportion of edges declared significant among the true edges. Provided that the VEM algorithm provides reliable parameter estimates and clustering, we theoretically show that our procedure does control the FDR while satisfying an optimal TDR property, up to remainder terms that become small when the size of the graph grows. Numerical experiments show that our method outperforms the classical FDR controlling methods that ignore the underlying SBM topology. In addition, these simulations demonstrate that the FDR/TDR properties of our method are robust to model mis-specification, that is, are essentially maintained outside our model.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the noisy stochastic block model (NSBM) and develops procedures for parameter estimation and node clustering via a variational EM (VEM) algorithm, followed by an FDR-controlling method for inferring the underlying graph edges that also targets high TDR. It states a theoretical result establishing FDR control at a nominal level together with an asymptotically optimal TDR property, conditional on VEM consistency, with remainder terms vanishing as graph size grows; numerical experiments are used to illustrate outperformance relative to topology-ignoring FDR procedures and robustness under misspecification.
Significance. Conditional on the VEM consistency hypothesis, the work supplies a concrete mechanism for leveraging block structure to improve edge detection power while retaining FDR guarantees. The explicit separation of the clustering step from the FDR step, together with the reported robustness checks, constitutes a useful contribution to structured multiple-testing problems on networks.
major comments (2)
- [§4] §4 (FDR/TDR theorem): the claimed control is stated to hold up to remainder terms that vanish with n, yet the manuscript supplies neither an explicit order for these remainders nor the precise VEM consistency rates (e.g., in total variation or parameter error) that would make the vanishing statement verifiable; this is load-bearing for the asymptotic claim.
- [Simulation section] Simulation section (Tables 2–4): reported FDR/TDR values are given as point estimates without standard errors or replication counts, so it is impossible to assess whether the observed superiority over BH and the robustness claims are statistically distinguishable from noise.
minor comments (2)
- [Model section] Notation for the NSBM noise parameter is introduced without an immediate comparison table to the classical SBM; adding one line of contrast would prevent reader confusion.
- [Abstract] The abstract asserts optimality of the TDR “up to remainder terms”; a parenthetical remark on the rate (e.g., o(1)) would improve readability without lengthening the abstract.
Simulated Author's Rebuttal
We thank the referee for the positive evaluation and the detailed comments. We address each major comment below.
read point-by-point responses
-
Referee: [§4] §4 (FDR/TDR theorem): the claimed control is stated to hold up to remainder terms that vanish with n, yet the manuscript supplies neither an explicit order for these remainders nor the precise VEM consistency rates (e.g., in total variation or parameter error) that would make the vanishing statement verifiable; this is load-bearing for the asymptotic claim.
Authors: We agree that the asymptotic claim would be strengthened by greater precision on the remainder terms. The result in Section 4 is conditional on VEM consistency and shows that the remainders vanish as n → ∞ without deriving explicit rates. In the revised manuscript we will add a remark that invokes standard consistency rates available in the VEM-SBM literature (typically o_p(1) in total variation or parameter error under mild separation conditions on the block probabilities), which are already sufficient for the o(1) vanishing of the remainders. This addition will make the statement verifiable without altering the theorem statement itself. revision: yes
-
Referee: [Simulation section] Simulation section (Tables 2–4): reported FDR/TDR values are given as point estimates without standard errors or replication counts, so it is impossible to assess whether the observed superiority over BH and the robustness claims are statistically distinguishable from noise.
Authors: We accept this criticism. The Monte Carlo experiments underlying Tables 2–4 were performed with 100 independent replications, but neither the replication count nor standard errors were reported. In the revision we will state the number of replications explicitly and augment the tables with standard errors for the FDR and TDR entries, allowing readers to gauge the statistical significance of the observed differences. revision: yes
Circularity Check
No significant circularity
full rationale
The paper's FDR/TDR guarantee is explicitly conditional on the external assumption that the VEM algorithm supplies reliable parameter estimates and clustering; this prerequisite is stated outright rather than derived from the same fitted quantities. Under that hypothesis the control and near-optimality (with vanishing remainders) are asserted. No equation reduces a claimed prediction to a fitted input by construction, no load-bearing self-citation chain appears, and the derivation remains independent of the target result once the VEM hypothesis is granted. The result is therefore self-contained against its stated external benchmark.
Axiom & Free-Parameter Ledger
free parameters (1)
- number of blocks
axioms (1)
- domain assumption VEM algorithm yields reliable estimates and clustering when applied to NSBM data
invented entities (1)
-
noisy stochastic block model (NSBM)
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Allman, E. S., Matias, C., and Rhodes, J. A. (2011). Parameter identifiability in a class of random graph mixture models. Journal of Statistical Planning and Inference , 141(5):1719 –
work page 2011
-
[2]
Banerjee, O., El Ghaoui, L., and d’Aspremont, A. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J. Mach. Learn. Res., 9:485–516. Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B , 5...
work page 2008
-
[3]
Biernacki, C., Celeux, G., and Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7):719–725. Brault, V., Keribin, C., and Mariadassou, M. (2017). Consistency and Asymptotic Normality of Latent Blocks Model Estimators. working paper or pr...
work page internal anchor Pith review Pith/arXiv arXiv 2000
-
[4]
Wu, C. F. J. (1983). On the convergence properties of the em algorithm. Ann. Statist., 11(1):95–
work page 1983
-
[5]
Main lemmas for Section 5 Lemma 9.1
Supplementary material 9.1. Main lemmas for Section 5 Lemma 9.1. Let Assumption 5.1 be true and consider anyθ∈ Θ with the corresponding quantities t1,q,ℓ(θ),t 2,q,ℓ(θ), q,ℓ ∈{ 1,...,Q }2, t1(θ) = minq,ℓt1,q,ℓ(θ) and t2(θ) = maxq,ℓt2,q,ℓ(θ). Then the function t↦→Qθ(θ,t ) is increasing on [t1(θ),t 2(θ)], continuous on (t1(θ), 1], satisfies Qθ(θ,t ) = 0 for t...
work page 2014
-
[6]
Now, since Fθ(θ,t′)> 0, this entails f(t′)≥f(t). Also, if f(t′) = f(t), the inequalities above are all equalities and we have Eθ ∑ (i,j)∈A ℓi,j1{t<ℓ i,j≤t′} =t Eθ ∑ (i,j)∈A 1{t<ℓ i,j≤t′} and thus Eθ ∑ (i,j)∈A (ℓi,j−t)1{t<ℓ i,j≤t′} = 0 which gives (ℓi,j−t)1{t<ℓ i,j≤t′} = 0 Pθ-a.s. for all (i,j ), which is impossible by Assumption 5....
work page 2014
-
[7]
(56) Moreover, for t∈Tθ0(K) with|t−Tθ0(α)|≤ ε, we have |F1,θ0(θ0,t )−F1,θ0(θ0,Tθ0(α))|≤W T,q1(ε)
such that for all ε≤ e, letting η(ε) = 8κ−1(Wθ0,q(ε) + 3Q2ε), we have minK≤ α−η(ε) and α +η(ε)≤ maxK and for any θ,θ′∈ Θ with∥θ−θ0∥∞≤ε and∥θ′−θ0∥∞≤ε, we have sup t∈Tθ0(K) |F0,θ′(θ,t )−F0,θ0(θ0,t )|∨| F1,θ′(θ,t )−F1,θ0(θ0,t )|∨| Fθ′(θ,t )−Fθ0(θ0,t )| ≤ 2Wθ0,q(ε) + 6Q2ε; (54) sup t∈Tθ0(K) ⏐⏐Qθ′(θ,t )−Qθ0(θ0,t ) ⏐⏐≤η(ε); (55) Tθ0(minK)≤Tθ0(α−η(ε))≤Tθ(α)≤Tθ0(...
work page 2014
-
[8]
Similarly, the latter bound is also valid for |F1,θ′(θ,t )−F1,θ0(θ0,t )|
⏐⏐⏐⏐ ≤ sup q,ℓ |q1(t,q,ℓ ;θ′,θ )−q1(t,q,ℓ ;θ0,θ 0)| + ∑ q,ℓ |πq(θ′)πℓ(θ′)wq,ℓ(θ′)−πq(θ0)πℓ(θ0)wq,ℓ(θ0)| ≤W θ0,q(ε) + 3Q2ε. Similarly, the latter bound is also valid for |F1,θ′(θ,t )−F1,θ0(θ0,t )|. Since Fθ′(θ,t ) =F0,θ′(θ,t ) + F1,θ′(θ,t ), this proves (54). The proof of (57) follows similarly. Let us now establish (55). First, by (54), and since Fθ0(θ0, ...
work page 2014
-
[9]
The regularity assumption holds with, for all q,ℓ, imsart-generic ver. 2014/07/30 file: RRV2019_arxiv.tex date: July 25, 2019 /Graph inference with FDR control 37 t1,q,ℓ(θ0) = 0 if and only if σq,ℓ≥σ0 and t2,q,ℓ(θ0) = 1 if and only if σq,ℓ≤σ0. An illustration is given in Figure
work page 2014
-
[10]
For this, we should however avoid the non regular behavior of t↦→Qθ0(θ0,t ) occurring at the boundary point{t1,q,ℓ(θ0)}q,ℓ∪{t2,q,ℓ(θ0)}q,ℓ. This is possible by considering a compact set K containingα and such thatK⊂ (α∗,α∗) where α∗ =α∗(θ0) is given by α∗(θ0) =Qθ0(θ0,t′ 2(θ0))∈ (α∗(θ),π0] (59) t′ 2(θ0) = min{t1,q,ℓ(θ0),t 2,q,ℓ(θ0), for q,ℓ s.t. t1,q,ℓ(θ0)...
work page 2014
-
[11]
2014/07/30 file: RRV2019_arxiv.tex date: July 25, 2019 /Graph inference with FDR control 39 9.3.3
imsart-generic ver. 2014/07/30 file: RRV2019_arxiv.tex date: July 25, 2019 /Graph inference with FDR control 39 9.3.3. Study of the q-value functional By Lemma 9.10, and expression (65), we get that, for all q,ℓ∈{ 1,...,Q }, the function (t,θ,θ′)∈ [0, 1]× Θ2↦→qδ(t,q,ℓ ;θ′,θ ) is continuous on [0, 1]× Θ2. By the explicit expressions of the previous section...
work page 2014
-
[12]
is differentiable in t =Tθ0(α) when Tθ0(α) /∈{t1,t 2}. 9.3.4. Studying α∗ Let θ = (π,w,σ 0,µ,σ )∈ Θ. Recall α∗(θ) = lim t→t1(θ)+ {Qθ(θ,t )} = lim t→t1(θ)+ 1 1 + ∑ q,ℓπqπℓwq,ℓq1(t,q,ℓ;θ,θ)∑ q,ℓπqπℓ(1−wq,ℓ)q0(t,q,ℓ;θ,θ) . We prove in this section the following result. Lemma 9.9. For all θ = (π,w,σ 0,µ,σ )∈ Θ, we have α∗(θ) = 0 if and only if there...
work page 2014
-
[13]
Hence, we have for t close enough to 0, min q,ℓ:σq,ℓ≥σ0 { wq,ℓ 1−wq,ℓ q1(t,q,ℓ ;θ,θ ) q0(t,q,ℓ ;θ,θ ) } ≤ ∑ q,ℓπqπℓwq,ℓq1(t,q,ℓ ;θ,θ )∑ q,ℓπqπℓ(1−wq,ℓ)q0(t,q,ℓ ;θ,θ ) ≤ max q,ℓ:σq,ℓ≥σ0 { wq,ℓ 1−wq,ℓ q1(t,q,ℓ ;θ,θ ) q0(t,q,ℓ ;θ,θ ) } . imsart-generic ver. 2014/07/30 file: RRV2019_arxiv.tex date: July 25, 2019 /Graph inference with FDR control 41 We know by...
work page 2014
-
[14]
qδ(t,q,ℓ ;θ,θ ) = Φδ ( b− (b2− 4ac)1/2 + 2|a| ) + 1− Φδ ( b + (b2− 4ac)1/2 + 2|a| ) = 1− Φ ( −b + 2|a|µ(δ) + (b2− 4ac)1/2 + 2|a|σ(δ) ) + 1− Φ ( b− 2|a|µ(δ) + (b2− 4ac)1/2 + 2|a|σ(δ) ) = 1− Φ ( −|(b− 2|a|µ(δ))| + (b2− 4ac)1/2 + 2|a|σ(δ) ) + 1− Φ ( |(b− 2|a|µ(δ))| + (b2− 4ac)1/2 + 2|a|σ(δ) ) Now use for all z >0, for x→∞ , 1− Φ(−z +x) 1− Φ(z +x) ∼ φ(−z +x) ...
work page 2014
-
[15]
Sinceθ∈ Θ↦→b(q,ℓ,θ ) is continuous, forε smaller than some positive numbere3(q,ℓ,θ 0,α ), we have|b(q,ℓ,θ )−b(q,ℓ,θ 0)|≤| b(q,ℓ,θ 0)|/2. Summing up, we obtain (68) forε≤e(θ0,α,Q ) = min(q,ℓ)/∈C{e1(q,ℓ,θ 0,α )∧e2(q,ℓ,θ 0,α )}∧ min(q,ℓ)∈Ce3(q,ℓ,θ 0,α ). We have for all δ∈{ 0, 1}, θ,θ′∈ Θ with ∥θ−θ0∥∞≤ ε and∥θ′−θ0∥∞≤ ε, for all q,ℓ ∈ {1,...,Q }, for all t∈Tθ...
work page 2014
-
[16]
The continuity follows. imsart-generic ver. 2014/07/30 file: RRV2019_arxiv.tex date: July 25, 2019
work page 2014
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.