pith. sign in

arxiv: 2209.14859 · v3 · submitted 2022-09-23 · 🧮 math.ST · math.PR· stat.ML· stat.TH

Exact Recovery of Community Detection in dependent Gaussian Mixture Models

Pith reviewed 2026-05-24 10:46 UTC · model grok-4.3

classification 🧮 math.ST math.PRstat.MLstat.TH
keywords community detectionGaussian mixture modelsexact recoverydependent noisemaximum likelihood estimatorcovariance matrixwhitened separation
0
0 comments X

The pith

The maximum likelihood estimator recovers exact community labels in Gaussian mixtures with arbitrary covariance when a whitened separation condition holds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies exact recovery of community assignments in a Gaussian mixture model whose noise covariance Sigma can be non-diagonal or singular. It formulates the MLE as a constrained quadratic program using the Moore-Penrose inverse when Sigma is singular, then gives sufficient conditions for exact recovery that are driven by the Sigma-whitened separation between means and by local one-step comparison inequalities near the truth. When Sigma is invertible the paper also supplies converse statements that exact recovery fails for large families of local perturbations whose Gaussian comparison statistics are nondegenerate. For a concrete full-rank non-diagonal block-covariance model it proves a sharp threshold in the unknown-size case together with a no-gap mechanism under which the sufficient and necessary conditions coincide asymptotically.

Core claim

For general covariance structures we obtain sufficient conditions for exact recovery of the MLE when the community sizes are unknown and when they are known; these conditions are driven by the Sigma-whitened separation L_Sigma(x,y) together with local one-step comparison inequalities in the near-truth regime. Under the additional assumption that Sigma is invertible we derive converse results, and for a full-rank non-diagonal block-covariance model we prove a sharp exact-recovery threshold in the unknown-size setting.

What carries the argument

The Sigma-whitened separation L_Sigma(x,y) together with local one-step comparison inequalities that control the MLE in the near-truth regime.

If this is right

  • Exact recovery succeeds whenever the whitened separation exceeds a threshold set by the local comparison inequalities, for both known and unknown community sizes.
  • In the singular case the MLE reduces to a quadratic optimization problem on the support of the induced measure using the Moore-Penrose inverse.
  • When Sigma is invertible, exact recovery fails for any sufficiently nondegenerate family of local label perturbations.
  • In the full-rank non-diagonal block-covariance model the sufficient and necessary conditions for exact recovery coincide asymptotically.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same whitened-distance plus local-comparison strategy could be applied to other mixture models whose dependence structure is captured by a fixed covariance.
  • The no-gap mechanism indicates that information-theoretic and algorithmic thresholds may align for many dependent Gaussian settings beyond the block model.
  • Numerical checks on synthetic draws with known Sigma could verify whether the one-step inequalities are tight in finite samples.

Load-bearing premise

The observations are generated exactly from the stated Gaussian mixture model with fixed covariance matrix Sigma.

What would settle it

A data set drawn from the model in which L_Sigma lies above the paper's sufficient threshold yet the MLE misclassifies at least one point, or lies below the converse threshold yet the MLE recovers perfectly.

Figures

Figures reproduced from arXiv: 2209.14859 by Sichen Yang, Zhongyang Li.

Figure 7.1
Figure 7.1. Figure 7.1: Recovery probability Pr(ˆy ∈ C(y)) for low-level noise s=(1−δ)(2α−1) √ 32 log n (α = 16 25 , δ = 1 2 ), with the number of vertices in each com￾munity unknown. where kZkop is the operator norm of Z; and kZkF is the Frobenius norm of Z. If (7.15) holds, we have λ0(n, H) ≥ (1 + δ) 2 2 log n − C (log n) 2 then (2.25) holds, and the theorem follows from Theorem 2.17. From Theorems 7.4 and 7.5, we can see tha… view at source ↗
Figure 7.2
Figure 7.2. Figure 7.2: Recovery probability Pr(ˆy ∈ C(y)) for high-level noise s=(1+δ)(2α−1) √ 2 log n (α = 16 25 , δ = 1 2 ), with the number of vertices in each com￾munity unknown. 2 4 6 8 10 12 14 16 18 20 n 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Prob [PITH_FULL_IMAGE:figures/full_fig_p030_7_2.png] view at source ↗
Figure 7.3
Figure 7.3. Figure 7.3: Recovery probability Pr(ˇy ∈ C(y)) for low-level noise s=(1−δ)(2α−1) √ 32 log n (α = 16 25 , δ = 1 2 ), with the number of vertices in each com￾munity known [PITH_FULL_IMAGE:figures/full_fig_p030_7_3.png] view at source ↗
Figure 7.4
Figure 7.4. Figure 7.4: Recovery probability Pr(ˇy ∈ C(y)) for high-level noise s=(1+δ)(2α−1) √ 2 log n (α = 16 25 , δ = 1 2 ), with the number of vertices in each com￾munity known. 8. Future Work This work serves as a starting point for considering a more general regime than most current literature on community detection, where they focus on the case that ρi,j;k,l is zero when j 6= l, so Σ is actually a sparse matrix in such a… view at source ↗
read the original abstract

We study exact recovery for community detection in a Gaussian mixture model with dependent and heterogeneous Gaussian noise. The noise covariance matrix $\Sigma$ may be non-diagonal and, in the general formulation, singular. In the singular case, we write the Gaussian likelihood on the support of the induced measure and show that the maximum likelihood estimator (MLE) is a constrained quadratic optimization problem involving the Moore--Penrose inverse. For general covariance structures, we obtain sufficient conditions for exact recovery of the MLE when the community sizes are unknown and when they are known. These conditions are driven by the $\Sigma$-whitened separation $L_\Sigma(x,y)$ together with local one-step comparison inequalities in the near-truth regime. Under the additional assumption that $\Sigma$ is invertible, we derive converse results showing failure of exact recovery when a large family of local perturbations has sufficiently nondegenerate Gaussian comparison statistics. We then analyze a full-rank non-diagonal block-covariance model, prove a sharp exact-recovery threshold in the unknown-size setting, and identify a general no-gap mechanism under which the sufficient and necessary conditions coincide asymptotically.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper studies exact recovery of the MLE for community detection in a Gaussian mixture model whose noise covariance Sigma may be non-diagonal or singular. It formulates the MLE via the Moore-Penrose pseudoinverse when Sigma is singular, derives sufficient conditions for exact recovery (known and unknown community sizes) driven by the whitened separation L_Sigma(x,y) together with local one-step comparison inequalities, obtains converse results when Sigma is invertible, and proves a sharp threshold plus a no-gap mechanism for a full-rank non-diagonal block-covariance model in the unknown-size case.

Significance. If the derivations hold, the results meaningfully extend exact-recovery theory from independent to dependent noise settings and handle singular covariances in a principled way. The no-gap mechanism that aligns sufficient and necessary conditions asymptotically is a notable technical contribution. The work supplies rigorous, model-specific comparison inequalities rather than relying on generic concentration bounds.

minor comments (2)
  1. [§2.2] §2.2: the definition of the whitened distance L_Sigma(x,y) is introduced via the pseudoinverse; a short remark clarifying its relation to the usual Mahalanobis distance when Sigma is singular would aid readability.
  2. [§4] The statement of the block-covariance model in §4 could include an explicit display of the matrix Sigma to make the non-diagonal structure immediate.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript and for recommending acceptance. The referee's summary accurately captures the scope and contributions of the work on exact recovery in dependent Gaussian mixture models, including the handling of singular covariances and the no-gap mechanism.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper's central claims concern sufficient and necessary conditions for exact recovery of the MLE in a Gaussian mixture model, expressed in terms of the model-defined separation measure L_Sigma(x,y) and local comparison inequalities. These quantities are constructed directly from the assumed data-generating process (the fixed covariance Sigma and community means) rather than being fitted to the recovery event or derived from prior self-citations that themselves reduce to the target result. No equations in the provided abstract reduce a claimed threshold to a tautology by construction, and the derivation program aligns with standard exact-recovery analysis without invoking load-bearing self-citations or ansatzes smuggled from the authors' prior work. The result is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the modeling assumption that the data are generated from a Gaussian mixture whose covariance is exactly the given matrix Sigma, together with the definition of the whitened separation L_Sigma; no free parameters or new entities are introduced in the abstract.

axioms (1)
  • domain assumption Observations follow a Gaussian mixture model with fixed covariance matrix Sigma (possibly singular)
    Explicitly stated as the model under study in the abstract.

pith-pipeline@v0.9.0 · 5722 in / 1229 out tokens · 26977 ms · 2026-05-24T10:46:45.828657+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · 1 internal anchor

  1. [1]

    E. Abbe. Community detection and stochastic block models: Recent developments. Journal of Machine Learning Research, 18:1–86, 2018

  2. [2]

    Abbe and C

    E. Abbe and C. Sandon. Community detection in general stochastic block models:fundamental limits and efficient recovery algorithms. 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, pages 670–688, 2015

  3. [3]

    Bandeira, and Georgina Hall

    Emmanuel Abbe, Afonso S. Bandeira, and Georgina Hall. Exact recovery in the stochastic block model. IEEE Transactions on Information Theory , 62:471–487, 2016

  4. [4]

    Bandeira, Nicolas Boumal, and Vladislav Voroninski

    Afonso S. Bandeira, Nicolas Boumal, and Vladislav Voroninski. On the low-rank approach for semi- definite programs arising in synchronization and community detection, 2016

  5. [5]

    Berthet, P

    Q. Berthet, P. Rigollet, and P. Srivastava. Exact recovery in the ising block model.Annals of Statistics , 47:1805–1834, 2019

  6. [6]

    Bunea, C

    F. Bunea, C. Giraud, X. Luo, M. Royer, and N. Verzelen. Model assisted variable clustering: Minimax- optimalrecovery and algorithms. Annals of Statistics , 48:117–137, 2020

  7. [7]

    PECOK: a convex optimization approach to variable clustering

    F. Bunea, C. Giraud, M. Royer, and N. Verzelen. Pecok: a convex optimization approach to variable clustering. 2016. arXiv:1606.05100

  8. [8]

    Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications

    Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborov´ a. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review E , 84(6), Dec 2011. COMMUNITY DETECTION IN DEPENDENT GAUSSIAN MIXTURE MODEL 35

  9. [9]

    Dempster, N.M

    A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete data via the em algorithm. JOURNAL OF THE ROYAL STATISTICAL SOCIETY, SERIES B , 39:1–38, 1977

  10. [10]

    M. E. Dyer and A. M. Frieze. The solution of some random np-hard problems in polynomial expected time. Journal of Algorithms , 10:451–489, 1989

  11. [11]

    Fraley and A

    C. Fraley and A. Raftery. Model-based clustering, discriminant analysis, and density estimation. Jour- nal of American Statistical Association , 97:611–631, 2002

  12. [12]

    Hartigan

    J.A. Hartigan. Bounding the maximum of dependent random variables. Electronic Journal of Statistics , 8:3126–3140, 2014

  13. [13]

    P. W. Holland, K. B. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5:109–137, 1983

  14. [14]

    M. James. The generalized inverse. The Mathematical Gazette , 62:109–114, 1978

  15. [15]

    Krzakala, C

    F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborova, and P. Zhang. Spectral redemption in clustering sparse networks. Proceedings of the National Academy of Sciences , 110(52):20935–20940, Nov 2013

  16. [16]

    Z. Li. Exact recovery of community detection in k-community gaussian mixture models. 2020. https: //arxiv.org/abs/2009.01185

  17. [17]

    Z. Li. Exact recovery of community detection in k-partite graph models. 2020. arXiv:1910.04320

  18. [18]

    Massouli´ e

    L. Massouli´ e. Community detection thresholds and the weak Ramanujan property. Proceedings of the 46th Annual ACM Symposium on Theory of Computing , pages 694–703, 2014

  19. [19]

    J.B. McQueen. Some methods for classification and analysis of multivariate observations. Proc. Fifth Berkeley Sympos. Math. Statist. and Probability , pages 281–297, 1967

  20. [20]

    Semidefinite programs on sparse random graphs and their application to community detection, 2015

    Andrea Montanari and Subhabrata Sen. Semidefinite programs on sparse random graphs and their application to community detection, 2015

  21. [21]

    A proof of the blockmodel threshold conjecture

    Elchanan Mossel, Joe Neeman, and Allan Sly. A proof of the blockmodel threshold conjecture. Com- binatorica, 38:665–708, 2018

  22. [22]

    Peng and Y

    J. Peng and Y. Wei. Approximating k-means-type clustering via semidefinite program-ming. SIAM J. on Optimization , 18:186–205, 2015

  23. [23]

    D. Pollard. Strong consistency of k-means clustering. Ann. Statist. , pages 135–140, 1981

  24. [24]

    Pringle and A.A

    R.M. Pringle and A.A. Rayner. Generalized Inverse Matrices with Applications to Statistics . Griffin, 1971. Department of Mathematics, University of Connecticut, Storrs, Connecticut 06269-3009, USA Email address : zhongyang.li@uconn.edu URL: https://mathzhongyangli.wordpress.com Department of Applied Mathematics and Statistics, Johns Hopkins University, Bal...