PECOK: a convex optimization approach to variable clustering
read the original abstract
The problem of variable clustering is that of grouping similar components of a $p$-dimensional vector $X=(X_{1},\ldots,X_{p})$, and estimating these groups from $n$ independent copies of $X$. When cluster similarity is defined via $G$-latent models, in which groups of $X$-variables have a common latent generator, and groups are relative to a partition $G$ of the index set $\{1, \ldots, p\}$, the most natural clustering strategy is $K$-means. We explain why this strategy cannot lead to perfect cluster recovery and offer a correction, based on semi-definite programing, that can be viewed as a penalized convex relaxation of $K$-means (PECOK). We introduce a cluster separation measure tailored to $G$-latent models, and derive its minimax lower bound for perfect cluster recovery. The clusters estimated by PECOK are shown to recover $G$ at a near minimax optimal cluster separation rate, a result that holds true even if $K$, the number of clusters, is estimated adaptively from the data. We compare PECOK with appropriate corrections of spectral clustering-type procedures, and show that the former outperforms the latter for perfect cluster recovery of minimally separated clusters.
This paper has not been read by Pith yet.
Forward citations
Cited by 1 Pith paper
-
Exact Recovery of Community Detection in dependent Gaussian Mixture Models
Sufficient conditions and sharp thresholds are given for exact recovery via MLE in dependent Gaussian mixture models for community detection, including singular covariances.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.