PECOK: a convex optimization approach to variable clustering

Christophe Giraud; Florentina Bunea; Martin Royer; Nicolas Verzelen

arXiv: 1606.05100 · v1 · submitted 2016-06-16 · math.ST · stat.TH

keywords: cluster, PECOK, clustering, clusters, groups, latent, perfect, recovery

The problem of variable clustering is that of grouping similar components of a $p$-dimensional vector $X=(X_{1},\ldots,X_{p})$, and estimating these groups from $n$ independent copies of $X$. When cluster similarity is defined via $G$-latent models, in which groups of $X$-variables have a common latent generator, and groups are relative to a partition $G$ of the index set $\{1, \ldots, p\}$, the most natural clustering strategy is $K$-means. We explain why this strategy cannot lead to perfect cluster recovery and offer a correction, based on semi-definite programming, that can be viewed as a penalized convex relaxation of $K$-means (PECOK). We introduce a cluster separation measure tailored to $G$-latent models, and derive its minimax lower bound for perfect cluster recovery. The clusters estimated by PECOK are shown to recover $G$ at a near minimax optimal cluster separation rate, a result that holds true even if $K$, the number of clusters, is estimated adaptively from the data. We compare PECOK with appropriate corrections of spectral clustering-type procedures, and show that the former outperforms the latter for perfect cluster recovery of minimally separated clusters.
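The failure of $K$-means mentioned in the abstract can be seen on a small toy example. The sketch below is an illustration of our own, not code from the paper, and it rests on several assumptions: it uses the population covariance instead of data; it takes the $G$-latent model to be $X = AZ + E$, with $A$ the $p \times K$ group-membership matrix, latent covariance $C$ and independent noise with diagonal covariance $\Gamma$, so that $\Sigma = ACA^T + \Gamma$; and it replaces the semi-definite program by exhaustive search over all partitions into $K$ groups. Writing $K$-means for variables as maximizing $\langle \Sigma, B\rangle$, where $B_{ab} = 1/|G_k|$ when $a$ and $b$ both lie in group $G_k$ and $0$ otherwise, one variable with a large noise variance is enough to pull the maximizer away from the true partition. Subtracting $\Gamma$, which as we understand it mirrors the penalty PECOK builds from an estimate of $\Gamma$, restores the true partition. All helper names (`partitions_into_k`, `kmeans_score`, `latent_covariance`, `best_partition`) and the numerical values are hypothetical.

```python
from itertools import product


def partitions_into_k(p, k):
    """Yield each partition of the variables 0..p-1 into exactly k non-empty groups.

    Label vectors are kept only in canonical form (first occurrences read 0, 1, 2, ...),
    so every partition is produced exactly once.
    """
    for labels in product(range(k), repeat=p):
        seen = []
        canonical = True
        for lab in labels:
            if lab not in seen:
                if lab != len(seen):
                    canonical = False
                    break
                seen.append(lab)
        if canonical and len(seen) == k:
            yield tuple(tuple(a for a in range(p) if labels[a] == g) for g in range(k))


def kmeans_score(S, groups):
    """Return <S, B>, where B[a][b] = 1/|G| if a and b share group G, else 0."""
    return sum(sum(S[a][b] for a in g for b in g) / len(g) for g in groups)


def latent_covariance(membership, C, gamma):
    """Population covariance A C A^T + Gamma of the assumed G-latent model X = A Z + E."""
    p = len(membership)
    return [[C[membership[a]][membership[b]] + (gamma[a] if a == b else 0.0)
             for b in range(p)] for a in range(p)]


def best_partition(S, k):
    """Exhaustive search: solves exactly the combinatorial problem that an SDP would relax."""
    return max(partitions_into_k(len(S), k), key=lambda g: kmeans_score(S, g))


# Hypothetical toy setting: true groups {0, 1} and {2, 3}, independent unit-variance
# latent variables, and variable 0 carrying a much larger noise variance than the rest.
membership = [0, 0, 1, 1]
C = [[1.0, 0.0], [0.0, 1.0]]
gamma = [10.0, 0.1, 0.1, 0.1]

Sigma = latent_covariance(membership, C, gamma)
# Remove the diagonal noise term exactly; a data-driven method would use an estimate of Gamma.
Sigma_corrected = [[Sigma[a][b] - (gamma[a] if a == b else 0.0) for b in range(4)]
                   for a in range(4)]

print(best_partition(Sigma, 2))            # ((0,), (1, 2, 3)): the noisy variable is isolated
print(best_partition(Sigma_corrected, 2))  # ((0, 1), (2, 3)): true partition recovered
```

In this toy case the true partition scores 9.15 under $\Sigma$ but the split isolating variable 0 scores about 12.77, because the diagonal term $\langle \Gamma, B\rangle = \sum_k |G_k|^{-1} \sum_{a \in G_k} \Gamma_{aa}$ rewards putting high-noise variables in small groups. After subtracting $\Gamma$ the true partition scores 4 against about 2.67 for that split.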

Forward citations

Cited by 1 Pith paper

Exact Recovery of Community Detection in dependent Gaussian Mixture Models
math.ST · 2022-09

Sufficient conditions and sharp thresholds are given for exact recovery via MLE in dependent Gaussian mixture models for community detection, including singular covariances.