Big-Data Clustering: K-Means or K-Indicators?

Feiyu Chen; Liwei Xu; Taiping Zhang; Yin Zhang; Yuchen Yang

arXiv: 1906.00938 · v1 · pith:UONHQXDTnew · submitted 2019-06-03 · 💻 cs.LG · cs.CV · math.OC · stat.ML

keywords: algorithm, k-means, clustering, number, data, initializations, k-indicators, large

The K-means algorithm is arguably the most popular data clustering method, commonly applied to processed datasets in some "feature spaces", as in spectral clustering. However, K-means is highly sensitive to initialization and encounters a scalability bottleneck with respect to the number of clusters K as this number grows in big-data applications. In this work, we promote a closely related model called the K-indicators model and construct an efficient, semi-convex-relaxation algorithm that requires no randomized initializations. We present extensive empirical results to show the advantages of the new algorithm when K is large. In particular, using the new algorithm to start the K-means algorithm, without any replication, can significantly outperform standard K-means run with a large number of current state-of-the-art random replications.
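
To make the initialization issue concrete, the following is a minimal sketch of Lloyd-style K-means with random replications, compared against a single run from a supplied deterministic start. It does not implement the K-indicators algorithm, which the abstract does not specify. The function names, the synthetic 3x3 grid of blobs, and the use of the known blob centers as a stand-in for a good deterministic initializer are all assumptions made for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, init_centers, max_iter=100):
    """Run Lloyd's iterations from the given centers.

    Returns (centers, labels, objective), where the objective is the
    sum of squared distances from each point to its assigned center.
    """
    centers = [tuple(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    objective = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, objective


def kmeans_with_replications(points, k, n_reps, seed=0):
    """Keep the best of n_reps runs, each started from k random data points."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_reps):
        result = lloyd_kmeans(points, rng.sample(points, k))
        if best is None or result[2] < best[2]:
            best = result
    return best


# Synthetic demo data (an assumption for illustration): a 3x3 grid of
# well-separated Gaussian blobs, so K = 9.
data_rng = random.Random(42)
true_centers = [(10.0 * i, 10.0 * j) for i in range(3) for j in range(3)]
points = [
    (cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
    for cx, cy in true_centers
    for _ in range(20)
]
k = len(true_centers)

_, _, obj_one = kmeans_with_replications(points, k, n_reps=1)
_, _, obj_ten = kmeans_with_replications(points, k, n_reps=10)
# Stand-in for a deterministic initializer: reuse the known blob centers.
# A real method would have to compute such a start from the data alone.
_, labels_det, obj_det = lloyd_kmeans(points, true_centers)

print(f"best of 1 random start:    {obj_one:.2f}")
print(f"best of 10 random starts:  {obj_ten:.2f}")
print(f"single deterministic start: {obj_det:.2f}")
```

Each replication is a full K-means run, so the cost of compensating for poor starts grows with the number of replications, and a random start is more likely to miss at least one cluster as K grows. That is the motivation for a single, well-chosen starting point.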
