arxiv: 1410.6975 · v1 · pith:ZZSP3O7Qnew · submitted 2014-10-26 · 💻 cs.LG

Notes on using Determinantal Point Processes for Clustering with Applications to Text Clustering

classification 💻 cs.LG
keywords kmeans, kmeansd, clustering, kmeansrand, algorithm, algorithms, better, determinantal

In this paper, we compare three initialization schemes for the KMEANS clustering algorithm: 1) random initialization (KMEANSRAND), 2) KMEANS++, and 3) KMEANSD++. Both KMEANSRAND and KMEANS++ have a major disadvantage: the value of k must be set by the user of the algorithms. (Kang 2013) recently proposed a novel use of determinantal point processes for sampling the initial centroids for the KMEANS algorithm (we call it KMEANSD++). They do not, however, provide any evaluation establishing that KMEANSD++ is better than other algorithms. In this paper, we show that the performance of KMEANSD++ is comparable to that of KMEANS++ (both of which are better than KMEANSRAND), with KMEANSD++ having the additional advantage that it can automatically approximate the value of k.
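To make the first two initialization schemes concrete, here is a minimal Python sketch of uniform random seeding and of the standard k-means++ seeding rule (each new centroid is drawn with probability proportional to its squared distance from the nearest centroid chosen so far). The function names, the toy data, and the fixed seed are illustrative assumptions and do not come from the paper. The determinantal point process sampling behind KMEANSD++ is not sketched. Note that both functions take k as an argument, which is the drawback the abstract describes.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_random(points, k, rng):
    """KMEANSRAND-style seeding: pick k distinct points uniformly at random."""
    return rng.sample(points, k)


def init_kmeanspp(points, k, rng):
    """k-means++-style seeding: the first centroid is uniform; each later one
    is drawn with probability proportional to its squared distance from the
    nearest centroid chosen so far. Assumes at least k distinct points."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


# Toy data (made up for illustration): three well-separated groups in 2-D.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (10.0, 0.0), (9.9, 0.1)]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
seeds_rand = init_random(points, 3, rng)
seeds_pp = init_kmeanspp(points, 3, rng)
```

Because already-chosen points have distance zero to themselves, k-means++ seeding never picks the same point twice and tends to spread the seeds across distant regions of the data.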
