Notes on using Determinantal Point Processes for Clustering with Applications to Text Clustering

Anna Choromanska; Apoorv Agarwal; Krzysztof Choromanski

arxiv: 1410.6975 · v1 · pith:ZZSP3O7Qnew · submitted 2014-10-26 · 💻 cs.LG

classification 💻 cs.LG

keywords kmeans · kmeansd · clustering · kmeansrand · algorithm · algorithms · better · determinantal

In this paper, we compare three initialization schemes for the KMEANS clustering algorithm: 1) random initialization (KMEANSRAND), 2) KMEANS++, and 3) KMEANSD++. Both KMEANSRAND and KMEANS++ share a major disadvantage: the value of k must be set by the user. Kang (2013) recently proposed a novel use of determinantal point processes for sampling the initial centroids for the KMEANS algorithm (we call it KMEANSD++). However, that work provides no evaluation establishing that KMEANSD++ is better than other algorithms. In this paper, we show that the performance of KMEANSD++ is comparable to that of KMEANS++ (both of which are better than KMEANSRAND), with KMEANSD++ having the additional advantage that it can automatically approximate the value of k.
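To make the contrast between the three seeding schemes concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names, the RBF similarity kernel, and the `sigma` parameter are all hypothetical choices, and the DPP draw uses the standard spectral (eigendecomposition-based) sampler for an L-ensemble rather than whatever sampling procedure Kang (2013) uses. The point it shows is that the first two schemes take k as an input, while a DPP sample has a random size, so the number of seeds is produced by the draw itself.

```python
import numpy as np

def init_random(X, k, rng):
    """KMEANSRAND-style seeding: k distinct points chosen uniformly at random."""
    return X[rng.choice(len(X), size=k, replace=False)]

def init_kmeanspp(X, k, rng):
    """KMEANS++-style seeding: each new seed is drawn with probability
    proportional to its squared distance to the nearest seed chosen so far.
    Assumes the points in X are not all identical."""
    n = len(X)
    centers = [X[rng.integers(n)]]
    d2 = np.sum((X - centers[0]) ** 2, axis=1)
    for _ in range(k - 1):
        nxt = X[rng.choice(n, p=d2 / d2.sum())]
        centers.append(nxt)
        d2 = np.minimum(d2, np.sum((X - nxt) ** 2, axis=1))
    return np.array(centers)

def rbf_kernel(X, sigma=1.0):
    """Hypothetical similarity kernel (an assumption of this sketch)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def sample_dpp(L, rng):
    """Draw one sample from the L-ensemble DPP with PSD kernel L using the
    spectral sampler. Returns a sorted list of selected row indices; its
    length is random, so no k is supplied by the caller."""
    eigvals, eigvecs = np.linalg.eigh(L)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative values
    # Phase 1: keep eigenvector i with probability lambda_i / (1 + lambda_i).
    keep = rng.random(len(eigvals)) < eigvals / (1.0 + eigvals)
    V = eigvecs[:, keep]
    selected = []
    # Phase 2: pick one item per remaining basis vector.
    while V.shape[1] > 0:
        probs = np.sum(V ** 2, axis=1)
        i = int(rng.choice(len(probs), p=probs / probs.sum()))
        selected.append(i)
        # Remove the component along item i, then re-orthonormalize.
        j = int(np.argmax(np.abs(V[i, :])))
        Vj = V[:, j]
        V = np.delete(V, j, axis=1)
        V = V - np.outer(Vj, V[i, :] / Vj[i])
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)
    return sorted(selected)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 2))
    seeds_rand = init_random(X, 3, rng)      # k supplied by the user
    seeds_pp = init_kmeanspp(X, 3, rng)      # k supplied by the user
    seeds_dpp = X[sample_dpp(rbf_kernel(X), rng)]  # k is len(seeds_dpp)
    print(len(seeds_rand), len(seeds_pp), len(seeds_dpp))
```

In this sketch the expected number of DPP seeds equals the sum of λ/(1+λ) over the eigenvalues λ of the kernel, so the kernel bandwidth indirectly controls how many clusters are proposed.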
