A penalized criterion for selecting the number of clusters for K-medians

Antoine Godichon-Baggioni (LPSM (UMR\_8001)); Sobihan Surendran (LPSM (UMR\_8001))

arxiv: 2209.03597 · v3 · pith:FGGLXRS2new · submitted 2022-09-08 · 🧮 math.ST · stat.TH

keywords clustering · clusters · k-medians · number · algorithms · criterion · data · here

Clustering is a common unsupervised machine learning technique for grouping data points based on similar features. We focus here on unsupervised clustering for contaminated data, i.e., the case where K-medians should be preferred to K-means because of its robustness. More precisely, we concentrate on a common question in clustering: how to choose the number of clusters? The answer proposed here is to treat the choice of the optimal number of clusters as the minimization of a risk function via penalization. In this paper, we obtain a suitable penalty shape for our criterion and derive an associated oracle-type inequality. Finally, the performance of this approach with different types of K-medians algorithms is compared in a simulation study with other popular techniques. All studied algorithms are available in the R package Kmedians on CRAN.
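To make the idea of choosing the number of clusters by penalized risk minimization concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation and not the API of the Kmedians R package: the function names, the farthest-first initialization, the Lloyd-style K-medians loop with Weiszfeld updates, and above all the linear penalty `1.0 * k` are hypothetical placeholders. The abstract does not give the penalty shape derived in the paper, so the sketch takes the penalty as a user-supplied function.

```python
import numpy as np


def geometric_median(points, n_iter=50, eps=1e-9):
    """Approximate the geometric median with Weiszfeld iterations."""
    m = points.mean(axis=0)
    for _ in range(n_iter):
        # Clip distances away from zero to avoid division by zero.
        dist = np.maximum(np.linalg.norm(points - m, axis=1), eps)
        w = 1.0 / dist
        m = (points * w[:, None]).sum(axis=0) / w.sum()
    return m


def farthest_first_init(X, k):
    """Deterministic init (an assumption of this sketch): start at X[0],
    then repeatedly add the point farthest from the centers chosen so far."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    return np.array(centers, dtype=float)


def k_medians(X, k, n_iter=20):
    """Lloyd-style K-medians: assign to the nearest center, then replace
    each center by the geometric median of its cluster.
    Returns the centers and the empirical risk (mean distance to the
    nearest center)."""
    centers = farthest_first_init(X, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = geometric_median(members)
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return centers, dists.min(axis=1).mean()


def select_k(X, k_max, penalty):
    """Pick k minimizing empirical risk + penalty(k).
    `penalty` is a placeholder, NOT the penalty shape from the paper."""
    crit = {k: k_medians(X, k)[1] + penalty(k) for k in range(1, k_max + 1)}
    return min(crit, key=crit.get), crit


# Toy data: three well-separated Gaussian clusters in the plane.
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([mu + 0.5 * rng.standard_normal((50, 2)) for mu in means])

k_hat, crit = select_k(X, k_max=6, penalty=lambda k: 1.0 * k)
print(k_hat, crit)
```

The empirical risk alone tends to keep decreasing as k grows, so minimizing it without a penalty would favor the largest k considered; the penalty term is what makes the criterion trade fit against the number of clusters. In practice the penalty would follow the shape obtained in the paper, with its constant calibrated from the data, rather than the arbitrary linear form used here.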
