$k$-POD: A Method for $k$-Means Clustering of Missing Data

Eric C. Chi; Jocelyn T. Chi; Richard G. Baraniuk

arXiv: 1411.7013 · v3 · pith:7FA5GU3Cnew · submitted 2014-11-25 · stat.CO · stat.ME


keywords: data, missing, clustering, means, when, applications, complete, method


Abstract

The $k$-means algorithm is often used in clustering applications, but it requires a complete data matrix. Missing data, however, is common in many applications. Mainstream approaches to clustering missing data reduce the problem to a complete-data formulation through either deletion or imputation, but these solutions may incur significant costs. Our $k$-POD method is a simple extension of $k$-means clustering for missing data that works even when the missingness mechanism is unknown, when external information is unavailable, and when a significant fraction of the data is missing.
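The abstract gives no algorithmic detail, so the following is only a sketch of one way a $k$-means extension can handle missing entries without deleting rows or imputing once up front: alternate ordinary $k$-means steps with re-filling each missing entry from the centroid of the cluster its row is currently assigned to. The function name `kmeans_missing_sketch`, its parameters, the column-mean starting fill, and the random initialisation are all assumptions made for illustration; none of them is taken from this page.

```python
import numpy as np


def kmeans_missing_sketch(X, k, n_iter=50, seed=0):
    """Illustrative sketch (hypothetical helper, not the authors' code).

    Missing entries are encoded as NaN. Observed entries are never changed;
    missing entries are repeatedly replaced by the assigned centroid's value.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)

    # Assumed starting point: fill each missing entry with its column mean.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(observed, X, col_means)

    # Assumed initialisation: k distinct rows of the filled matrix.
    centroids = filled[rng.choice(len(filled), size=k, replace=False)]

    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid in squared Euclidean distance.
        dists = ((filled[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)

        # Update step: each non-empty cluster's centroid becomes its mean.
        for j in range(k):
            members = new_labels == j
            if members.any():
                centroids[j] = filled[members].mean(axis=0)

        # Fill-in step: keep observed values, refresh the missing ones.
        filled = np.where(observed, X, centroids[new_labels])

        converged = labels is not None and np.array_equal(new_labels, labels)
        labels = new_labels
        if converged:
            break
    return labels, centroids, filled
```

Calling `kmeans_missing_sketch(X, k=2)` on an array with `np.nan` in the missing positions returns cluster labels, centroids, and the filled-in matrix. Like ordinary $k$-means, a sketch of this kind can stop at a local optimum that depends on the initial centroids, so several seeds would normally be tried.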
