arxiv: 1202.5933 · v1 · pith:MC744VBBnew · submitted 2012-02-27 · 📊 stat.AP

Prototype selection for interpretable classification

Jacob Bien, Robert Tibshirani

keywords: prototype, samples, data, method, classification, classifier, discuss, methods


Prototype methods seek a minimal subset of samples that can serve as a distillation or condensed view of a data set. As the size of modern data sets grows, being able to present a domain specialist with a short list of "representative" samples chosen from the data set is of increasing interpretative value. While much recent statistical research has been focused on producing sparse-in-the-variables methods, this paper aims at achieving sparsity in the samples. We discuss a method for selecting prototypes in the classification setting (in which the samples fall into known discrete categories). Our method of focus is derived from three basic properties that we believe a good prototype set should satisfy. This intuition is translated into a set cover optimization problem, which we solve approximately using standard approaches. While prototype selection is usually viewed as purely a means toward building an efficient classifier, in this paper we emphasize the inherent value of having a set of prototypical elements. That said, by using the nearest-neighbor rule on the set of prototypes, we can of course discuss our method as a classifier as well.
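The abstract does not spell out the three properties or the exact set cover formulation, so the following is only a minimal sketch of the general idea, under stated assumptions: each candidate prototype "covers" the same-class samples within a hypothetical radius `eps`, a simple greedy heuristic picks prototypes until no candidate adds net coverage, and new points are labeled by their nearest prototype. The function names, the `eps` parameter, and the gain rule are illustrative choices, not the authors' method.

```python
import math


def greedy_prototypes(points, labels, eps):
    """Greedy set-cover-style prototype selection (illustrative sketch only).

    Assumption: a candidate j covers every same-class sample within
    distance eps of it, and is penalized for each other-class sample
    that falls inside the same ball.
    """
    n = len(points)
    # ball[j] holds the indices of all samples within eps of candidate j.
    ball = [
        {i for i in range(n) if math.dist(points[i], points[j]) <= eps}
        for j in range(n)
    ]
    uncovered = set(range(n))
    prototypes = []
    while uncovered:
        best_j, best_gain = None, 0
        for j in range(n):
            same = {i for i in ball[j] if labels[i] == labels[j]}
            wrong = len(ball[j]) - len(same)
            # Net gain: newly covered same-class samples minus wrongly covered ones.
            gain = len(same & uncovered) - wrong
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None:
            # No candidate improves coverage; some samples stay uncovered.
            break
        prototypes.append(best_j)
        uncovered -= {i for i in ball[best_j] if labels[i] == labels[best_j]}
    return prototypes


def nearest_prototype_label(x, points, labels, prototypes):
    """Classify x with the nearest-neighbor rule restricted to the prototypes."""
    nearest = min(prototypes, key=lambda j: math.dist(x, points[j]))
    return labels[nearest]
```

On two well-separated clusters, a sketch like this would typically return one prototype per cluster, which is the kind of short, inspectable list the abstract describes. Because a prototype is chosen only when its net gain is positive, each iteration covers at least one new sample, so the loop terminates.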
