arXiv: 1811.12569 · v1 · pith:QCLFROAXnew · submitted 2018-11-30 · 💻 cs.LG · cs.CV · stat.ML

Are All Training Examples Created Equal? An Empirical Study

classification 💻 cs.LG cs.CV stat.ML
keywords training datasets importance examples however relative subsample sufficient

Modern computer vision algorithms often rely on very large training datasets. However, it is conceivable that a carefully selected subsample of the dataset is sufficient for training. In this paper, we propose a gradient-based importance measure that we use to empirically analyze relative importance of training images in four datasets of varying complexity. We find that in some cases, a small subsample is indeed sufficient for training. For other datasets, however, the relative differences in importance are negligible. These results have important implications for active learning on deep networks. Additionally, our analysis method can be used as a general tool to better understand diversity of training examples in datasets.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Data Selection for training Semantic Segmentation CNNs with cross-dataset weak supervision

    cs.CV 2019-07 unverdicted novelty 5.0

    Two data selection techniques (GMM visual similarity and bounding-box diversity) reduce required weakly labeled images by up to 100x on Open Images and 20x on Cityscapes while maintaining semantic segmentation performance.