Are All Training Examples Created Equal? An Empirical Study
Modern computer vision algorithms often rely on very large training datasets. However, it is conceivable that a carefully selected subsample of the dataset is sufficient for training. In this paper, we propose a gradient-based importance measure that we use to empirically analyze the relative importance of training images in four datasets of varying complexity. We find that in some cases, a small subsample is indeed sufficient for training. For other datasets, however, the relative differences in importance are negligible. These results have important implications for active learning on deep networks. Additionally, our analysis method can be used as a general tool to better understand the diversity of training examples in datasets.
Forward citations
Cited by 1 Pith paper
Data Selection for training Semantic Segmentation CNNs with cross-dataset weak supervision
Two data selection techniques (GMM visual similarity and bounding-box diversity) reduce required weakly labeled images by up to 100x on Open Images and 20x on Cityscapes while maintaining semantic segmentation performance.