Understanding data augmentation for classification: when to warp?

Adam Gatt; Mark D. McDonnell; Sebastien C. Wong; Victor Stamatescu

arxiv: 1609.08764 · v2 · pith:ZCIU2UL5new · submitted 2016-09-28 · 💻 cs.CV

Sebastien C. Wong, Adam Gatt, Victor Stamatescu, Mark D. McDonnell

keywords: data, augmentation, samples, additional, convolutional, machine, benefit, classifier

In this paper we investigate the benefit of augmenting data with synthetically created samples when training a machine learning classifier. Two approaches for creating additional training samples are data warping, which generates additional samples through transformations applied in the data-space, and synthetic over-sampling, which creates additional samples in feature-space. We experimentally evaluate the benefits of data augmentation for a convolutional backpropagation-trained neural network, a convolutional support vector machine and a convolutional extreme learning machine classifier, using the standard MNIST handwritten digit dataset. We found that while it is possible to perform generic augmentation in feature-space, if plausible transforms for the data are known then augmentation in data-space provides a greater benefit for improving performance and reducing overfitting.
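
The two augmentation routes named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say which transforms or over-sampling scheme were used, so a plain pixel translation stands in for data warping, and a SMOTE-style interpolation between nearest neighbours stands in for synthetic over-sampling. The function names `warp_shift` and `oversample_features` and the parameters `dx`, `dy`, `n_new` and `k` are hypothetical, chosen only for this illustration.

```python
import numpy as np


def warp_shift(image, dx, dy):
    """Data-space augmentation (assumed stand-in): translate a 2-D image
    by (dx, dy) pixels and fill the uncovered border with zeros."""
    h, w = image.shape
    out = np.zeros_like(image)
    if abs(dx) >= w or abs(dy) >= h:
        return out  # the shift moves the whole image out of frame
    # Matching source and destination windows that stay inside the bounds.
    src_y = slice(max(0, -dy), min(h, h - dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_y = slice(max(0, dy), min(h, h + dy))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = image[src_y, src_x]
    return out


def oversample_features(features, n_new, k=3, rng=None):
    """Feature-space augmentation (assumed SMOTE-style scheme): create
    n_new points by interpolating between a randomly chosen sample and one
    of its k nearest neighbours. All rows are assumed to share one class
    label, and k must be smaller than the number of rows."""
    rng = np.random.default_rng() if rng is None else rng
    features = np.asarray(features, dtype=float)
    n = features.shape[0]
    # Pairwise squared Euclidean distances; a sample is not its own neighbour.
    diffs = features[:, None, :] - features[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diffs, diffs)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    pick = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return features[base] + gap * (features[pick] - features[base])
```

The contrast mirrors the abstract's conclusion: the feature-space routine needs no knowledge of the data beyond a distance measure, whereas the data-space routine encodes a transform assumed to be plausible for the data, such as a small shift of a handwritten digit.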

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Assessing Post Deletion in Sina Weibo: Multi-modal Classification of Hot Topics
cs.SI · 2019-06 · unverdicted · novelty 5.0

Multi-modal analysis of 994 Weibo posts and 18,966 images finds sentiment as the sole consistent predictor of censorship, with anti-government topics deleted more often and average deletion time of three hours.