Shake-Shake regularization

Xavier Gastaldi

arxiv: 1705.07485 · v2 · pith:CALMY2CInew · submitted 2017-05-21 · 💻 cs.LG · cs.CV

classification 💻 cs.LG cs.CV

keywords shake-shake · regularization · results · affine · aims · applications · applied · architectures

The method introduced in this paper aims to help deep learning practitioners facing an overfitting problem. The idea is to replace, in a multi-branch network, the standard summation of parallel branches with a stochastic affine combination. Applied to 3-branch residual networks, shake-shake regularization improves on the best single-shot published results on CIFAR-10 and CIFAR-100, reaching test errors of 2.86% and 15.85% respectively. Experiments on architectures without skip connections or Batch Normalization show encouraging results and open the door to a large set of applications. Code is available at https://github.com/xgastaldi/shake-shake
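
To make the "stochastic affine combination" concrete, here is a minimal NumPy sketch of the forward pass only. It is an illustration under stated assumptions, not the author's implementation (the linked repository holds that). The names `shake_combine` and `residual_block` are hypothetical. The sketch also assumes that one coefficient is drawn uniformly from [0, 1] per example, and that evaluation uses a fixed coefficient of 0.5.

```python
import numpy as np


def shake_combine(branch_a, branch_b, training, rng=None):
    """Blend two branch outputs of shape (batch, ...) with an affine weight.

    Assumption: during training one alpha ~ U(0, 1) is drawn per example;
    at evaluation time alpha is fixed to 0.5 (the expected value).
    """
    if training:
        rng = np.random.default_rng() if rng is None else rng
        # One coefficient per example, broadcast over the remaining axes.
        shape = (branch_a.shape[0],) + (1,) * (branch_a.ndim - 1)
        alpha = rng.uniform(0.0, 1.0, size=shape)
    else:
        alpha = 0.5
    # Affine combination: the two weights always sum to 1.
    return alpha * branch_a + (1.0 - alpha) * branch_b


def residual_block(x, f_a, f_b, training, rng=None):
    """Skip connection plus the stochastic blend of two residual branches.

    f_a and f_b are placeholder callables standing in for the two
    convolutional branches of a 3-branch residual block.
    """
    return x + shake_combine(f_a(x), f_b(x), training, rng)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 3, 8, 8))
    out = residual_block(x, lambda t: 0.1 * t, lambda t: -0.1 * t,
                         training=True, rng=rng)
    print(out.shape)
```

Because the weights sum to 1, the blended output always lies between the two branch outputs elementwise. The sketch does not attempt any special handling of gradients in the backward pass.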

Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Improved Regularization of Convolutional Neural Networks with Cutout
cs.CV · 2017-08 · accept · novelty 7.0

Randomly masking square regions of input images during CNN training yields new state-of-the-art test errors of 2.56% on CIFAR-10, 15.20% on CIFAR-100, and 1.30% on SVHN.

Sharpness-Aware Minimization for Efficiently Improving Generalization
cs.LG · 2020-10 · conditional · novelty 6.0

SAM solves a min-max problem to locate flat low-loss regions, improving generalization on CIFAR, ImageNet and label-noise tasks.

DropAttention: A Regularization Method for Fully-Connected Self-Attention Networks
cs.CL · 2019-07 · unverdicted · novelty 6.0

DropAttention regularizes attention weights in fully-connected self-attention networks to reduce overfitting and improve performance.

XferNAS: Transfer Neural Architecture Search
cs.LG · 2019-07 · unverdicted · novelty 6.0

XferNAS transfers knowledge across neural architecture searches to reduce search time by a factor of 33 on CIFAR-10/100 while achieving new records of 1.99% and 14.06% error.

Variations on the Chebyshev-Lagrange Activation Function
cs.LG · 2019-06 · unverdicted · novelty 6.0

Chebyshev-Lagrange activations with linear extrapolation match or exceed ReLU/tanh performance in residual networks on image and vector classification tasks.

Confidence Calibration for Convolutional Neural Networks Using Structured Dropout
cs.LG · 2019-06 · unverdicted · novelty 5.0

Structured dropout improves confidence calibration in CNNs by promoting ensemble diversity, with empirical support on SVHN, CIFAR-10, CIFAR-100 and in Bayesian active learning.

Genetic Network Architecture Search
cs.NE · 2019-07 · unverdicted · novelty 3.0

Genetic algorithm searches convolution cell architectures with weight sharing via SGD, reporting 96% accuracy on CIFAR-10 and 80.1% on CIFAR-100.