Understanding Batch Normalization

Johan Bjorck; Carla Gomes; Bart Selman; Kilian Q. Weinberger

arXiv: 1806.02375 · v4 · pith:EEPG3L2M · submitted 2018-06-01 · cs.LG · cs.AI · stat.ML

keywords: activations, deep learning, networks, batch, better, convergence, enables

Abstract

Batch normalization (BN) is a technique to normalize activations in intermediate layers of deep neural networks. Its tendency to improve accuracy and speed up training has established BN as a favorite technique in deep learning. Yet, despite its enormous success, there remains little consensus on the exact reason and mechanism behind these improvements. In this paper we take a step towards a better understanding of BN, following an empirical approach. We conduct several experiments and show that BN primarily enables training with larger learning rates, which is the cause of faster convergence and better generalization. For networks without BN, we demonstrate how large gradient updates can result in diverging loss and in activations that grow uncontrollably with network depth, which limits the usable learning rates. BN avoids this problem by constantly correcting activations to be zero-mean and of unit standard deviation, which enables larger gradient steps, yields faster convergence and may help bypass sharp local minima. We further show various ways in which the gradients and activations of deep unnormalized networks are ill-behaved. We contrast our results against recent findings in random matrix theory, shedding new light on classical initialization schemes and their consequences.
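The mechanism the abstract describes, re-centering and re-scaling activations so that their magnitude cannot compound with depth, can be illustrated with a small NumPy sketch. This is not the authors' code or experimental setup. `batch_norm_forward` is the standard training-mode BN transform (per-feature mini-batch mean and variance, then a learnable scale `gamma` and shift `beta`); `final_activation_std`, the width, batch size, depth values and the `gain` parameter are hypothetical choices for a toy random ReLU network, where a gain above He initialization's √2 stands in for weights that have become too large.

```python
import numpy as np


def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # Training-mode BN: per-feature statistics over the mini-batch (axis 0).
    mean = x.mean(axis=0)
    var = x.var(axis=0)  # biased variance, as in the usual BN formulation
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit std per feature
    return gamma * x_hat + beta  # learnable scale and shift


def final_activation_std(depth, gain, use_bn, width=256, batch=128, seed=0):
    # Hypothetical toy setup (not from the paper): a stack of random
    # linear + ReLU layers with weights drawn from N(0, gain**2 / width).
    # gain = sqrt(2) corresponds to He-style initialization; larger gains
    # make the activation scale grow by roughly gain / sqrt(2) per layer.
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((batch, width))
    gamma, beta = np.ones(width), np.zeros(width)
    for _ in range(depth):
        w = rng.standard_normal((width, width)) * gain / np.sqrt(width)
        h = h @ w
        if use_bn:
            # Reset pre-activations to zero mean and unit std per feature.
            h = batch_norm_forward(h, gamma, beta)
        h = np.maximum(h, 0.0)  # ReLU
    return float(h.std())


if __name__ == "__main__":
    for depth in (5, 10, 20):
        plain = final_activation_std(depth, gain=2.0, use_bn=False)
        normed = final_activation_std(depth, gain=2.0, use_bn=True)
        print(f"depth={depth:2d}  no BN: {plain:10.2f}  with BN: {normed:.2f}")
```

Without the BN step, the activation scale in this toy compounds geometrically with depth; with it, every layer's pre-activations are reset to zero mean and unit standard deviation, so the final scale stays near that of a ReLU applied to a unit Gaussian regardless of depth.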

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Deep Learning for CMB Foreground Removal and Beam Deconvolution: A U-Net GAN Approach
astro-ph.IM 2025-08 unverdicted novelty 7.0

A U-Net GAN reconstructs CMB T and E maps from Planck-like simulations with foregrounds and systematics, achieving under 1% error outside the Galactic region and demonstrating first-time correction for non-circular be...
QUOTIENT: Two-Party Secure Neural Network Training and Prediction
cs.CR 2019-07 unverdicted novelty 6.0

QUOTIENT achieves 50X faster WAN training time and 6% higher absolute accuracy for secure two-party DNN training by jointly optimizing a discretized training algorithm with a tailored secure protocol.
Signal Conditioning for Learning in the Wild
cs.NE 2019-07 unverdicted novelty 5.0

Olfactory-inspired signal conditioning regularizes diverse inputs so a single brain-mimetic network performs classification across gas sensing, remote sensing, and species identification without hyperparameter changes.
Mean Spectral Normalization of Deep Neural Networks for Embedded Automation
cs.LG 2019-07 unverdicted novelty 4.0

Proposes mean spectral normalization (MSN), a reparameterization addressing mean drift in spectral normalization (SN), and claims ~16% faster inference than BN with fewer parameters on CNNs and GANs.