Norm matters: efficient and accurate normalization schemes in deep networks

arXiv: 1803.01814 · v3 · pith:FOVFFVOX · submitted 2018-03-05 · stat.ML · cs.LG

Elad Hoffer, Ron Banner, Itay Golan, Daniel Soudry

keywords: normalization, batch-norm, deep, implementations, methods, networks, norm, performance

Abstract

Over the past few years, Batch-Normalization has been commonly used in deep networks, allowing faster training and high performance for a wide variety of applications. However, the reasons behind its merits remained unanswered, with several shortcomings that hindered its use for certain tasks. In this work, we present a novel view on the purpose and function of normalization methods and weight-decay, as tools to decouple weights' norm from the underlying optimized objective. This property highlights the connection between practices such as normalization, weight decay and learning-rate adjustments. We suggest several alternatives to the widely used $L^2$ batch-norm, using normalization in $L^1$ and $L^\infty$ spaces that can substantially improve numerical stability in low-precision implementations as well as provide computational and memory benefits. We demonstrate that such methods enable the first batch-norm alternative to work for half-precision implementations. Finally, we suggest a modification to weight-normalization, which improves its performance on large-scale tasks.
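The $L^1$ alternative mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `eps` value, and the single-feature list representation are assumptions made for illustration. The sketch replaces the standard deviation with the mean absolute deviation, rescaled by $\sqrt{\pi/2}$, which is the factor that makes the two quantities agree when the inputs are Gaussian. The $L^1$ statistic needs no squaring or square root over the batch, which is where the numerical-stability benefit in low precision would come from.

```python
import math
import random


def l2_normalize(values, eps=1e-5):
    """Standard (L2) batch normalization of one feature over a batch."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    # Squares are accumulated here; this is the step that can overflow
    # or lose precision in a low-precision format.
    std = math.sqrt(sum(c * c for c in centered) / n + eps)
    return [c / std for c in centered]


def l1_normalize(values, eps=1e-5):
    """L1 variant: scale by the mean absolute deviation instead of the std.

    Assumption: inputs are roughly Gaussian, so
    std ~= sqrt(pi / 2) * mean(|x - mean|).
    """
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    mean_abs_dev = sum(abs(c) for c in centered) / n
    scale = math.sqrt(math.pi / 2) * mean_abs_dev
    return [c / (scale + eps) for c in centered]


# Hypothetical batch: 4096 Gaussian activations for a single feature.
random.seed(0)
batch = [random.gauss(3.0, 2.0) for _ in range(4096)]
out_l1 = l1_normalize(batch)
out_l2 = l2_normalize(batch)
```

On Gaussian data like this, the two outputs should be close to each other, with the $L^1$ result having approximately zero mean and unit variance. For strongly non-Gaussian activations the $\sqrt{\pi/2}$ factor is only an approximation.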

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Root Mean Square Layer Normalization
cs.LG 2019-10 conditional novelty 5.0

RMSNorm delivers re-scaling invariance and comparable accuracy to LayerNorm while cutting computation by skipping mean subtraction, yielding 7-64% runtime reductions across tested models.