Norm matters: efficient and accurate normalization schemes in deep networks

1 Pith paper cites this work.

abstract

Over the past few years, Batch-Normalization has been commonly used in deep networks, allowing faster training and high performance for a wide variety of applications. However, the reasons behind its merits have remained unclear, and it has several shortcomings that have hindered its use for certain tasks. In this work, we present a novel view on the purpose and function of normalization methods and weight-decay, as tools to decouple the norm of the weights from the underlying optimized objective. This property highlights the connection between practices such as normalization, weight decay and learning-rate adjustments. We suggest several alternatives to the widely used $L^2$ batch-norm, using normalization in $L^1$ and $L^\infty$ spaces, which can substantially improve numerical stability in low-precision implementations as well as provide computational and memory benefits. We demonstrate that such methods yield the first batch-norm alternative that works in half-precision implementations. Finally, we suggest a modification to weight-normalization that improves its performance on large-scale tasks.
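As a rough illustration of the $L^2$, $L^1$ and $L^\infty$ batch-norm variants named in the abstract, the sketch below normalizes one feature over a batch in three ways. It is a minimal sketch under our own assumptions, not the authors' implementation: the function names are hypothetical, the $L^1$ scale is multiplied by $\sqrt{\pi/2}$ (which turns the mean absolute deviation into a standard-deviation estimate for Gaussian data), and the $L^\infty$ variant simply divides by the largest absolute deviation without any batch-size-dependent correction. Because every scale is positively homogeneous, multiplying the inputs by $c > 0$ (e.g. by scaling the preceding layer's weights) leaves the output unchanged up to `eps`, which is the sense in which normalization decouples the weights' norm from the objective.

```python
import math
from typing import List


def _center(x: List[float]) -> List[float]:
    """Subtract the batch mean from every sample of one feature."""
    mu = sum(x) / len(x)
    return [v - mu for v in x]


def l2_batch_norm(x: List[float], eps: float = 1e-5) -> List[float]:
    """Standard (L2) batch norm: divide by the batch standard deviation."""
    c = _center(x)
    var = sum(v * v for v in c) / len(c)
    return [v / math.sqrt(var + eps) for v in c]


def l1_batch_norm(x: List[float], eps: float = 1e-5) -> List[float]:
    """Hypothetical L1 variant: divide by the scaled mean absolute deviation.

    Avoids squaring the deviations, which is what makes L1 attractive
    for low-precision arithmetic.
    """
    c = _center(x)
    mad = sum(abs(v) for v in c) / len(c)
    # For Gaussian data E|x - mu| = sigma * sqrt(2/pi), so this rescaling
    # makes the L1 scale an estimate of the standard deviation.
    scale = math.sqrt(math.pi / 2) * mad
    return [v / (scale + eps) for v in c]


def linf_batch_norm(x: List[float], eps: float = 1e-5) -> List[float]:
    """Hypothetical, simplified L-infinity variant.

    Divides by the largest absolute deviation; assumption: no
    batch-size-dependent correction constant is applied here.
    """
    c = _center(x)
    scale = max(abs(v) for v in c)
    return [v / (scale + eps) for v in c]
```

For example, `l1_batch_norm([1000.0 * v for v in x])` matches `l1_batch_norm(x)` up to the effect of `eps`, so the norm of the incoming weights does not change what the next layer sees.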

fields

cs.LG 1

years

2019 1

verdicts

CONDITIONAL 1

representative citing papers

Root Mean Square Layer Normalization

cs.LG · 2019-10-16 · conditional · novelty 5.0

RMSNorm delivers re-scaling invariance and comparable accuracy to LayerNorm while cutting computation by skipping mean subtraction, yielding 7-64% runtime reductions across tested models.
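To make the contrast with LayerNorm concrete, here is a minimal pure-Python sketch of both for a single vector. It follows the commonly stated RMSNorm formula $y_i = g_i \, x_i / \mathrm{RMS}(x)$ with $\mathrm{RMS}(x) = \sqrt{\tfrac{1}{n}\sum_i x_i^2}$; the function names, the `gain` argument and the placement of `eps` inside the square root are our own assumptions rather than the paper's code. Because RMSNorm skips the mean, it stays invariant to re-scaling of its input but, unlike LayerNorm, not to re-centering.

```python
import math
from typing import List, Optional


def layer_norm(x: List[float], gain: Optional[List[float]] = None,
               eps: float = 1e-8) -> List[float]:
    """LayerNorm over one vector: subtract the mean, divide by the std."""
    n = len(x)
    g = gain if gain is not None else [1.0] * n
    if len(g) != n:
        raise ValueError("gain must have the same length as x")
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    return [gi * (v - mu) / math.sqrt(var + eps) for gi, v in zip(g, x)]


def rms_norm(x: List[float], gain: Optional[List[float]] = None,
             eps: float = 1e-8) -> List[float]:
    """RMSNorm over one vector: no mean subtraction, divide by the RMS."""
    n = len(x)
    g = gain if gain is not None else [1.0] * n
    if len(g) != n:
        raise ValueError("gain must have the same length as x")
    rms = math.sqrt(sum(v * v for v in x) / n + eps)
    return [gi * v / rms for gi, v in zip(g, x)]
```

On zero-mean input the two functions agree; on shifted input only `layer_norm` returns the same result, which is the re-centering invariance that RMSNorm gives up in exchange for skipping the mean computation.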

citing papers explorer

  • Root Mean Square Layer Normalization cs.LG · 2019-10-16 · conditional · none · ref 10

    RMSNorm delivers re-scaling invariance and comparable accuracy to LayerNorm while cutting computation by skipping mean subtraction, yielding 7-64% runtime reductions across tested models.