
On weight initialization in deep neural networks. ArXiv, abs/1704.08863

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

A proper initialization of the weights in a neural network is critical to its convergence. Current insights into weight initialization come primarily from linear activation functions. In this paper, I develop a theory for weight initializations with non-linear activations. First, I derive a general weight initialization strategy for any neural network using activation functions differentiable at 0. Next, I derive the weight initialization strategy for the Rectified Linear Unit (RELU), and provide theoretical insights into why the Xavier initialization is a poor choice with RELU activations. My analysis provides a clear demonstration of the role of non-linearities in determining the proper weight initializations.
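The abstract's claim about Xavier initialization and RELU can be illustrated with a small numerical experiment. The sketch below is not taken from the paper: the abstract does not state the derived formula, so it uses the widely used He-style scaling (weight variance 2 / fan_in) as an assumed stand-in for a RELU-aware initialization, and compares it with Xavier scaling (variance 2 / (fan_in + fan_out)). The function names, network depth, width, and batch size are all hypothetical choices made for illustration.

```python
import numpy as np


def xavier_std(fan_in, fan_out):
    # Xavier (Glorot) scaling: Var(w) = 2 / (fan_in + fan_out).
    return np.sqrt(2.0 / (fan_in + fan_out))


def he_std(fan_in, fan_out):
    # He-style scaling, assumed here as the RELU-aware alternative:
    # Var(w) = 2 / fan_in. fan_out is unused but kept for a uniform signature.
    return np.sqrt(2.0 / fan_in)


def final_activation_rms(std_fn, depth=30, width=256, batch=512, seed=0):
    """Push random inputs through `depth` RELU layers and return the
    root-mean-square of the activations at the last layer."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((batch, width))
    for _ in range(depth):
        # Zero-mean Gaussian weights with the requested standard deviation.
        w = rng.standard_normal((width, width)) * std_fn(width, width)
        x = np.maximum(x @ w, 0.0)  # RELU
    return float(np.sqrt(np.mean(x ** 2)))


xavier_rms = final_activation_rms(xavier_std)
he_rms = final_activation_rms(he_std)
print(f"Xavier, final-layer RMS activation: {xavier_rms:.3e}")
print(f"He,     final-layer RMS activation: {he_rms:.3e}")
```

With square layers, Xavier scaling gives a weight variance of 1 / fan_in, and RELU zeroes roughly half of the pre-activations, so the second moment of the signal shrinks by about a factor of two per layer and the activations collapse toward zero in a deep network. Doubling the weight variance compensates for that loss, which is why the He-scaled run keeps activations at a roughly constant scale.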

fields: cs.CV 1, cs.LG 1

years: 2026 1, 2024 1

verdicts: UNVERDICTED 2

roles: background 1

polarities: background 1
