On weight initialization in deep neural networks

Siddharth Krishna Kumar · 2017 · cs.LG · arXiv 1704.08863

5 Pith papers cite this work. Polarity classification is still indexing.

abstract

A proper initialization of the weights in a neural network is critical to its convergence. Current insights into weight initialization come primarily from linear activation functions. In this paper, I develop a theory for weight initializations with non-linear activations. First, I derive a general weight initialization strategy for any neural network using activation functions differentiable at 0. Next, I derive the weight initialization strategy for the Rectified Linear Unit (RELU), and provide theoretical insights into why the Xavier initialization is a poor choice with RELU activations. My analysis provides a clear demonstration of the role of non-linearities in determining the proper weight initializations.
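The abstract's claim about Xavier initialization and ReLU can be illustrated with a small numerical experiment. The sketch below is not the paper's derivation. It assumes square fully connected layers with no biases and zero-mean Gaussian weights, and it compares a weight variance of 1/fan_in (which Xavier initialization reduces to when fan_in equals fan_out) against 2/fan_in (the widely used He-style scaling for ReLU). The function name `final_mean_square` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def final_mean_square(variance_scale, width=512, depth=20, batch=256, seed=0):
    """Push random inputs through `depth` ReLU layers and return the mean
    squared activation of the last layer.

    Weights are drawn as N(0, variance_scale / width). This is an assumed
    toy setup (square layers, no biases), not the paper's experiment.
    """
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((batch, width))  # inputs with mean square ~ 1
    std = np.sqrt(variance_scale / width)
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * std
        h = np.maximum(h @ W, 0.0)  # ReLU zeroes the negative half
    return float(np.mean(h ** 2))

# variance_scale = 1.0 -> variance 1/fan_in (Xavier for square layers)
xavier_ms = final_mean_square(1.0)
# variance_scale = 2.0 -> variance 2/fan_in (He-style scaling)
he_ms = final_mean_square(2.0)

print(f"Xavier-style final mean square: {xavier_ms:.3e}")
print(f"He-style final mean square:     {he_ms:.3e}")
```

Under these assumptions, ReLU discards roughly half of the pre-activation second moment at every layer, so the 1/fan_in scaling shrinks the signal by about a factor of two per layer, while the 2/fan_in scaling keeps it near its input magnitude.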

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Efficient Unlearning through Maximizing Relearning Convergence Delay

cs.LG · 2026-04-10 · unverdicted · novelty 7.0

The Influence Eliminating Unlearning framework maximizes relearning convergence delay via weight decay and noise injection to remove the influence of a forgetting set while preserving accuracy on retained data.

An Extensible and Lightweight Unified Architecture for Demosaicing Pixel-bin Image Sensors

cs.CV · 2026-06-11 · unverdicted · novelty 6.0

Proposes a modular unified lightweight neural architecture for demosaicing multiple pixel-bin CFAs together with a learning-free CFA identification module.

Critical evaluation of PINN for FWD inverse analysis and differentiable FEM as an alternative

cs.CE · 2026-06-02 · unverdicted · novelty 6.0

Standard PINNs fail to recover layer moduli in multilayer pavement systems due to domain discontinuities; XPINN improves but remains sensitive to weighting and noise; DiffFEM yields superior accuracy, stability, and efficiency on the same synthetic benchmark.

Trainability of IQP Quantum Circuit Born Machines Under Gaussian Initialization

quant-ph · 2026-06-08 · unverdicted · novelty 5.0

Derives lower bound on gradient variance and probabilistic concentration bounds for Gaussian-initialized IQP QCBMs trained via MMD loss.

Embedding Non-Distortive Cancelable Face Template Generation

cs.CV · 2024-02-04 · unverdicted · novelty 3.0

Presents a non-distortive cancelable face template method via targeted image distortion that maintains identity signals for neural embedding models on MNIST and LFW data.

citing papers explorer

Showing 5 of 5 citing papers after filters.

Efficient Unlearning through Maximizing Relearning Convergence Delay

cs.LG · 2026-04-10 · unverdicted · none · ref 34

An Extensible and Lightweight Unified Architecture for Demosaicing Pixel-bin Image Sensors

cs.CV · 2026-06-11 · unverdicted · none · ref 33 · internal anchor

Critical evaluation of PINN for FWD inverse analysis and differentiable FEM as an alternative

cs.CE · 2026-06-02 · unverdicted · none · ref 24 · internal anchor

Trainability of IQP Quantum Circuit Born Machines Under Gaussian Initialization

quant-ph · 2026-06-08 · unverdicted · none · ref 18 · internal anchor

Embedding Non-Distortive Cancelable Face Template Generation

cs.CV · 2024-02-04 · unverdicted · none · ref 12 · internal anchor
