Recognition: unknown
A Simple Way to Initialize Recurrent Networks of Rectified Linear Units
Original abstract
Learning long-term dependencies in recurrent networks is difficult due to vanishing and exploding gradients. To overcome this difficulty, researchers have developed sophisticated optimization techniques and network architectures. In this paper, we propose a simpler solution that uses recurrent neural networks composed of rectified linear units. Key to our solution is the use of the identity matrix or its scaled version to initialize the recurrent weight matrix. We find that our solution is comparable to LSTM on our four benchmarks: two toy problems involving long-range temporal structures, a large language modeling problem, and a benchmark speech recognition problem.
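The initialization the abstract describes is concrete enough to sketch. Below is a minimal, hypothetical PyTorch cell (not the authors' code): the hidden-to-hidden weight matrix starts as the identity, optionally scaled, and the biases start at zero, so the untrained ReLU cell roughly copies its state forward. The names `IRNNCell` and `scale` are illustrative assumptions.

```python
import torch
import torch.nn as nn

class IRNNCell(nn.Module):
    """Vanilla ReLU RNN cell with identity-initialized recurrent weights.
    A sketch of the technique described in the abstract, not the authors' code."""

    def __init__(self, input_size: int, hidden_size: int, scale: float = 1.0):
        super().__init__()
        self.W_in = nn.Linear(input_size, hidden_size)
        self.W_rec = nn.Linear(hidden_size, hidden_size)
        with torch.no_grad():
            # Key idea: recurrent matrix = (scaled) identity, biases = zero.
            self.W_rec.weight.copy_(scale * torch.eye(hidden_size))
            self.W_rec.bias.zero_()
            self.W_in.bias.zero_()

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        # h_t = ReLU(W_in x_t + W_rec h_{t-1} + b)
        return torch.relu(self.W_in(x) + self.W_rec(h))

# Usage: one step from a zero hidden state.
cell = IRNNCell(input_size=8, hidden_size=16)
h = torch.zeros(1, 16)
x = torch.randn(1, 8)
h = cell(x, h)
```

With `scale = 1.0` and zero input, the pre-activation equals the previous hidden state, so gradients flow through time without shrinking or blowing up at initialization, which is what makes the long-range toy tasks tractable.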
This paper has not been read by Pith yet.
Forward citations
Cited by 2 Pith papers
- M²RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling
  M²RNN achieves perfect state tracking at unseen lengths and outperforms Gated DeltaNet hybrids by 0.4-0.5 perplexity on 7B models with 3x smaller recurrent states.
- From Cortical Synchronous Rhythm to Brain Inspired Learning Mechanism: An Oscillatory Spiking Neural Network with Time-Delayed Coordination
  S2-Net is an oscillatory spiking neural network that uses time-delayed synchronization for bottom-up and top-down coordination to enable efficient, brain-inspired information processing across tasks like decoding and ...
discussion (0)