Recognition: unknown
A Simple Way to Initialize Recurrent Networks of Rectified Linear Units
Original abstract
Learning long-term dependencies in recurrent networks is difficult due to vanishing and exploding gradients. To overcome this difficulty, researchers have developed sophisticated optimization techniques and network architectures. In this paper, we propose a simpler solution that uses recurrent neural networks composed of rectified linear units. Key to our solution is the use of the identity matrix or its scaled version to initialize the recurrent weight matrix. We find that our solution is comparable to LSTM on our four benchmarks: two toy problems involving long-range temporal structures, a large language modeling problem, and a benchmark speech recognition problem.
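The initialization the abstract describes is concrete enough to sketch. Below is a minimal, hypothetical PyTorch cell (not the authors' code): the hidden-to-hidden weight matrix starts as the identity, optionally scaled, and the biases start at zero, so the untrained ReLU cell roughly copies its state forward. The names `IRNNCell` and `scale` are illustrative assumptions.

```python
import torch
import torch.nn as nn

class IRNNCell(nn.Module):
    """Vanilla ReLU RNN cell with identity-initialized recurrent weights.
    A sketch of the technique described in the abstract, not the authors' code."""

    def __init__(self, input_size: int, hidden_size: int, scale: float = 1.0):
        super().__init__()
        self.W_in = nn.Linear(input_size, hidden_size)
        self.W_rec = nn.Linear(hidden_size, hidden_size)
        with torch.no_grad():
            # Key idea: recurrent matrix = (scaled) identity, biases = zero.
            self.W_rec.weight.copy_(scale * torch.eye(hidden_size))
            self.W_rec.bias.zero_()
            self.W_in.bias.zero_()

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        # h_t = ReLU(W_in x_t + W_rec h_{t-1} + b)
        return torch.relu(self.W_in(x) + self.W_rec(h))

# Usage: one step from a zero hidden state.
cell = IRNNCell(input_size=8, hidden_size=16)
h = torch.zeros(1, 16)
x = torch.randn(1, 8)
h = cell(x, h)
```

With `scale = 1.0` and zero input, the pre-activation equals the previous hidden state, so gradients flow through time without shrinking or blowing up at initialization, which is what makes the long-range toy tasks tractable.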
This paper has not been read by Pith yet.
Forward citations
Cited by 2 Pith papers
- M²RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling
  M²RNN achieves perfect state tracking at unseen lengths and outperforms Gated DeltaNet hybrids by 0.4-0.5 perplexity on 7B models with 3x smaller recurrent states.
- From Cortical Synchronous Rhythm to Brain Inspired Learning Mechanism: An Oscillatory Spiking Neural Network with Time-Delayed Coordination
  S2-Net is an oscillatory spiking neural network that uses time-delayed synchronization for bottom-up and top-down coordination to enable efficient, brain-inspired information processing across tasks like decoding and ...
discussion (0)