A Simple Way to Initialize Recurrent Networks of Rectified Linear Units

Quoc V Le, Navdeep Jaitly, Geoffrey E Hinton · 2015 · cs.NE · arXiv 1504.00941

6 Pith papers cite this work. Polarity classification is still indexing.

abstract

Learning long-term dependencies in recurrent networks is difficult due to vanishing and exploding gradients. To overcome this difficulty, researchers have developed sophisticated optimization techniques and network architectures. In this paper, we propose a simpler solution that uses recurrent neural networks composed of rectified linear units. Key to our solution is the use of the identity matrix or its scaled version to initialize the recurrent weight matrix. We find that our solution is comparable to LSTM on our four benchmarks: two toy problems involving long-range temporal structures, a large language modeling problem and a benchmark speech recognition problem.
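
As a minimal sketch of the initialization described in the abstract, the NumPy snippet below sets the recurrent weight matrix of a ReLU RNN to an identity matrix, optionally scaled. The function names `init_irnn` and `irnn_forward`, the `scale` and `input_std` parameters, the zero bias, and the small Gaussian input weights are assumptions made for illustration; the abstract itself specifies only the identity-based recurrent initialization.

```python
import numpy as np


def init_irnn(input_size, hidden_size, scale=1.0, input_std=0.001, seed=0):
    """Build parameters for a ReLU RNN with identity recurrent weights."""
    rng = np.random.default_rng(seed)
    return {
        # Recurrent weights: the identity matrix, or a scaled version of it.
        "W_hh": scale * np.eye(hidden_size),
        # Assumption: small Gaussian input weights (not stated in the abstract).
        "W_xh": rng.normal(0.0, input_std, size=(hidden_size, input_size)),
        # Assumption: zero bias (not stated in the abstract).
        "b_h": np.zeros(hidden_size),
    }


def irnn_forward(params, xs, h0=None):
    """Run h_t = relu(W_hh h_{t-1} + W_xh x_t + b_h) over the sequence xs."""
    if h0 is None:
        h = np.zeros(params["b_h"].shape[0])
    else:
        h = np.asarray(h0, dtype=float)
    for x in xs:
        h = np.maximum(0.0, params["W_hh"] @ h + params["W_xh"] @ x + params["b_h"])
    return h


params = init_irnn(input_size=3, hidden_size=4)
h0 = np.array([1.0, 2.0, 0.0, 3.0])
# Zero inputs and a non-negative state: the identity recurrence copies the state forward.
h_final = irnn_forward(params, np.zeros((50, 3)), h0=h0)
```

With zero inputs and a non-negative starting state, each step copies the hidden state unchanged, so the recurrence itself neither shrinks nor amplifies it. With `scale` below 1, the state decays geometrically instead.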

representative citing papers

M²RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling

cs.LG · 2026-03-15 · unverdicted · novelty 6.0

M²RNN achieves perfect state tracking at unseen lengths and outperforms Gated DeltaNet hybrids by 0.4-0.5 perplexity on 7B models with 3x smaller recurrent states.

R-Transformer: Recurrent Neural Network Enhanced Transformer

cs.LG · 2019-07-12 · unverdicted · novelty 6.0

R-Transformer integrates RNNs with multi-head attention to model local and global sequence dependencies without position embeddings and reports large-margin gains over prior methods on diverse tasks.

Learning World Graphs to Accelerate Hierarchical Reinforcement Learning

cs.LG · 2019-07-01 · unverdicted · novelty 6.0

A two-stage framework learns a world graph of pivotal states task-agnostically via joint training of a latent model and curiosity-driven policy, then uses the graph to accelerate hierarchical RL on maze tasks.

Geometric Analysis of Variational Quantum Eigensolver

quant-ph · 2026-05-27 · unverdicted · novelty 5.0

Unifies fixed-ansatz and adaptive VQE via ansatz-free product-unitary formulation on the unitary group and derives convergence rates, initialization guarantees, and noise-robust measurement strategies for Riemannian gradient descent.

From Cortical Synchronous Rhythm to Brain Inspired Learning Mechanism: An Oscillatory Spiking Neural Network with Time-Delayed Coordination

q-bio.NC · 2026-05-03 · unverdicted · novelty 5.0

S2-Net is an oscillatory spiking neural network that uses time-delayed synchronization for bottom-up and top-down coordination to enable efficient, brain-inspired information processing across tasks like decoding and reasoning.

SAFE Quantum Machine Learning with Variational Quantum Classifiers

cs.LG · 2026-05-15 · unverdicted · novelty 3.0

A variational quantum classifier with normalized amplitude embeddings and bounded observables achieves competitive accuracy with improved robustness and stability over classical baselines in safety-critical settings.

citing papers explorer

Showing 4 of 4 citing papers after filters.

M²RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling
cs.LG · 2026-03-15 · unverdicted · none · ref 17 · internal anchor
R-Transformer: Recurrent Neural Network Enhanced Transformer
cs.LG · 2019-07-12 · unverdicted · none · ref 13 · internal anchor
Learning World Graphs to Accelerate Hierarchical Reinforcement Learning
cs.LG · 2019-07-01 · unverdicted · none · ref 55 · internal anchor
SAFE Quantum Machine Learning with Variational Quantum Classifiers
cs.LG · 2026-05-15 · unverdicted · none · ref 6 · internal anchor
