Recognition: unknown
On the difficulty of training Recurrent Neural Networks
read the original abstract
There are two widely known issues with properly training Recurrent Neural Networks, the vanishing and the exploding gradient problems detailed in Bengio et al. (1994). In this paper we attempt to improve the understanding of the underlying issues by exploring these problems from an analytical, a geometric and a dynamical systems perspective. Our analysis is used to justify a simple yet effective solution. We propose a gradient norm clipping strategy to deal with exploding gradients and a soft constraint for the vanishing gradients problem. We validate empirically our hypothesis and proposed solutions in the experimental section.
This paper has not been read by Pith yet.
Forward citations
Cited by 7 Pith papers
-
Coherent-State Propagation: A Computational Framework for Simulating Bosonic Quantum Systems
Coherent-state propagation enables quasi-polynomial classical simulation of bosonic circuits with logarithmically many Kerr gates at exponentially small trace-distance error, with polynomial runtime in the weak-nonlin...
-
Geometry-Induced Long-Range Correlations in Recurrent Neural Network Quantum States
Dilated RNN wave functions induce power-law correlations for the critical 1D transverse-field Ising model and the Cluster state, unlike the exponential decay of conventional RNN ansatze.
-
PaLM: Scaling Language Modeling with Pathways
PaLM 540B demonstrates continued scaling benefits by setting new few-shot SOTA results on hundreds of benchmarks and outperforming humans on BIG-bench.
-
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
GNMT deploys 8-layer LSTMs with attention, wordpieces, low-precision inference, and coverage-penalized beam search to match state-of-the-art on WMT'14 En-Fr and En-De while cutting translation errors by 60% in human e...
-
Inferring identified hadron production in $pp$ collisions with physics-informed machine learning at the LHC
A physics-informed neural network infers pT spectra of pi, K, p, Lambda, and Ks in unmeasured rapidity regions from PYTHIA8 pp collisions at 13.6 TeV, achieving 1.5-5.83% yield uncertainties while reproducing yield ra...
-
A Wasserstein GAN-based climate scenario generator for risk management and insurance: the case of soil subsidence
A conditional Wasserstein GAN generates plausible future SWI drought trajectories for French insurance risk management under climate change.
-
Preventing overfitting in deep learning using differential privacy
Differential privacy techniques can help prevent overfitting and improve generalization in deep neural networks.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.