arxiv: 1409.2329 · v5 · pith:4QNKC4O7new · submitted 2014-09-08 · 💻 cs.NE

Recurrent Neural Network Regularization

classification 💻 cs.NE
keywords neural · dropout · LSTMs · networks · recurrent · regularization · RNNs · tasks

We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. Dropout, the most successful technique for regularizing neural networks, does not work well with RNNs and LSTMs. In this paper, we show how to correctly apply dropout to LSTMs, and show that it substantially reduces overfitting on a variety of tasks. These tasks include language modeling, speech recognition, image caption generation, and machine translation.

This paper has not been read by Pith yet.

Forward citations

Cited by 10 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

    cs.LG 2017-01 accept novelty 8.0

    A noisy top-k gated mixture-of-experts layer between LSTMs scales neural networks to 137B parameters with sub-linear compute, beating SOTA on language modeling and machine translation.

  2. Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning

    cs.AI 2026-05 unverdicted novelty 7.0

    MarsTSC is a VLM-based agentic reasoning framework with a self-evolving knowledge bank and Generator-Reflector-Modifier roles that achieves better few-shot multimodal time series classification than baselines on 12 be...

  3. SIGMA-ASL: Sensor-Integrated Multimodal Dataset for Sign Language Recognition

    cs.HC 2026-05 unverdicted novelty 7.0

    SIGMA-ASL is a multimodal dataset with 93,545 word-level ASL clips from Kinect RGB-D, mmWave radar, and dual IMUs, plus benchmarking protocols for single- and multi-modal recognition.

  4. Augmenting Self-attention with Persistent Memory

    cs.LG 2019-07 unverdicted novelty 7.0

    Augmenting self-attention with persistent memory vectors allows removal of feed-forward layers from Transformers without degrading performance on character and word level language modeling benchmarks.

  5. Pointer Sentinel Mixture Models

    cs.CL 2016-09 conditional novelty 7.0

    Pointer sentinel-LSTM mixes context copying with softmax prediction to reach 70.9 perplexity on Penn Treebank using fewer parameters than standard LSTMs.

  6. Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning

    cs.AI 2026-05 unverdicted novelty 6.0

    MarsTSC is a VLM agentic system with generator, reflector, and modifier roles that iteratively refines a knowledge bank to improve few-shot multimodal time series classification and produce human-readable explanations.

  7. Adversarial Learning for Improved Onsets and Frames Music Transcription

    cs.SD 2019-06 unverdicted novelty 6.0

    Adversarial training on time-frequency representations yields consistent gains in frame-level and note-level accuracy over the Onsets and Frames baseline for automatic music transcription.

  8. Online Supervised Learning for Traffic Load Prediction in Framed-ALOHA Networks

    cs.NI 2019-07 unverdicted novelty 5.0

    LSTM online predictor with MOM-based labeling estimates backlog in framed-ALOHA networks and adapts to changing statistics without prior traffic model knowledge.

  9. Wind Estimation Using Quadcopter Motion: A Machine Learning Approach

    eess.SP 2019-07 unverdicted novelty 5.0

    An LSTM neural network trained on simulated quadcopter states estimates turbulent wind velocities with lower mean and variance errors than a tilt-angle wind triangle method.

  10. Listen, Attend, Spell and Adapt: Speaker Adapted Sequence-to-Sequence ASR

    eess.AS 2019-07 unverdicted novelty 4.0

    KLD-based speaker adaptation of seq2seq ASR achieves 25% relative WER reduction, outperforming the 18.7% gain from conventional acoustic model adaptation.