A Critical Review of Recurrent Neural Networks for Sequence Learning
Countless learning tasks require dealing with sequential data. Image captioning, speech synthesis, and music generation all require that a model produce outputs that are sequences. In other domains, such as time series prediction, video analysis, and musical information retrieval, a model must learn from inputs that are sequences. Interactive tasks, such as translating natural language, engaging in dialogue, and controlling a robot, often demand both capabilities. Recurrent neural networks (RNNs) are connectionist models that capture the dynamics of sequences via cycles in the network of nodes. Unlike standard feedforward neural networks, recurrent networks retain a state that can represent information from an arbitrarily long context window. Although recurrent neural networks have traditionally been difficult to train, and often contain millions of parameters, recent advances in network architectures, optimization techniques, and parallel computation have enabled successful large-scale learning with them. In recent years, systems based on long short-term memory (LSTM) and bidirectional (BRNN) architectures have demonstrated ground-breaking performance on tasks as varied as image captioning, language translation, and handwriting recognition. In this survey, we review and synthesize the research that over the past three decades first yielded and then made practical these powerful learning models. When appropriate, we reconcile conflicting notation and nomenclature. Our goal is to provide a self-contained explication of the state of the art together with a historical perspective and references to primary research.
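As an illustration of the abstract's point that recurrent networks retain a state across time steps, here is a minimal sketch of a simple (Elman-style) recurrence, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b). The function names, the toy weights, and the choice of tanh are assumptions made for this example and are not taken from the survey.

```python
import math

def rnn_step(x, h, W_xh, W_hh, b):
    """One recurrence step: h_new = tanh(W_xh x + W_hh h + b)."""
    h_new = []
    for i in range(len(h)):
        total = b[i]
        total += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        total += sum(W_hh[i][k] * h[k] for k in range(len(h)))
        h_new.append(math.tanh(total))
    return h_new

def run_rnn(inputs, W_xh, W_hh, b):
    """Feed a sequence through the recurrence, starting from a zero state."""
    h = [0.0] * len(b)
    states = []
    for x in inputs:
        h = rnn_step(x, h, W_xh, W_hh, b)
        states.append(h)
    return states

# Hypothetical toy weights: 1 input feature, 2 hidden units.
W_xh = [[0.5], [-0.3]]
W_hh = [[0.1, 0.2], [0.0, 0.4]]
b = [0.0, 0.0]

# Only the first time step carries a nonzero input.
sequence = [[1.0], [0.0], [0.0]]
states = run_rnn(sequence, W_xh, W_hh, b)
```

Although the inputs at the second and third steps are zero, the hidden state stays nonzero because the cycle through W_hh carries information forward from the first step. A feedforward network applied to each step independently would have no such memory.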
Forward citations
Cited by 12 Pith papers
- SELF-EMO: Emotional Self-Evolution from Recognition to Consistent Expression
  SELF-EMO lets LLMs bootstrap better emotion recognition and expression via self-play, data flywheel filtering with smoothed IoU rewards, and SELF-GRPO reinforcement learning, yielding SOTA gains on IEMOCAP, MELD, and ...
- Geometry-Induced Long-Range Correlations in Recurrent Neural Network Quantum States
  Dilated RNN wave functions induce power-law correlations for the critical 1D transverse-field Ising model and the Cluster state, unlike the exponential decay of conventional RNN ansatze.
- Learning to learn with quantum neural networks via classical neural networks
  Classical RNNs trained on small instances provide parameter initializations for QAOA and VQE that reduce total optimization iterations and generalize across problem sizes.
- A Data-Driven Parametric Reduced-Order Chemical Kinetics Model Derived from Atomistic Simulations
  A parametric autoencoder with non-negativity and softmax constraints learns interpretable latent chemical components and couples them to kinetics and heat release for improved reduced-order modeling of decomposition.
- Listen to Rhythm, Choose Movements: Autoregressive Multimodal Dance Generation via Diffusion and Mamba with Decoupled Dance Dataset
  LRCM is a new multimodal diffusion model with audio and text Conformers plus Motion Temporal Mamba for generating long, coherent dance sequences from rhythm and descriptions using a decoupled dataset.
- Interpretable and Steerable Sequence Learning via Prototypes
  ProSeNet learns a sparse set of prototypes for case-based explanations in deep sequence models, matches state-of-the-art accuracy on several tasks, and supports manual prototype refinement by non-experts.
- Selective Correlation Based Knowledge Distillation for Ground Reaction Force Estimation
  Selective Correlation Based Knowledge Distillation trains smaller models to accurately estimate ground reaction forces from wearable insole sensors by focusing on temporal features in correlation maps for efficient kn...
- Learning Invariant Modality Representation for Robust Multimodal Learning from a Causal Inference Perspective
  CmIR uses causal inference to separate invariant causal representations from spurious ones in multimodal data, improving generalization under distribution shifts and noise via invariance, mutual information, and recon...
- Leveraging Convolutional Sparse Autoencoders for Robust Movement Classification from Low-Density sEMG
  Convolutional sparse autoencoder on two-channel sEMG delivers 94.3% multi-subject F1 for six gestures, 92.3% after few-shot transfer to unseen subjects, and 90% after incremental extension to ten classes.
- Optimal Gait Control for a Tendon-driven Soft Quadruped Robot by Model-based Reinforcement Learning
  Develops and tests a model-based RL controller with post-training for gait in a tendon-driven soft quadruped, reporting improved efficiency and robustness over benchmarks.
- EGI: A Multimodal Emotional AI Framework for Enhancing Scrum Master Real-time Self-Awareness
  EGI integrates four existing AI components for real-time multimodal emotion monitoring and feedback in simulated agile meetings, reporting 10% WER and improved self-awareness for Scrum Masters.
- Predicting Drug Responses by Propagating Interactions through Text-Enhanced Drug-Gene Networks
  A text-enhanced drug-gene network is constructed from articles and data, with edge embeddings estimated from cell line records to enable explainable drug sensitivity predictions at 94.74% accuracy.