Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks

Brendan Jou; Jordi Torres; Shih-Fu Chang; Victor Campos; Xavier Giro-i-Nieto

arxiv: 1708.06834 · v3 · pith:J7TOYVMKnew · submitted 2017-08-22 · 💻 cs.AI · cs.CV

Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, Shih-Fu Chang

classification 💻 cs.AI cs.CV

keywords skip, updates, model, state, computational, graph, learning, long

Recurrent Neural Networks (RNNs) continue to show outstanding performance in sequence modeling tasks. However, training RNNs on long sequences often faces challenges such as slow inference, vanishing gradients, and difficulty in capturing long-term dependencies. In backpropagation-through-time settings, these issues are tightly coupled with the large, sequential computational graph that results from unfolding the RNN in time. We introduce the Skip RNN model, which extends existing RNN models by learning to skip state updates and shortens the effective size of the computational graph. The model can also be encouraged to perform fewer state updates through a budget constraint. We evaluate the proposed model on various tasks and show that it can reduce the number of required RNN updates while preserving, and sometimes even improving, the performance of the baseline RNN models. Source code is publicly available at https://imatge-upc.github.io/skiprnn-2017-telecombcn/.
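The abstract does not spell out how updates are skipped, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a toy scalar RNN, a binary gate obtained by rounding a running gate value, and a budget penalty proportional to the number of updates performed. The function name `skip_rnn_forward` and every parameter (`w_x`, `w_s`, `w_p`, `b_p`, `budget_weight`) are hypothetical and chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def skip_rnn_forward(xs, w_x=1.0, w_s=0.5, w_p=0.0, b_p=-2.0, budget_weight=0.0):
    """Toy scalar RNN that may skip state updates (illustrative assumptions only).

    Returns (final_state, binary_gates, budget_penalty).
    """
    state = 0.0        # recurrent state
    gate_value = 1.0   # running gate value; 1.0 forces an update on the first step
    gates = []
    for x in xs:
        # Assumed binarization: round the running gate value to 0 or 1.
        gate = 1 if gate_value >= 0.5 else 0
        if gate:
            # The state transition is computed only when the gate is open.
            state = math.tanh(w_x * x + w_s * state)
        # When the gate is closed, the previous state is simply copied forward.
        delta = sigmoid(w_p * state + b_p)
        if gate:
            # After an update, restart the running gate value from the increment.
            gate_value = delta
        else:
            # After a skip, accumulate so that an update eventually happens.
            gate_value += min(delta, 1.0 - gate_value)
        gates.append(gate)
    # Assumed budget term: a cost per performed update, to be added to the task loss.
    budget_penalty = budget_weight * sum(gates)
    return state, gates, budget_penalty


if __name__ == "__main__":
    state, gates, penalty = skip_rnn_forward([0.3] * 10, budget_weight=0.01)
    print(gates, penalty)
```

With the default parameters, the gate opens on the first step and then once every five steps, so inputs that arrive during skipped steps never touch the state. In a trainable version the rounding step would need a gradient workaround such as a straight-through estimator, which this sketch omits.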

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

LeapTS: Rethinking Time Series Forecasting as Adaptive Multi-Horizon Scheduling
cs.LG 2026-05 unverdicted novelty 7.0

LeapTS reformulates forecasting as adaptive multi-horizon scheduling via hierarchical control and NCDEs, delivering at least 7.4% better performance and 2.6-5.3x faster inference than Transformer baselines while adapt...

Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
cs.CL 2023-10 conditional novelty 6.0

FastGen adaptively compresses LLM KV caches via lightweight attention profiling: evicting long-range contexts on local heads, non-special tokens on special-token heads, and retaining full caches on broad-attention hea...

ARMIN: Towards a More Efficient and Light-weight Recurrent Memory Network
cs.LG 2019-06 unverdicted novelty 5.0

ARMIN introduces auto-addressing via hidden states and a novel RNN cell to produce a lighter recurrent memory network with lower overhead than existing MANNs or vanilla LSTMs.