
arxiv: 1708.06834 · v3 · pith:J7TOYVMKnew · submitted 2017-08-22 · 💻 cs.AI · cs.CV

Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks

keywords: skip, updates, model, state, computational, graph, learning, long

Recurrent Neural Networks (RNNs) continue to show outstanding performance in sequence modeling tasks. However, training RNNs on long sequences often faces challenges such as slow inference, vanishing gradients, and difficulty in capturing long-term dependencies. In backpropagation-through-time settings, these issues are tightly coupled with the large, sequential computational graph that results from unfolding the RNN in time. We introduce the Skip RNN model, which extends existing RNN models by learning to skip state updates and shortens the effective size of the computational graph. This model can also be encouraged to perform fewer state updates through a budget constraint. We evaluate the proposed model on various tasks and show how it can reduce the number of required RNN updates while preserving, and sometimes even improving, the performance of the baseline RNN models. Source code is publicly available at https://imatge-upc.github.io/skiprnn-2017-telecombcn/.
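
The two ideas in the abstract, skipping state updates and penalizing the number of updates, can be illustrated with a minimal forward-pass sketch. Everything named here is an assumption made for illustration and is not taken from the released source code: the toy scalar tanh cell, the 0.5 binarization threshold, the parameters `w_p` and `b_p`, and the helper names `skip_rnn_forward` and `budget_penalty`. At each step a binary gate decides whether the state is recomputed or simply copied forward; while the state is being copied, the update probability accumulates so that an update eventually fires again.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rnn_cell(state, x, w_s, w_x):
    """Toy tanh cell (assumption): stands in for any base RNN cell."""
    return [math.tanh(w_s * s + w_x * x) for s in state]


def skip_rnn_forward(inputs, state, w_s=0.5, w_x=1.0, w_p=1.0, b_p=0.0):
    """Forward pass that may copy the state instead of updating it.

    Returns the final state and the list of binary update decisions.
    """
    u_tilde = 1.0  # update probability; starting at 1 forces a first update
    delta = 1.0    # probability increment, re-estimated after each update
    gates = []
    for x in inputs:
        u = 1 if u_tilde >= 0.5 else 0  # binarize the update probability
        if u:
            # Update step: run the base cell, then re-estimate the
            # update probability from the freshly computed state.
            state = rnn_cell(state, x, w_s, w_x)
            delta = sigmoid(w_p * sum(state) + b_p)
            u_tilde = delta
        else:
            # Skip step: state is copied unchanged; accumulate probability
            # (capped at 1) so that a later step triggers an update.
            u_tilde = u_tilde + min(delta, 1.0 - u_tilde)
        gates.append(u)
    return state, gates


def budget_penalty(gates, cost_per_update=1e-3):
    """Hypothetical budget term: a fixed cost for every state update used."""
    return cost_per_update * sum(gates)


if __name__ == "__main__":
    final_state, gates = skip_rnn_forward([0.0] * 6, [0.0], b_p=-1.5)
    print(gates, budget_penalty(gates))
```

This sketch covers inference only. The rounding step has zero gradient almost everywhere, so training such a gate would need a workaround such as a straight-through estimator, and the budget term would be added to the task loss to encourage fewer updates.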


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. LeapTS: Rethinking Time Series Forecasting as Adaptive Multi-Horizon Scheduling

    cs.LG 2026-05 unverdicted novelty 7.0

    LeapTS reformulates forecasting as adaptive multi-horizon scheduling via hierarchical control and NCDEs, delivering at least 7.4% better performance and 2.6-5.3x faster inference than Transformer baselines while adapt...

  2. Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs

    cs.CL 2023-10 conditional novelty 6.0

    FastGen adaptively compresses LLM KV caches via lightweight attention profiling: evicting long-range contexts on local heads, non-special tokens on special-token heads, and retaining full caches on broad-attention hea...

  3. ARMIN: Towards a More Efficient and Light-weight Recurrent Memory Network

    cs.LG 2019-06 unverdicted novelty 5.0

    ARMIN introduces auto-addressing via hidden states and a novel RNN cell to produce a lighter recurrent memory network with lower overhead than existing MANNs or vanilla LSTMs.