pith. machine review for the scientific record. sign in

arxiv: 1711.02085 · v3 · submitted 2017-11-06 · 💻 cs.CL

Recognition: unknown

Neural Speed Reading via Skim-RNN

Authors on Pith no claims yet
classification 💻 cs.CL
keywords skim-rnnrnnsspeedstandardaccuracycomparedcomputationaldynamically
0
0 comments X
read the original abstract

Inspired by the principles of speed reading, we introduce Skim-RNN, a recurrent neural network (RNN) that dynamically decides to update only a small fraction of the hidden state for relatively unimportant input tokens. Skim-RNN gives computational advantage over an RNN that always updates the entire hidden state. Skim-RNN uses the same input and output interfaces as a standard RNN and can be easily used instead of RNNs in existing models. In our experiments, we show that Skim-RNN can achieve significantly reduced computational cost without losing accuracy compared to standard RNNs across five different natural language tasks. In addition, we demonstrate that the trade-off between accuracy and speed of Skim-RNN can be dynamically controlled during inference time in a stable manner. Our analysis also shows that Skim-RNN running on a single CPU offers lower latency compared to standard RNNs on GPUs.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs

    cs.CL 2023-10 conditional novelty 6.0

    FastGen adaptively compresses LLM KV caches via lightweight attention profiling: evicting long-range contexts on local heads, non-special tokens on special-token heads, and retaining full caches on broad-attention hea...