Efficient streaming language models with attention sinks

Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, Mike Lewis · 2024

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

cs.CV · 2025-05-24 · unverdicted · novelty 7.0

SVG2 accelerates DiT video generation via semantic-aware token permutation with k-means, achieving up to 2.3x speedup and PSNR of 30 while fixing position-based clustering and scattered-token waste.

Sculpt4D: Generating 4D Shapes via Sparse-Attention Diffusion Transformers

cs.CV · 2026-04-23 · unverdicted · novelty 6.0

Sculpt4D generates temporally coherent 4D shapes by integrating a block sparse attention mechanism with a time-decaying mask into a pretrained 3D diffusion transformer, achieving SOTA results with 56% less computation.

citing papers explorer

Showing 2 of 2 citing papers.

Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

cs.CV · 2025-05-24 · unverdicted · none · ref 7

Sculpt4D: Generating 4D Shapes via Sparse-Attention Diffusion Transformers

cs.CV · 2026-04-23 · unverdicted · none · ref 46
