Adaptively Sparse Transformers

Gonçalo M. Correia, Vlad Niculae, André F. T. Martins · 2019 · Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) · DOI 10.18653/v1/d19-1223

4 Pith papers cite this work, alongside 92 external citations. Polarity classification is still indexing.

92 external citations · Crossref

citation-role summary

background 1 · method 1

citation-polarity summary

background 1 · use method 1

representative citing papers

EntmaxKV: Support-Aware Decoding for Entmax Attention

cs.LG · 2026-05-20 · conditional · novelty 8.0

EntmaxKV enables exact sparse KV-cache decoding for entmax attention via support-aware page selection and a Gaussian threshold estimator, matching full attention quality at a fraction of the cache size with up to 5.43x speedup.

Prognostic Value of Lung Ultrasound Biomarkers for Readmission Risk in Congestive Heart Failure: A Pilot Data-Driven Analysis

eess.SP · 2026-05-16 · unverdicted · novelty 6.0

Pilot study uses pretrained video encoder features from lung ultrasound to predict 30-day CHF readmission, finding lower-lung views and temporal differences most informative with top MLP F1 of 0.80.

Unifying Sparse Attention with Hierarchical Memory for Scalable Long-Context LLM Serving

cs.LG · 2026-04-29 · unverdicted · novelty 6.0

SPIN co-designs sparse attention with hierarchical memory to achieve 1.66-5.66x higher throughput, 7-9x lower time to first token (TTFT), and up to 58% lower time per output token (TPOT) than vLLM and original sparse implementations.

RetroInfer: A Vector Storage Engine for Scalable Long-Context LLM Inference

cs.LG · 2025-05-05 · conditional · novelty 6.0

RetroInfer introduces the wave index and wave buffer to realize sparse KV-cache attention for long-context LLM inference, with up to 4.4x throughput gains while matching full-attention accuracy.

citing papers explorer

Showing 4 of 4 citing papers.

EntmaxKV: Support-Aware Decoding for Entmax Attention · cs.LG · 2026-05-20 · conditional · none · ref 3

Prognostic Value of Lung Ultrasound Biomarkers for Readmission Risk in Congestive Heart Failure: A Pilot Data-Driven Analysis · eess.SP · 2026-05-16 · unverdicted · none · ref 98

Unifying Sparse Attention with Hierarchical Memory for Scalable Long-Context LLM Serving · cs.LG · 2026-04-29 · unverdicted · none · ref 16

RetroInfer: A Vector Storage Engine for Scalable Long-Context LLM Inference · cs.LG · 2025-05-05 · conditional · none · ref 18
