SPIN co-designs sparse attention with hierarchical memory to achieve 1.66-5.66x higher throughput, 7-9x lower TTFT, and up to 58% lower TPOT than vLLM and original sparse implementations.
Title resolution pending
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
citation-role summary
background 1
citation-polarity summary
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1roles
background 1polarities
background 1representative citing papers
citing papers explorer
-
Unifying Sparse Attention with Hierarchical Memory for Scalable Long-Context LLM Serving
SPIN co-designs sparse attention with hierarchical memory to achieve 1.66-5.66x higher throughput, 7-9x lower TTFT, and up to 58% lower TPOT than vLLM and original sparse implementations.