CLSA shares both KV cache and routing indices across decoder layers to amortize top-k selection, delivering up to 7.6x decoding speedup and 17.1x throughput at 128K context while preserving accuracy.
Hysparse: A hybrid sparse attention architecture with oracle token selection and KV cache sharing. arXiv preprint arXiv:2602.03560, 2026.
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Representative citing papers:
SinkRec proposes a memory-conditioned architecture with TDGD to mitigate semantic state sink in linear attention for long-sequence recommendation.
SSV presents a sparse speculative-verification framework that resolves mismatches between speculative decoding and dynamic sparse attention to deliver up to 3.49x end-to-end throughput and 6.86x kernel speedups on NVIDIA H100 GPUs.
citing papers explorer
- SSV: Sparse Speculative Verification for Efficient LLM Inference