Hysparse: A hybrid sparse attention architecture with oracle token selection and kv cache sharing. arXiv preprint arXiv:2602.03560, 2026


3 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

You Only Index Once: Cross-Layer Sparse Attention with Shared Routing

cs.CL · 2026-06-04 · unverdicted · novelty 6.0

CLSA shares both KV cache and routing indices across decoder layers to amortize top-k selection, delivering up to 7.6x decoding speedup and 17.1x throughput at 128K context while preserving accuracy.

SinkRec: Mitigating Semantic State Sink in Long Sequence Recommendation with Memory-Conditioned Gated Delta Networks

cs.LG · 2026-06-03 · unverdicted · novelty 5.0

SinkRec proposes a memory-conditioned architecture with TDGD to mitigate semantic state sink in linear attention for long-sequence recommendation.

SSV: Sparse Speculative Verification for Efficient LLM Inference

cs.OS · 2026-05-19 · unverdicted · novelty 5.0

SSV presents a sparse speculative-verification framework that resolves mismatches between speculative decoding and dynamic sparse attention to deliver up to 3.49x end-to-end throughput and 6.86x kernel speedups on NVIDIA H100 GPUs.

citing papers explorer


SSV: Sparse Speculative Verification for Efficient LLM Inference cs.OS · 2026-05-19 · unverdicted · none · ref 14
