pith. sign in

Not all heads matter: A head-level KV cache compression method with integrated retrieval and reasoning

10 Pith papers cite this work. Polarity classification is still indexing.

10 Pith papers citing it

citation-role summary

background 1 baseline 1

citation-polarity summary

clear filters

representative citing papers

HieraSparse: Hierarchical Semi-Structured Sparse KV Attention

cs.DC · 2026-04-18 · unverdicted · novelty 5.0

HieraSparse delivers a hierarchical semi-structured sparse KV attention system that achieves 1.2x KV compression and 4.57x decode attention speedup versus prior unstructured sparsity methods at equivalent sparsity, plus up to 1.85x prefill speedup and 1.37x/1.77x speedups with magnitude pruning and

citing papers explorer

Showing 1 of 1 citing paper after filters.

  • HieraSparse: Hierarchical Semi-Structured Sparse KV Attention cs.DC · 2026-04-18 · unverdicted · none · ref 24

    HieraSparse delivers a hierarchical semi-structured sparse KV attention system that achieves 1.2x KV compression and 4.57x decode attention speedup versus prior unstructured sparsity methods at equivalent sparsity, plus up to 1.85x prefill speedup and 1.37x/1.77x speedups with magnitude pruning and