LLMs know what to drop: Self-attention guided KV cache eviction for efficient long-context inference. arXiv preprint arXiv:2503.08879

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

years

2026: 5 · 2025: 1

verdicts

UNVERDICTED 6

polarities

background 1

representative citing papers

SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators

cs.AI · 2025-11-05 · unverdicted · novelty 7.0

SnapStream deploys sparse KV attention in a production inference system on dataflow accelerators, delivering 4x on-chip memory savings for DeepSeek-671B at 128k context with up to 1832 tokens/sec and minimal accuracy loss on LongBench-v2, AIME24, and LiveCodeBench.
