SnapKV: LLM knows what you are looking for before generation

Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, Deming Chen · 2024

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Not All Thoughts Need HBM: Semantics-Aware Memory Hierarchy for LLM Reasoning

cs.CL · 2026-05-10 · unverdicted · novelty 6.0

A semantics-aware KV cache hierarchy offloads tokens to slower memory with zero approximation error, demonstrating that LLM reasoning accuracy depends only on the permanent eviction ratio and not on HBM residency.
