Cache what lasts: Token retention for memory-bounded KV cache in LLMs. arXiv preprint arXiv:2512.03324
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary
fields: cs.LG 3
years: 2026 3
roles: background 1
polarities: background 1
representative citing papers
Structural protection of boundary tokens in globally capped KV cache eviction recovers 69-90% of full-cache quality at 13% retention and dominates differences among scoring policies.
citing papers explorer
- Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction
  A unified learnable KV eviction policy with cross-layer calibration reduces memory and matches or exceeds full-cache performance on long-context tasks by retaining useful tokens and limiting attention dilution.
- Protection Is (Nearly) All You Need: Structural Protection Dominates Scoring in Globally Capped KV Eviction
  Structural protection of boundary tokens in globally capped KV cache eviction recovers 69-90% of full-cache quality at 13% retention and dominates differences among scoring policies.
- SPHERICAL KV: Angle-Domain Attention and Rate-Distortion Retention for Efficient Long-Context Inference