Cache what lasts: Token retention for memory-bounded KV cache in LLMs. arXiv preprint arXiv:2512.03324

Cache what lasts: Token retention for memory-bounded KV cache in LLMs · 2025 · arXiv 2512.03324

3 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

A unified learnable KV eviction policy with cross-layer calibration reduces memory and matches or exceeds full-cache performance on long-context tasks by retaining useful tokens and limiting attention dilution.

Protection Is (Nearly) All You Need: Structural Protection Dominates Scoring in Globally Capped KV Eviction

cs.LG · 2026-05-18 · unverdicted · novelty 4.0

Structural protection of boundary tokens in globally capped KV cache eviction recovers 69-90% of full-cache quality at 13% retention and dominates differences among scoring policies.

SPHERICAL KV: Angle-Domain Attention and Rate-Distortion Retention for Efficient Long-Context Inference

cs.LG · 2026-05-13

citing papers explorer

Showing 3 of 3 citing papers.

Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction · cs.LG · 2026-05-10 · unverdicted · none · ref 2
Protection Is (Nearly) All You Need: Structural Protection Dominates Scoring in Globally Capped KV Eviction · cs.LG · 2026-05-18 · unverdicted · none · ref 9
SPHERICAL KV: Angle-Domain Attention and Rate-Distortion Retention for Efficient Long-Context Inference · cs.LG · 2026-05-13 · unreviewed · ref 2
