Lethe: Layer- and time-adaptive kv cache pruning for reasoning-intensive llm serving.arXiv preprint arXiv:2511.06029, 2025

Hui Zeng, Daming Zhao, Pengfei Yang, WenXuan Hou, Tianyang Zheng, Hui Li, Weiye Ji, Jidong Zhai · 2025 · arXiv 2511.06029

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

read on arXiv browse 1 citing papers

representative citing papers

Adaptive Mass-Segmented KV Compression for Long-Context Reasoning

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

AMS KV compression adaptively partitions the cache by attention mass regions and assigns quotas to protect contiguous reasoning blocks during long-context LLM inference.

citing papers explorer

Showing 1 of 1 citing paper.

Adaptive Mass-Segmented KV Compression for Long-Context Reasoning cs.LG · 2026-05-22 · unverdicted · none · ref 40
AMS KV compression adaptively partitions the cache by attention mass regions and assigns quotas to protect contiguous reasoning blocks during long-context LLM inference.

Lethe: Layer- and time-adaptive kv cache pruning for reasoning-intensive llm serving.arXiv preprint arXiv:2511.06029, 2025

fields

years

verdicts

representative citing papers

citing papers explorer