EvicPress: Joint KV-Cache Compression and Eviction for Efficient LLM Serving
2 Pith papers cite this work, both from 2026.
citing papers explorer
- PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents
  PEEK maintains a constant-sized context map via a programmable cache policy to give LLM agents persistent orientation knowledge about recurring external contexts, yielding 6-34% gains and lower cost than prior prompt-learning methods.
- PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference
  A single shared asymmetrically compressed KV cache pool enables up to 15 concurrent LLM agents with 2.91x compression, 97.7% memory reduction, and only +0.57% perplexity increase on Llama-3-8B.