Evicpress: Joint KV-cache compression and eviction for efficient LLM serving

Shaoting Feng, Yuhan Liu, Hanchen Li, Xiaokun Chen, Samuel Shen, Kuntai Du, Zhuohan Gu, Rui Zhang, Yuyang Huang, Yihua Cheng, et al. · 2025 · arXiv 2512.14946

2 Pith papers cite this work.

representative citing papers

PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents

cs.AI · 2026-05-19 · unverdicted · novelty 6.0

PEEK maintains a constant-sized context map via a programmable cache policy to give LLM agents persistent orientation knowledge about recurring external contexts, yielding 6–34% gains and lower cost than prior prompt-learning methods.

PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference

cs.LG · 2026-04-27 · conditional · novelty 6.0

A single shared, asymmetrically compressed KV cache pool enables up to 15 concurrent LLM agents with 2.91× compression, 97.7% memory reduction, and only a +0.57% perplexity increase on Llama-3-8B.

citing papers explorer

PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents · cs.AI · 2026-05-19 · unverdicted · none · ref 14
PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference · cs.LG · 2026-04-27 · conditional · none · ref 12