Evicpress: Joint kv-cache compression and eviction for efficient llm serving

Shaoting Feng, Yuhan Liu, Hanchen Li, Xiaokun Chen, Samuel Shen, Kuntai Du, Zhuohan Gu, Rui Zhang, Yuyang Huang, Yihua Cheng, et al · 2025 · arXiv 2512.14946

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

representative citing papers

A Policy-Driven Runtime Layer for Agentic LLM Serving

cs.AI · 2026-05-26 · unverdicted · novelty 7.0

Introduces a three-tier architecture with an agent runtime layer and four primitives for agent-aware policies in LLM serving, validated on KV caching via CacheSage showing 13-37pp hit-rate gains on five workloads.

PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents

cs.AI · 2026-05-19 · unverdicted · novelty 6.0

PEEK maintains a constant-sized context map via a programmable cache policy to give LLM agents persistent orientation knowledge about recurring external contexts, yielding 6-34% gains and lower cost than prior prompt-learning methods.

PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference

cs.LG · 2026-04-27 · conditional · novelty 6.0

A single shared asymmetrically compressed KV cache pool enables up to 15 concurrent LLM agents with 2.91x compression, 97.7% memory reduction, and only +0.57% perplexity increase on Llama-3-8B.

citing papers explorer

Showing 1 of 1 citing paper after filters.

PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference cs.LG · 2026-04-27 · conditional · none · ref 12
A single shared asymmetrically compressed KV cache pool enables up to 15 concurrent LLM agents with 2.91x compression, 97.7% memory reduction, and only +0.57% perplexity increase on Llama-3-8B.

Evicpress: Joint kv-cache compression and eviction for efficient llm serving

fields

years

verdicts

representative citing papers

citing papers explorer