KVCache cache in the wild: Characterizing and optimizing KVCache cache at a large cloud provider. arXiv preprint arXiv:2506.02634, 2025.
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.DC (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers:
- Fail-Closed Lowering of Resident KV Claims onto LLM Serving Runtimes
  Introduces fail-closed lowering semantics for Resident KV Claims in LLM serving runtimes, along with a conformance checker, descriptor format, and classification of existing systems.
- Recency/Frequency Adaptive KV Caching for Large Language Model Serving
  Presents a recency/frequency adaptive KV caching approach that achieves up to 10.8% higher hit rate and 12.6% lower TTFT compared to vLLM on synthetic workloads.