Epic: Efficient position-independent context caching for serving large language models

Junhao Hu, Wenrui Huang, Weidong Wang, Zhenwen Yuan, Tiancheng Xie, Zhixia Liu, Xusheng Liu, Tao Cui, Fei Liu, Yizhou Cao · 2024 · arXiv 2410.15332

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 2

representative citing papers

CachePrune: Privacy-Aware and Fine-Grained KV Cache Sharing for Efficient LLM Inference

cs.CR · 2026-05-22 · unverdicted · novelty 7.0

CachePrune enables fine-grained, token-level KV cache reuse across LLM requests by masking sensitive segments, eliminating direct side-channel leakage while cutting TTFT by 4.5x and raising hit rates by 44% versus prior coarse-grained methods.

Functional Cache Grafting: Robust and Rapid Code-Policy Synthesis for Embodied Agents

cs.PL · 2026-06-11 · unverdicted · novelty 6.0

FCGraft synthesizes code policies for embodied agents by grafting KV caches from a library of validated functions, claiming 18.31% higher success rate and 2.3x faster synthesis than prompt-level caching.

Grounded Cache Routing for Retrieval-Augmented Generation: When Is It Safe to Reuse an Answer?

cs.CR · 2026-05-26 · unverdicted · novelty 6.0

GroundedCache reduces unsafe-served rate in RAG answer caching to 0-1.5% (vs 15-51.5% naive) via four validation gates while keeping p50 latency within 1.07x of no-cache baseline.

TokenDance: Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing

cs.DC · 2026-04-03 · unverdicted · novelty 6.0

TokenDance scales multi-agent LLM serving to 2.7x more concurrent agents by collective KV cache reuse and block-sparse diff encoding that achieves 11-17x compression.

MiniPIC: Flexible Position-Independent Caching in <100LOC

cs.LG · 2026-06-11 · unverdicted · novelty 5.0

MiniPIC enables multiple position-independent caching methods inside vLLM via unrotated KV storage, per-request RoPE application, and three primitives, delivering 49% prefill throughput gains and up to 100x lower cached-span TTFT on 2WikiMultihopQA.

From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs

cs.IR · 2025-04-22 · unverdicted · novelty 5.0

The paper surveys human memory categories, maps them to LLM memory, and proposes a new three-dimension (object, form, time) categorization into eight quadrants to organize existing work and highlight open problems.

citing papers explorer

Showing 5 of 5 citing papers after filters.

CachePrune: Privacy-Aware and Fine-Grained KV Cache Sharing for Efficient LLM Inference · cs.CR · 2026-05-22 · unverdicted · none · ref 17
Functional Cache Grafting: Robust and Rapid Code-Policy Synthesis for Embodied Agents · cs.PL · 2026-06-11 · unverdicted · none · ref 10
Grounded Cache Routing for Retrieval-Augmented Generation: When Is It Safe to Reuse an Answer? · cs.CR · 2026-05-26 · unverdicted · none · ref 6
TokenDance: Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing · cs.DC · 2026-04-03 · unverdicted · none · ref 13
MiniPIC: Flexible Position-Independent Caching in <100LOC · cs.LG · 2026-06-11 · unverdicted · none · ref 3
