MEPIC: Memory efficient position independent caching for LLM serving

Qian Wang, Zahra Yousefijamarani, Morgan Lindsay Heisler, Rongzhi Gu, Xiaolong Bai, Yizhou Shan, Wei Zhang, Lan Wang, Ying Xiong, Yong Zhang, Zhenan Fan · 2025 · arXiv 2512.16822

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Move the Query, Not the Cache: Characterizing Cross-Instance Latent Attention Redistribution Across GPU Fabrics

cs.DC · 2026-05-31 · unverdicted · novelty 7.0

On a real multi-node H100 cluster the authors show that for MLA, routing the ~1 KB compressed query row is cheaper than moving cache chunks and supply a topology-aware cost model accurate to ~7% on IBGDA fabrics.

HYPIC: Accelerating Hybrid-Attention LLM Serving with Position-Independent Caching

cs.DC · 2026-07-01 · unverdicted · novelty 6.0

HYPIC enables position-independent KV caching for hybrid-attention models via segment-cumulative operators and boundary seam recomputation, delivering a 2.45x average TTFT reduction and up to a 2.0x throughput gain.

MiniPIC: Flexible Position-Independent Caching in <100LOC

cs.LG · 2026-06-11 · unverdicted · novelty 5.0

MiniPIC enables multiple position-independent caching methods inside vLLM via unrotated KV storage, per-request RoPE application, and three primitives, delivering 49% prefill throughput gains and up to 100x lower cached-span TTFT on 2WikiMultihopQA.

Irminsul: MLA-Native Position-Independent Caching for Agentic LLM Serving

cs.DC · 2026-05-07 · unverdicted · novelty 5.0

Irminsul recovers up to 83% of prompt tokens above exact-prefix matching and delivers 63% prefill energy savings per cache hit on MLA-MoE models by content-hashing CDC chunks and applying closed-form kr correction.

citing papers explorer

Move the Query, Not the Cache: Characterizing Cross-Instance Latent Attention Redistribution Across GPU Fabrics · cs.DC · 2026-05-31 · unverdicted · none · ref 30
HYPIC: Accelerating Hybrid-Attention LLM Serving with Position-Independent Caching · cs.DC · 2026-07-01 · unverdicted · none · ref 44
Irminsul: MLA-Native Position-Independent Caching for Agentic LLM Serving · cs.DC · 2026-05-07 · unverdicted · none · ref 29
