On a real multi-node H100 cluster the authors show that for MLA, routing the ~1 KB compressed query row is cheaper than moving cache chunks and supply a topology-aware cost model accurate to ~7% on IBGDA fabrics.
MEPIC: Memory efficient position independent caching for LLM serving
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 3verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
MiniPIC enables multiple position-independent caching methods inside vLLM via unrotated KV storage, per-request RoPE application, and three primitives, delivering 49% prefill throughput gains and up to 100x lower cached-span TTFT on 2WikiMultihopQA.
Irminsul recovers up to 83% of prompt tokens above exact-prefix matching and delivers 63% prefill energy savings per cache hit on MLA-MoE models by content-hashing CDC chunks and applying closed-form kr correction.
citing papers explorer
-
Move the Query, Not the Cache: Characterizing Cross-Instance Latent Attention Redistribution Across GPU Fabrics
On a real multi-node H100 cluster the authors show that for MLA, routing the ~1 KB compressed query row is cheaper than moving cache chunks and supply a topology-aware cost model accurate to ~7% on IBGDA fabrics.
-
MiniPIC: Flexible Position-Independent Caching in <100LOC
MiniPIC enables multiple position-independent caching methods inside vLLM via unrotated KV storage, per-request RoPE application, and three primitives, delivering 49% prefill throughput gains and up to 100x lower cached-span TTFT on 2WikiMultihopQA.
-
Irminsul: MLA-Native Position-Independent Caching for Agentic LLM Serving
Irminsul recovers up to 83% of prompt tokens above exact-prefix matching and delivers 63% prefill energy savings per cache hit on MLA-MoE models by content-hashing CDC chunks and applying closed-form kr correction.