Mooncake: A KVCache-centric disaggregated architecture for LLM serving. ACM Transactions on Storage, 2024

Ruoyu Qin, Zheming Li, Weiran He, Jialei Cui, Heyi Tang, Feng Ren, Teng Ma, Shangming Cai, Yineng Zhang, Mingxing Zhang, et al · 2024

2 Pith papers cite this work.

representative citing papers

When RL Meets Adaptive Speculative Training: A Unified Training-Serving System

cs.LG · 2026-02-06 · conditional · novelty 7.0

Aurora unifies speculative decoder training and serving via asynchronous RL on inference traces, delivering a 1.5x day-0 speedup on frontier models and a 1.25x adaptation gain under distribution shifts.

Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter

cs.DC · 2026-04-16 · unverdicted · novelty 6.0

PrfaaS enables practical cross-datacenter prefill-decode disaggregation for hybrid-attention models via selective offloading, bandwidth-aware scheduling, and cache-aware placement, yielding 54% higher throughput and 64% lower P90 TTFT than homogeneous baselines in a 1T-parameter case study.
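The PrfaaS summary names bandwidth-aware scheduling and selective offloading of prefill across datacenters. The sketch illustrates only the general trade-off behind such a decision and is not PrfaaS's actual scheduler. All names (`Request`, `Cluster`, `choose_prefill_site`) and all numbers are hypothetical. It assumes that prefill compute and the KVCache transfer run back to back rather than overlapping. It also assumes a hybrid-attention model simply has fewer KVCache bytes per token, as the paper's title suggests.

```python
from dataclasses import dataclass


@dataclass
class Request:
    prompt_tokens: int  # length of the prompt to prefill


@dataclass
class Cluster:
    prefill_tokens_per_s: float  # hypothetical prefill throughput
    queue_delay_s: float         # hypothetical current queueing delay


def local_ttft(req: Request, local: Cluster) -> float:
    # Prefill in the same datacenter as decode: no cross-datacenter transfer.
    return local.queue_delay_s + req.prompt_tokens / local.prefill_tokens_per_s


def remote_ttft(req: Request, remote: Cluster,
                kv_bytes_per_token: float, link_bytes_per_s: float) -> float:
    # Prefill remotely, then ship the KVCache over the inter-datacenter link.
    # Assumption: compute and transfer are serial (no layer-wise overlap).
    compute = remote.queue_delay_s + req.prompt_tokens / remote.prefill_tokens_per_s
    transfer = req.prompt_tokens * kv_bytes_per_token / link_bytes_per_s
    return compute + transfer


def choose_prefill_site(req: Request, local: Cluster, remote: Cluster,
                        kv_bytes_per_token: float,
                        link_bytes_per_s: float) -> tuple[str, float]:
    # Offload selectively: only when the estimated TTFT is strictly lower remotely.
    t_local = local_ttft(req, local)
    t_remote = remote_ttft(req, remote, kv_bytes_per_token, link_bytes_per_s)
    return ("remote", t_remote) if t_remote < t_local else ("local", t_local)


if __name__ == "__main__":
    local = Cluster(prefill_tokens_per_s=10_000, queue_delay_s=0.5)
    remote = Cluster(prefill_tokens_per_s=40_000, queue_delay_s=0.1)
    req = Request(prompt_tokens=8_000)
    link = 1.25e9  # about 10 Gbit/s, hypothetical
    # Small per-token KVCache (hybrid-attention-like): prints "remote".
    print(choose_prefill_site(req, local, remote, 20_000, link)[0])
    # Large per-token KVCache (dense-attention-like): prints "local".
    print(choose_prefill_site(req, local, remote, 320_000, link)[0])
```

With the smaller per-token KVCache, the faster remote cluster wins despite the cross-datacenter hop. With the larger one, the transfer term dominates and prefill stays local.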

citing papers explorer

When RL Meets Adaptive Speculative Training: A Unified Training-Serving System cs.LG · 2026-02-06 · conditional · none · ref 23
Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter cs.DC · 2026-04-16 · unverdicted · none · ref 22