Mooncake: A kvcache- centric disaggregated architecture for llm serving.ACM Transactions on Storage, 2024

Ruoyu Qin, Zheming Li, Weiran He, Jialei Cui, Heyi Tang, Feng Ren, Teng Ma, Shangming Cai, Yineng Zhang, Mingxing Zhang, et al · 2024

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

representative citing papers

Coral: Cost-Efficient Multi-LLM Serving over Heterogeneous Cloud GPUs

cs.DC · 2026-05-05 · unverdicted · novelty 7.0

Coral cuts multi-LLM serving costs by up to 2.79x and raises goodput by up to 2.39x on heterogeneous GPUs through adaptive joint optimization and a lossless two-stage decomposition that solves quickly.

MoE-Prefill: Zero Redundancy Overheads in MoE Prefill Serving

cs.LG · 2026-05-03 · unverdicted · novelty 7.0 · 2 refs

MoE-Prefill achieves 1.35-1.59x higher throughput for prefill-only MoE serving by using asynchronous expert parallelism to overlap weight AllGather with computation and prefix-aware routing with true-FLOPs tracking.

citing papers explorer

Showing 2 of 2 citing papers.

Coral: Cost-Efficient Multi-LLM Serving over Heterogeneous Cloud GPUs cs.DC · 2026-05-05 · unverdicted · none · ref 38
Coral cuts multi-LLM serving costs by up to 2.79x and raises goodput by up to 2.39x on heterogeneous GPUs through adaptive joint optimization and a lossless two-stage decomposition that solves quickly.
MoE-Prefill: Zero Redundancy Overheads in MoE Prefill Serving cs.LG · 2026-05-03 · unverdicted · none · ref 53 · 2 links
MoE-Prefill achieves 1.35-1.59x higher throughput for prefill-only MoE serving by using asynchronous expert parallelism to overlap weight AllGather with computation and prefix-aware routing with true-FLOPs tracking.

Mooncake: A kvcache- centric disaggregated architecture for llm serving.ACM Transactions on Storage, 2024

fields

years

verdicts

representative citing papers

citing papers explorer