CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving
2 Pith papers cite this work.
- MoE-Prefill: Zero Redundancy Overheads in MoE Prefill Serving
  MoE-Prefill achieves 1.35-1.59x higher throughput for prefill-only MoE serving by using asynchronous expert parallelism to overlap weight AllGather with computation, and prefix-aware routing with true-FLOPs tracking.
- Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache Time-to-Live