DistServe: Disaggregating prefill and decoding for goodput-optimized large language model serving

Yinmin Zhong, Shengyu Liu, Junda Chen, Jianbo Hu, Yibo Zhu, Xuanzhe Liu, Xin Jin, Hao Zhang · 2024

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

CascadeInfer: Length-Aware Scheduling of LLM Serving with Low Latency and Load Balancing

cs.DC · 2025-12-22 · conditional · novelty 6.0

CascadeInfer partitions LLM instances into length-specialized groups, uses dynamic programming for stage partitioning, and applies runtime refinement plus decentralized load balancing to cut latency and raise throughput.

citing papers explorer

Showing 1 of 1 citing paper.

CascadeInfer: Length-Aware Scheduling of LLM Serving with Low Latency and Load Balancing · cs.DC · 2025-12-22 · conditional · none · ref 37
CascadeInfer partitions LLM instances into length-specialized groups, uses dynamic programming for stage partitioning, and applies runtime refinement plus decentralized load balancing to cut latency and raise throughput.
