{DistServe}: Disaggregating prefill and decoding for goodput-optimized large language model serving

Yinmin Zhong, Shengyu Liu, Junda Chen, Jianbo Hu, Yibo Zhu, Xuanzhe Liu, Xin Jin, Hao Zhang · 2024

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

citation-role summary

background 1 method 1

citation-polarity summary

background 2

representative citing papers

Coral: Cost-Efficient Multi-LLM Serving over Heterogeneous Cloud GPUs

cs.DC · 2026-05-05 · unverdicted · novelty 7.0

Coral cuts multi-LLM serving costs by up to 2.79x and raises goodput by up to 2.39x on heterogeneous GPUs through adaptive joint optimization and a lossless two-stage decomposition that solves quickly.

Autopoiesis: A Self-Evolving System Paradigm for LLM Serving Under Runtime Dynamics

cs.DC · 2026-04-08 · unverdicted · novelty 7.0

Autopoiesis uses LLM-driven program synthesis to evolve serving policies online during deployment, delivering up to 53% and average 34% gains over prior LLM serving systems under runtime dynamics.

From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs

cs.IR · 2025-04-22 · unverdicted · novelty 5.0

The paper surveys human memory categories, maps them to LLM memory, and proposes a new three-dimension (object, form, time) categorization into eight quadrants to organize existing work and highlight open problems.

Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache Time-to-Live

cs.OS · 2025-11-04

citing papers explorer

Showing 4 of 4 citing papers.

Coral: Cost-Efficient Multi-LLM Serving over Heterogeneous Cloud GPUs cs.DC · 2026-05-05 · unverdicted · none · ref 54
Coral cuts multi-LLM serving costs by up to 2.79x and raises goodput by up to 2.39x on heterogeneous GPUs through adaptive joint optimization and a lossless two-stage decomposition that solves quickly.
Autopoiesis: A Self-Evolving System Paradigm for LLM Serving Under Runtime Dynamics cs.DC · 2026-04-08 · unverdicted · none · ref 6
Autopoiesis uses LLM-driven program synthesis to evolve serving policies online during deployment, delivering up to 53% and average 34% gains over prior LLM serving systems under runtime dynamics.
From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs cs.IR · 2025-04-22 · unverdicted · none · ref 115
The paper surveys human memory categories, maps them to LLM memory, and proposes a new three-dimension (object, form, time) categorization into eight quadrants to organize existing work and highlight open problems.
Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache Time-to-Live cs.OS · 2025-11-04 · unreviewed · ref 84

{DistServe}: Disaggregating prefill and decoding for goodput-optimized large language model serving

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer