TurboServe introduces the first serving system for streaming video generation workloads, using migration-aware placement and load-driven autoscaling to cut worst-case latency by 37.5% and GPU cost by 37.2%.
TridentServe: A stage-level serving system for diffusion pipelines
4 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.DC 4
years
2026 4
verdicts
UNVERDICTED 4
roles
background 1
polarities
background 1
representative citing papers
GF-DiT dynamically adapts parallelism during DiT serving via trajectory tasks and group-free collectives, reporting up to 6x throughput and 95% latency reduction versus static configurations.
GENSERVE improves SLO attainment by up to 44% for co-serving heterogeneous T2I and T2V diffusion workloads via step-level preemption, elastic parallelism, and joint scheduling.
LegoDiffusion decomposes diffusion workflows into micro-served model nodes to achieve up to 3x higher throughput and 8x better burst tolerance than monolithic serving systems.
citing papers explorer
- GENSERVE: Efficient Co-Serving of Heterogeneous Diffusion Model Workloads
GENSERVE improves SLO attainment by up to 44% for co-serving heterogeneous T2I and T2V diffusion workloads via step-level preemption, elastic parallelism, and joint scheduling.