DualScale reduces energy by up to 39% in prefill and 48% in decode for disaggregated LLM serving while meeting TTFT and TPOT SLOs on a 16x H100 cluster.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.DC (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
- DualScale: Energy-Efficient Disaggregated LLM Serving via Phase-Aware Placement and DVFS