Fu, Zhiqiang Xie, Beidi Chen, Clark W

Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Daniel Y · 2023

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

PALS: Power-Aware LLM Serving for Mixture-of-Experts Models

cs.AI · 2026-05-20 · unverdicted · novelty 6.0

PALS adds dynamic GPU power capping to LLM serving frameworks like vLLM, jointly tuning it with batch size via offline models and feedback control to improve energy efficiency up to 26.3% and cut QoS violations 4-7x on dense and MoE models.

citing papers explorer

Showing 1 of 1 citing paper.

PALS: Power-Aware LLM Serving for Mixture-of-Experts Models cs.AI · 2026-05-20 · unverdicted · none · ref 36
PALS adds dynamic GPU power capping to LLM serving frameworks like vLLM, jointly tuning it with batch size via offline models and feedback control to improve energy efficiency up to 26.3% and cut QoS violations 4-7x on dense and MoE models.

Fu, Zhiqiang Xie, Beidi Chen, Clark W

fields

years

verdicts

representative citing papers

citing papers explorer