This work delivers the first measurements of performance-energy trade-offs across four multi-request LLM workflow patterns on A100 GPUs using vLLM and Parrot.
Offline energy-optimal llm serving: Workload-based energy models for llm inference on heterogeneous systems.ACM SIGENERGY Energy Informatics Review, 4(5):113–119
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.DC 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Characterizing Performance-Energy Trade-offs of Large Language Models in Multi-Request Workflows
This work delivers the first measurements of performance-energy trade-offs across four multi-request LLM workflow patterns on A100 GPUs using vLLM and Parrot.