The Energy Cost of Execution-Idle in GPU Clusters

1 Pith paper citing it
abstract

GPUs are becoming a major contributor to data center power, yet unlike CPUs, they can remain at high power even when visible activity is near zero. We call this state execution-idle. Using per-second telemetry from a large academic AI cluster, we characterize execution-idle as a recurring low-activity yet high-power state in real deployments. Across diverse workloads and multiple GPU generations, it accounts for 19.7% of in-execution time and 10.7% of energy. This suggests a need to both reduce the cost of execution-idle and reduce exposure to it. We therefore build two prototypes: one uses automatic downscaling during execution-idle, and the other uses load imbalance to reduce exposure, both with performance trade-offs. These findings suggest that future energy-efficient GPU systems should treat execution-idle as a first-class operating state.
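As a rough illustration of how time and energy shares like those above could be derived from per-second telemetry, the following minimal sketch classifies each one-second sample as execution-idle when utilization is low and power is high. The `Sample` type, the `execution_idle_share` function, the threshold values, and the example trace are all hypothetical; the abstract does not state the authors' actual classification criteria.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    util_pct: float  # GPU utilization (percent) over a one-second window
    power_w: float   # board power (watts) over the same window


def execution_idle_share(samples, util_max=5.0, power_min=100.0):
    """Return (time_share, energy_share) of samples classified as execution-idle.

    util_max and power_min are illustrative thresholds, not values from the paper.
    """
    if not samples:
        return 0.0, 0.0
    idle = [s for s in samples if s.util_pct <= util_max and s.power_w >= power_min]
    # Each sample spans one second, so summing watts gives joules.
    total_energy = sum(s.power_w for s in samples)
    idle_energy = sum(s.power_w for s in idle)
    time_share = len(idle) / len(samples)
    energy_share = idle_energy / total_energy if total_energy else 0.0
    return time_share, energy_share


# Made-up five-second trace: two samples are low-activity yet high-power.
trace = [Sample(95, 300), Sample(2, 150), Sample(90, 300), Sample(1, 150), Sample(3, 40)]
time_share, energy_share = execution_idle_share(trace)
```

In this toy trace the energy share comes out lower than the time share, because the execution-idle samples draw less power than the busy ones. If the paper's two figures are measured over the same in-execution samples (an assumption), their ratio 10.7 / 19.7 ≈ 0.54 would likewise be the average execution-idle power relative to the overall in-execution average.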

fields

cs.DC 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

Energy-Efficient Multimodal Inference Serving with Tri-serve

cs.DC · 2026-06-28 · unverdicted · novelty 5.0 · 2 refs

Tri-serve is a software DVFS controller that jointly mitigates inter-stage dependency stalls, arithmetic-intensity effects on frequency, and thermal throttling to deliver 22% better energy efficiency in multimodal inference serving with no latency or throughput loss.

citing papers explorer

  • Energy-Efficient Multimodal Inference Serving with Tri-serve cs.DC · 2026-06-28 · unverdicted · none · ref 21 · 2 links · internal anchor
