Tokenpowerbench: Benchmarking the power consumption of llm inference

· 2025 · arXiv 2512.03024

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

citation-role summary

background 3 method 1

citation-polarity summary

background 2 support 1 use method 1

representative citing papers

Watt Counts: Energy-Aware Benchmark for Sustainable LLM Inference on Heterogeneous GPU Architectures

cs.DC · 2026-04-10 · unverdicted · novelty 6.0

Watt Counts supplies over 5,000 energy measurements across 50 LLMs and 10 GPUs and shows that hardware-aware selection can reduce server-scenario energy use by up to 70 percent with little effect on user experience.

The Energy Cost of Execution-Idle in GPU Clusters

cs.DC · 2026-04-06 · unverdicted · novelty 6.0

Execution-idle accounts for 19.7% of GPU execution time and 10.7% of energy in a large cluster, motivating power management that treats it as a distinct operating state.

Position: LLM Inference Should Be Evaluated as Energy-to-Token Production

cs.CE · 2026-05-12 · unverdicted · novelty 5.0

LLM inference should be reframed and evaluated as energy-to-token production with a Token Production Function that accounts for power, cooling, and efficiency ceilings.

The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project

cs.LG · 2026-03-22 · unverdicted · novelty 5.0

The Workload-Router-Pool architecture is a 3D framework for LLM inference optimization that synthesizes prior vLLM work into a 3x3 interaction matrix and proposes 21 research directions at the intersections.

Sustainable Code Generation Using Large Language Models: A Systematic Literature Review

cs.SE · 2026-03-01 · unverdicted · novelty 3.0

A systematic review finds research on the sustainability of LLM-generated code to be limited, fragmented, and without accepted frameworks for measurement or benchmarking.

citing papers explorer

Showing 5 of 5 citing papers.

Watt Counts: Energy-Aware Benchmark for Sustainable LLM Inference on Heterogeneous GPU Architectures cs.DC · 2026-04-10 · unverdicted · none · ref 19
Watt Counts supplies over 5,000 energy measurements across 50 LLMs and 10 GPUs and shows that hardware-aware selection can reduce server-scenario energy use by up to 70 percent with little effect on user experience.
The Energy Cost of Execution-Idle in GPU Clusters cs.DC · 2026-04-06 · unverdicted · none · ref 27
Execution-idle accounts for 19.7% of GPU execution time and 10.7% of energy in a large cluster, motivating power management that treats it as a distinct operating state.
Position: LLM Inference Should Be Evaluated as Energy-to-Token Production cs.CE · 2026-05-12 · unverdicted · none · ref 22
LLM inference should be reframed and evaluated as energy-to-token production with a Token Production Function that accounts for power, cooling, and efficiency ceilings.
The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project cs.LG · 2026-03-22 · unverdicted · none · ref 112
The Workload-Router-Pool architecture is a 3D framework for LLM inference optimization that synthesizes prior vLLM work into a 3x3 interaction matrix and proposes 21 research directions at the intersections.
Sustainable Code Generation Using Large Language Models: A Systematic Literature Review cs.SE · 2026-03-01 · unverdicted · none · ref 40
A systematic review finds research on the sustainability of LLM-generated code to be limited, fragmented, and without accepted frameworks for measurement or benchmarking.

Tokenpowerbench: Benchmarking the power consumption of llm inference

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer