The ML.ENERGY benchmark: Toward automated inference energy measurement and optimization

2025 · arXiv 2505.06371

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 2 · support 1

representative citing papers

CCL-Bench 1.0: A Trace-Based Benchmark for LLM Infrastructure

cs.DC · 2026-05-07 · unverdicted · novelty 7.0

CCL-Bench packages traces and metadata to compute detailed compute, memory, and communication efficiency metrics, surfacing performance insights unavailable from end-to-end benchmarks.

KAIROS: Stateful, Context-Aware Power-Efficient Agentic Inference Serving

cs.DC · 2026-04-17 · unverdicted · novelty 6.0

KAIROS reduces power by 27% on average (up to 39.8%) for agentic AI inference by using long-lived context to jointly manage GPU frequency, concurrency, and request routing across instances.

The Energy Cost of Execution-Idle in GPU Clusters

cs.DC · 2026-04-06 · unverdicted · novelty 6.0

Execution-idle accounts for 19.7% of GPU execution time and 10.7% of energy in a large cluster, motivating power management that treats it as a distinct operating state.

Benchmarking Compound AI Applications for Hardware-Software Co-Design

cs.DC · 2026-03-04 · unverdicted · novelty 6.0

Introduces a benchmarking suite for compound AI applications to support cross-stack performance, cost, and resource analysis for hardware-software co-design.

GELATO: Generative Entropy- and Lyapunov-based Adaptive Token Offloading for Device-Edge Speculative LLM Inference

cs.NI · 2026-05-11 · unverdicted · novelty 5.0

GELATO combines drift-plus-penalty Lyapunov control with generative entropy early exiting to adaptively offload tokens in device-edge speculative decoding, delivering higher throughput and lower energy use than prior distributed SD systems while preserving output quality.

Workload composition smooths aggregate power demand while sustaining short-horizon ramps in AI data centers

eess.SY · 2026-04-12 · unverdicted · novelty 5.0

Workload composition in AI data centers decouples aggregate power variability from short-horizon ramping through asymmetric queueing where batch jobs fill inference-induced idle capacity.

Transparent Screening for LLM Inference and Training Impacts

cs.LG · 2026-03-23 · unverdicted · novelty 5.0

The paper proposes a transparent proxy framework for estimating LLM inference and training environmental impacts from natural-language application descriptions.

From Cradle to Cloud: A Life Cycle Review of AI's Environmental Footprint

cs.CY · 2026-05-06 · unverdicted · novelty 4.0

A review of AI sustainability studies finds inconsistent life cycle definitions and predominant reliance on coarse CO2e proxies, with limited coverage of water, materials, and multi-impact assessments.

Grid Integration of AI Data Centers: A Critical Review of Energy Storage Solutions

eess.SY · 2026-02-28 · unverdicted · novelty 3.0

A hierarchical review of energy storage technologies for smoothing the sub-second variable loads of AI data centers on the utility grid.

citing papers explorer

Showing 9 of 9 citing papers.

CCL-Bench 1.0: A Trace-Based Benchmark for LLM Infrastructure · cs.DC · 2026-05-07 · unverdicted · none · ref 15
KAIROS: Stateful, Context-Aware Power-Efficient Agentic Inference Serving · cs.DC · 2026-04-17 · unverdicted · none · ref 11
The Energy Cost of Execution-Idle in GPU Clusters · cs.DC · 2026-04-06 · unverdicted · none · ref 3
Benchmarking Compound AI Applications for Hardware-Software Co-Design · cs.DC · 2026-03-04 · unverdicted · none · ref 6
GELATO: Generative Entropy- and Lyapunov-based Adaptive Token Offloading for Device-Edge Speculative LLM Inference · cs.NI · 2026-05-11 · unverdicted · none · ref 16
Workload composition smooths aggregate power demand while sustaining short-horizon ramps in AI data centers · eess.SY · 2026-04-12 · unverdicted · none · ref 20
Transparent Screening for LLM Inference and Training Impacts · cs.LG · 2026-03-23 · unverdicted · none · ref 3
From Cradle to Cloud: A Life Cycle Review of AI's Environmental Footprint · cs.CY · 2026-05-06 · unverdicted · none · ref 17
Grid Integration of AI Data Centers: A Critical Review of Energy Storage Solutions · eess.SY · 2026-02-28 · unverdicted · none · ref 125
