The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization
9 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (9)
Verdicts: unverdicted (9)
Roles: background (3)
Representative citing papers
- CCL-Bench 1.0: A Trace-Based Benchmark for LLM Infrastructure
  CCL-Bench packages traces and metadata to compute detailed compute, memory, and communication efficiency metrics, surfacing performance insights unavailable from end-to-end benchmarks.
- KAIROS: Stateful, Context-Aware Power-Efficient Agentic Inference Serving
  KAIROS reduces power by 27% on average (up to 39.8%) for agentic AI inference by using long-lived context to jointly manage GPU frequency, concurrency, and request routing across instances.
- The Energy Cost of Execution-Idle in GPU Clusters
  Execution-idle accounts for 19.7% of GPU execution time and 10.7% of energy in a large cluster, motivating power management that treats it as a distinct operating state.
- Benchmarking Compound AI Applications for Hardware-Software Co-Design
  Introduces a benchmarking suite for compound AI applications to support cross-stack performance, cost, and resource analysis for hardware-software co-design.
- GELATO: Generative Entropy- and Lyapunov-based Adaptive Token Offloading for Device-Edge Speculative LLM Inference
  GELATO combines drift-plus-penalty Lyapunov control with generative entropy early exiting to adaptively offload tokens in device-edge speculative decoding, delivering higher throughput and lower energy use than prior distributed SD systems while preserving output quality.
- Workload composition smooths aggregate power demand while sustaining short-horizon ramps in AI data centers
  Workload composition in AI data centers decouples aggregate power variability from short-horizon ramping through asymmetric queueing where batch jobs fill inference-induced idle capacity.
- Transparent Screening for LLM Inference and Training Impacts
  The paper proposes a transparent proxy framework for estimating LLM inference and training environmental impacts from natural-language application descriptions.
- From Cradle to Cloud: A Life Cycle Review of AI's Environmental Footprint
  A review of AI sustainability studies finds inconsistent life cycle definitions and predominant reliance on coarse CO2e proxies, with limited coverage of water, materials, and multi-impact assessments.
- Grid Integration of AI Data Centers: A Critical Review of Energy Storage Solutions
  A hierarchical review of energy storage technologies for smoothing the sub-second variable loads of AI data centers on the utility grid.