org/abs/2501.08219

Paul Joe Maliakel, Shashikant Ilager, Ivona Brandic · 2025 · arXiv 2501.08219

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

The Illusion of Power Capping in LLM Decode: A Phase-Aware Energy Characterisation Across Attention Architectures

cs.DC · 2026-05-12 · unverdicted · novelty 7.0

Power capping is illusory in LLM decode as memory-bound operation leaves power headroom untouched on 700 W GPUs, while SM clock locking saves up to 32% energy and three DVFS classes appear across attention types.

Characterizing Performance-Energy Trade-offs of Large Language Models in Multi-Request Workflows

cs.DC · 2026-03-12 · unverdicted · novelty 7.0

This work delivers the first measurements of performance-energy trade-offs across four multi-request LLM workflow patterns on A100 GPUs using vLLM and Parrot.

Pimp My LLM: Leveraging Variability Modeling to Tune Inference Hyperparameters

cs.LG · 2026-02-06 · unverdicted · novelty 7.0

Variability modeling from software engineering enables systematic sampling, measurement, and prediction of LLM inference configurations for energy, latency, and accuracy trade-offs.

EnergyLens: Interpretable Closed-Form Energy Models for Multimodal LLM Inference Serving

cs.CV · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

EnergyLens derives a twelve-parameter closed-form energy model via symbolic regression that achieves 88.2% top-1 configuration accuracy with 50 samples and extrapolates to unseen batch sizes and hardware.

Rethinking Network Topologies for Cost-Effective Mixture-of-Experts LLM Serving

cs.NI · 2026-04-30 · unverdicted · novelty 6.0

Switchless topologies such as 3D full-mesh are 20.6-56.2% more cost-effective than scale-up networks for MoE LLM serving, with current link bandwidths over-provisioned by up to 27%.

KAIROS: Stateful, Context-Aware Power-Efficient Agentic Inference Serving

cs.DC · 2026-04-17 · unverdicted · novelty 6.0

KAIROS reduces power by 27% on average (up to 39.8%) for agentic AI inference by using long-lived context to jointly manage GPU frequency, concurrency, and request routing across instances.

Watt Counts: Energy-Aware Benchmark for Sustainable LLM Inference on Heterogeneous GPU Architectures

cs.DC · 2026-04-10 · unverdicted · novelty 6.0

Watt Counts supplies over 5,000 energy measurements across 50 LLMs and 10 GPUs and shows that hardware-aware selection can reduce server-scenario energy use by up to 70 percent with little effect on user experience.

SweetSpot: An Analytical Model for Predicting Energy Efficiency of LLM Inference

cs.AI · 2026-02-05 · unverdicted · novelty 6.0

SweetSpot is an analytical model from Transformer computational and memory complexity that identifies energy minima at short-to-moderate inputs and medium outputs, achieving 1.79% MAPE on H100 GPU measurements across multiple LLMs.

AgriIR: A Scalable Framework for Domain-Specific Knowledge Retrieval

cs.IR · 2026-03-17 · unverdicted · novelty 3.0

AgriIR is a configurable RAG framework using modular stages and 1B-parameter models to deliver grounded, citable answers for Indian agricultural information access.

citing papers explorer

Showing 9 of 9 citing papers.

The Illusion of Power Capping in LLM Decode: A Phase-Aware Energy Characterisation Across Attention Architectures · cs.DC · 2026-05-12 · unverdicted · none · ref 11
Characterizing Performance-Energy Trade-offs of Large Language Models in Multi-Request Workflows · cs.DC · 2026-03-12 · unverdicted · none · ref 45
Pimp My LLM: Leveraging Variability Modeling to Tune Inference Hyperparameters · cs.LG · 2026-02-06 · unverdicted · none · ref 46
EnergyLens: Interpretable Closed-Form Energy Models for Multimodal LLM Inference Serving · cs.CV · 2026-05-11 · unverdicted · none · ref 5 · 2 links
Rethinking Network Topologies for Cost-Effective Mixture-of-Experts LLM Serving · cs.NI · 2026-04-30 · unverdicted · none · ref 40
KAIROS: Stateful, Context-Aware Power-Efficient Agentic Inference Serving · cs.DC · 2026-04-17 · unverdicted · none · ref 42
Watt Counts: Energy-Aware Benchmark for Sustainable LLM Inference on Heterogeneous GPU Architectures · cs.DC · 2026-04-10 · unverdicted · none · ref 42
SweetSpot: An Analytical Model for Predicting Energy Efficiency of LLM Inference · cs.AI · 2026-02-05 · unverdicted · none · ref 13
AgriIR: A Scalable Framework for Domain-Specific Knowledge Retrieval · cs.IR · 2026-03-17 · unverdicted · none · ref 27
