Available: https://arxiv.org/pdf/2507.11417

2025 · arXiv 2507.11417

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

LLMSpace: Carbon Footprint Modeling for Large Language Model Inference on LEO Satellites

cs.LG · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

LLMSpace is the first framework to jointly model operational and embodied carbon for LLM inference on LEO satellites, incorporating radiation-hardened hardware, peripheral systems, and workload patterns such as prefill-decode behavior.

Evaluating the Environmental Impact of using SLMs and Prompt Engineering for Code Generation

cs.SE · 2026-04-03 · unverdicted · novelty 7.0

Chain-of-Thought prompting balances high accuracy with low energy use in small language models for code generation, while multi-sampling strategies add high energy costs for small accuracy gains.

WattGPU: Predicting Inference Power and Latency on Unseen GPUs and LLMs

cs.DC · 2026-07-02 · unverdicted · novelty 6.0

WattGPU ML models predict LLM inference power and latency on unseen GPUs with median errors of 3.4-13.5% using public data and show better performance than baselines.

EnergyLens: Interpretable Closed-Form Energy Models for Multimodal LLM Inference Serving

cs.CV · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

EnergyLens derives a twelve-parameter closed-form energy model via symbolic regression that achieves 88.2% top-1 configuration accuracy with 50 samples and extrapolates to unseen batch sizes and hardware.

Tackling the Data-Parallel Load Balancing Bottleneck in LLM Serving: Practical Online Routing at Scale

cs.DC · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

BalanceRoute uses a piecewise-linear F-score (with optional short lookahead) for sticky request routing in LLM serving, reducing DP imbalance and raising end-to-end throughput versus vLLM baselines on production and Azure traces.

KAIROS: Stateful, Context-Aware Power-Efficient Agentic Inference Serving

cs.DC · 2026-04-17 · unverdicted · novelty 6.0

KAIROS reduces power by 27% on average (up to 39.8%) for agentic AI inference by using long-lived context to jointly manage GPU frequency, concurrency, and request routing across instances.

LLM Harms: A Taxonomy and Discussion

cs.CY · 2025-12-05

citing papers explorer

LLMSpace: Carbon Footprint Modeling for Large Language Model Inference on LEO Satellites · cs.LG · 2026-05-07 · unverdicted · none · ref 46 · 2 links
Evaluating the Environmental Impact of using SLMs and Prompt Engineering for Code Generation · cs.SE · 2026-04-03 · unverdicted · none · ref 28
WattGPU: Predicting Inference Power and Latency on Unseen GPUs and LLMs · cs.DC · 2026-07-02 · unverdicted · none · ref 18
EnergyLens: Interpretable Closed-Form Energy Models for Multimodal LLM Inference Serving · cs.CV · 2026-05-11 · unverdicted · none · ref 6 · 2 links
Tackling the Data-Parallel Load Balancing Bottleneck in LLM Serving: Practical Online Routing at Scale · cs.DC · 2026-05-07 · unverdicted · none · ref 27 · 2 links
KAIROS: Stateful, Context-Aware Power-Efficient Agentic Inference Serving · cs.DC · 2026-04-17 · unverdicted · none · ref 47
LLM Harms: A Taxonomy and Discussion · cs.CY · 2025-12-05 · unreviewed · ref 58
