LLMSpace is the first framework to jointly model operational and embodied carbon for LLM inference on LEO satellites, incorporating radiation-hardened hardware, peripheral systems, and workload patterns such as prefill-decode behavior.
Available: https://arxiv.org/pdf/2507.11417
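The summary above separates operational carbon from embodied carbon. The sketch below shows that split in its simplest, generic form: operational carbon is energy times the carbon intensity of the supply, and embodied carbon is the device's footprint amortized linearly over its lifetime. It assumes a linear amortization, and all class, function, and parameter names are hypothetical. It is not LLMSpace's model, which per the summary also accounts for radiation-hardened hardware, peripheral systems, and prefill-decode workload behavior.

```python
from dataclasses import dataclass


@dataclass
class InferenceJob:
    # Hypothetical inputs: energy drawn by one inference job and how long it occupies the hardware.
    energy_kwh: float
    duration_h: float


@dataclass
class Hardware:
    # Hypothetical inputs: total embodied footprint attributed to the device and its expected lifetime.
    embodied_kgco2e: float
    lifetime_h: float


def operational_carbon(job: InferenceJob, carbon_intensity_kg_per_kwh: float) -> float:
    """Operational carbon: energy consumed times the carbon intensity of its source."""
    return job.energy_kwh * carbon_intensity_kg_per_kwh


def embodied_carbon(job: InferenceJob, hw: Hardware) -> float:
    """Embodied carbon, amortized linearly by the share of the lifetime this job uses (an assumption)."""
    if hw.lifetime_h <= 0:
        raise ValueError("lifetime_h must be positive")
    return hw.embodied_kgco2e * (job.duration_h / hw.lifetime_h)


def total_carbon(job: InferenceJob, hw: Hardware, carbon_intensity_kg_per_kwh: float) -> float:
    """Total footprint of one job in kgCO2e: operational plus amortized embodied carbon."""
    return operational_carbon(job, carbon_intensity_kg_per_kwh) + embodied_carbon(job, hw)


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any paper.
    job = InferenceJob(energy_kwh=0.5, duration_h=2.0)
    hw = Hardware(embodied_kgco2e=1000.0, lifetime_h=4000.0)
    print(total_carbon(job, hw, carbon_intensity_kg_per_kwh=0.4))
```

When the carbon intensity is set to zero (for example, a purely solar-powered supply, itself an assumption), the total reduces to the amortized embodied share. In that case the hardware's footprint and lifetime dominate the result.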
Seven papers indexed by Pith cite this work.
Citation roles: background (3)
Citation polarities: background (3)
Citing papers:
- LLMSpace: Carbon Footprint Modeling for Large Language Model Inference on LEO Satellites
LLMSpace is the first framework to jointly model operational and embodied carbon for LLM inference on LEO satellites, incorporating radiation-hardened hardware, peripheral systems, and workload patterns such as prefill-decode behavior.
- Evaluating the Environmental Impact of using SLMs and Prompt Engineering for Code Generation
Chain-of-Thought prompting balances high accuracy with low energy use in small language models for code generation, while multi-sampling strategies add high energy costs for small accuracy gains.
- WattGPU: Predicting Inference Power and Latency on Unseen GPUs and LLMs
WattGPU's ML models, trained on public data, predict LLM inference power and latency on unseen GPUs with median errors of 3.4-13.5% and outperform baselines.
- EnergyLens: Interpretable Closed-Form Energy Models for Multimodal LLM Inference Serving
EnergyLens derives a twelve-parameter closed-form energy model via symbolic regression that achieves 88.2% top-1 configuration accuracy with 50 samples and extrapolates to unseen batch sizes and hardware.
- Tackling the Data-Parallel Load Balancing Bottleneck in LLM Serving: Practical Online Routing at Scale
BalanceRoute uses a piecewise-linear F-score (with an optional short lookahead) for sticky request routing in LLM serving, reducing data-parallel (DP) load imbalance and raising end-to-end throughput over vLLM baselines on production and Azure traces.
- KAIROS: Stateful, Context-Aware Power-Efficient Agentic Inference Serving
KAIROS reduces power by 27% on average (up to 39.8%) for agentic AI inference by using long-lived context to jointly manage GPU frequency, concurrency, and request routing across instances.
- LLM Harms: A Taxonomy and Discussion