Energy-Aware Routing to Large Reasoning Models

arXiv: 2601.00823 · v2 · submitted 2025-12-23 · cs.AI · cs.IT · cs.SY · eess.SY · math.IT

Austin R. Ellis-Mohr, Max Hartman, Lav R. Varshney

Pith reviewed 2026-05-16 20:30 UTC · model grok-4.3

classification cs.AI · cs.IT · cs.SY · eess.SY · math.IT

keywords energy-aware routing · large reasoning models · critical regime · variance-aware dispatch · scaling laws · inference energy · volatility-limited performance · model routing

The pith

In energy-aware routing for large reasoning models, the critical regime leaves performance limited by energy-use volatility rather than average supply.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that systems dispatching tasks to large reasoning models with different energy costs perform best when tuned to a critical regime balancing baseline and auxiliary energy supplies. In this regime, neither type of energy is wasted on average, but performance is instead constrained by stochastic fluctuations in how much energy each model consumes during reasoning. A second-order analysis shows that absorbing this variability across time, models, and choices becomes the key to efficiency, leading to variance-aware routing policies. These policies are characterized using scaling laws that relate performance to training and inference compute. This matters for building sustainable AI systems, as it offers a theoretical basis for reducing energy consumption in model selection and dispatch without relying on heuristics.

Core claim

In the critical regime, the unique operating point at which neither auxiliary energy nor baseline energy is systematically wasted, performance of LRM dispatch systems remains volatility-limited. Performance is governed by how variability is absorbed across time, models, and execution choices. This highlights variance-aware routing and dispatch as a principled design axis. Routing behavior is characterized when dispatch policies are based on training-compute and inference-compute scaling laws for LRMs.

What carries the argument

The critical regime, the unique balance point between mean energy provisioning and stochastic fluctuations in inference energy costs of large reasoning models.

If this is right

Increasing baseline supply shifts the system toward persistent over-supply and baseline-energy waste.
Reducing supply induces persistent reliance on auxiliary energy.
Second-order characterization provides insights into variability absorption that first-order mean analysis misses.
Variance-aware routing policies can be derived from training-compute and inference-compute scaling laws.
Performance remains limited by volatility even at the optimal energy balance point.
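The three regimes listed above can be illustrated with a toy simulation. This is a minimal sketch under assumptions that are not taken from the paper: the energy reserve is modeled as a random walk reflected at zero, each step adds a fixed `drift` (baseline supply minus mean consumption) plus Gaussian noise with standard deviation `sigma`, and any shortfall is covered by auxiliary energy. The function names `simulate` and `run_regimes` are hypothetical.

```python
import random


def simulate(drift, sigma, horizon, runs, seed=0):
    """Mean auxiliary energy and mean final reserve for a reserve reflected at zero."""
    rng = random.Random(seed)
    total_aux = 0.0
    total_reserve = 0.0
    for _ in range(runs):
        reserve = 0.0
        aux = 0.0
        for _ in range(horizon):
            # Assumed dynamics: net energy per step is baseline supply minus
            # random consumption, written here as drift plus Gaussian noise.
            reserve += drift + rng.gauss(0.0, sigma)
            if reserve < 0.0:
                # The shortfall is covered by auxiliary energy; reserve returns to zero.
                aux -= reserve
                reserve = 0.0
        total_aux += aux
        total_reserve += reserve
    return total_aux / runs, total_reserve / runs


def run_regimes(horizon=400, runs=500, sigma=1.0):
    """Compare under-supplied, critical (zero drift), and over-supplied settings."""
    return {
        "under-supplied": simulate(-0.2, sigma, horizon, runs),
        "critical": simulate(0.0, sigma, horizon, runs),
        "over-supplied": simulate(0.2, sigma, horizon, runs),
    }


if __name__ == "__main__":
    for name, (aux, reserve) in run_regimes().items():
        print(f"{name:>15}: auxiliary={aux:8.2f}  unused reserve={reserve:8.2f}")
```

In this toy model, negative drift makes auxiliary use grow roughly in proportion to the horizon, positive drift leaves a growing unused reserve, and zero drift leaves both quantities driven by fluctuations alone. Whether the paper's own model behaves this way in detail would need to be checked against the full text.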

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same variability absorption principle could apply to routing in other heterogeneous compute environments with stochastic costs.
Real-world tests measuring actual energy traces in multi-model deployments would confirm whether the critical regime behaves as predicted.
Dynamic systems might adjust the critical point in real time based on observed fluctuations to maintain efficiency.
Scaling laws could enable direct computation of optimal routing fractions without extensive simulation.
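As a rough illustration of what "direct computation of routing fractions" might look like, the sketch below picks routing fractions that match a mean energy budget while minimizing per-task energy variance. Everything here is an assumption for illustration, not the paper's policy: tasks are routed at random with fixed fractions, each model has a known mean and variance of energy per task (in the paper's framing these would presumably come from scaling laws), and answer quality is ignored. The function name `min_variance_mix` and the example numbers are hypothetical.

```python
from itertools import combinations


def min_variance_mix(models, budget):
    """Pick routing fractions whose mean energy equals `budget` with least variance.

    `models` maps a name to (mean_energy, variance). Under random routing the
    per-task energy is a mixture, so its variance is
    sum_i p_i * (var_i + mean_i**2) - budget**2, which is linear in p.
    With two equality constraints (fractions sum to one, mean equals budget),
    an optimum uses at most two models, so pairs are enumerated.
    """
    best = None
    names = list(models)
    # Single models that hit the budget exactly.
    for name in names:
        mean, var = models[name]
        if mean == budget and (best is None or var < best[0]):
            best = (var, {name: 1.0})
    # Pairs of models whose means bracket the budget.
    for a, b in combinations(names, 2):
        (ma, va), (mb, vb) = models[a], models[b]
        if ma == mb or not (min(ma, mb) <= budget <= max(ma, mb)):
            continue
        pa = (mb - budget) / (mb - ma)  # fraction routed to model a
        pb = 1.0 - pa
        second_moment = pa * (va + ma**2) + pb * (vb + mb**2)
        variance = second_moment - budget**2
        if best is None or variance < best[0]:
            best = (variance, {a: pa, b: pb})
    return best  # (variance, fractions), or None if the budget is unreachable


if __name__ == "__main__":
    # Hypothetical (mean, variance) of energy per task for three models.
    example = {"small": (1.0, 0.1), "medium": (2.0, 0.5), "large": (4.0, 4.0)}
    print(min_variance_mix(example, budget=3.0))
```

The design choice worth noting is that the mean constraint alone does not determine the mix once three or more models are available, which is where a variance criterion can break the tie.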

Load-bearing premise

There exists a unique critical regime at which neither auxiliary energy nor baseline energy is systematically wasted, such that performance is governed by variability absorption.

What would settle it

An experiment showing that adjusting energy supply around the predicted critical point does not produce the expected transition from auxiliary reliance to baseline waste, or that observed performance deviates from volatility-limited predictions despite matching scaling laws.

Figures

Figures reproduced from arXiv: 2601.00823 by Austin R. Ellis-Mohr, Lav R. Varshney, Max Hartman.

**Figure 1.** System diagram.

… time $\tau(x) \ge 0$, inducing a service interval $[s(x),\, s(x) + \tau(x))$. Define the set of tasks in service at time $t$ as $S(t) := \{x : t_0(x) \le t,\ s(x) \le t < s(x) + \tau(x)\}$. The aggregate energy-consumption rate is then $C_t := \sum_{x \in S(t)} e_i(x)$.
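The in-service set and aggregate consumption rate defined in the text above can be computed directly from a task list. The following is a small sketch, assuming each task draws energy at a constant rate while in service; the `Task` class, its field names, and the example values are hypothetical and only mirror the symbols t0(x), s(x), τ(x), and e_i(x).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    arrival: float   # t0(x): time the task arrives
    start: float     # s(x): time service begins
    duration: float  # tau(x) >= 0: length of the service interval
    rate: float      # e_i(x): energy-consumption rate in service (assumed constant)


def in_service(task: Task, t: float) -> bool:
    """True when t0(x) <= t and s(x) <= t < s(x) + tau(x)."""
    return task.arrival <= t and task.start <= t < task.start + task.duration


def aggregate_rate(tasks: list[Task], t: float) -> float:
    """C_t: sum of e_i(x) over all tasks x in service at time t."""
    return sum(task.rate for task in tasks if in_service(task, t))


# Hypothetical example tasks.
tasks = [
    Task(arrival=0.0, start=0.0, duration=2.0, rate=3.0),
    Task(arrival=0.5, start=1.0, duration=1.5, rate=5.0),
    Task(arrival=1.0, start=3.0, duration=1.0, rate=2.0),
]

if __name__ == "__main__":
    for t in (0.5, 1.5, 2.0, 2.75, 3.5):
        print(t, aggregate_rate(tasks, t))
```

Note that the service interval is half-open, so a task stops counting toward the aggregate rate at exactly s(x) + τ(x).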

**Figure 2.** Deviation of the normalized expected reserve from drift.

**Figure 3.** Energy consumption and latency per task to generate a response.

**Figure 4.** Expected auxiliary energy consumption $\mathbb{E}[D_T]$ versus time horizon $T$ under varying prediction errors $E$ for the myopic policy. The zero-error policy ($E = 0$) exhibits square-root scaling throughout (dashed fit), while nonzero-error policies transition from fluctuation-dominated (square-root scaling, dashed) to drift-dominated (linear scaling, dotted) regimes at the vertical markers. Shaded regions indicate sta…
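The transition described in the Figure 4 caption can be reasoned about with a back-of-the-envelope scaling argument. The sketch below is a heuristic only, under the assumption that a prediction error induces a constant per-step drift `drift` and that fluctuations have per-step standard deviation `sigma`; the mapping from the paper's prediction error to a drift is not given in the text, and the two-term proxy `heuristic_aux` is not the paper's formula. The fluctuation term grows like `sigma * sqrt(T)` and the drift term like `|drift| * T`, so they balance near `T = (sigma / drift) ** 2`.

```python
import math


def crossover_horizon(drift, sigma):
    """Horizon where |drift| * T matches sigma * sqrt(T); infinite at zero drift."""
    if drift == 0.0:
        return math.inf
    return (sigma / abs(drift)) ** 2


def heuristic_aux(drift, sigma, horizon):
    """Crude two-term proxy: fluctuation term plus drift term (an assumption)."""
    return sigma * math.sqrt(horizon) + abs(drift) * horizon


def loglog_slope(t1, y1, t2, y2):
    """Local scaling exponent between two (horizon, value) points."""
    return (math.log(y2) - math.log(y1)) / (math.log(t2) - math.log(t1))


if __name__ == "__main__":
    drift, sigma = 0.1, 1.0  # hypothetical values
    print("crossover horizon:", crossover_horizon(drift, sigma))
    for t in (1.0, 1e2, 1e6):
        slope = loglog_slope(t, heuristic_aux(drift, sigma, t),
                             4 * t, heuristic_aux(drift, sigma, 4 * t))
        print(f"T={t:g}: local exponent ~ {slope:.2f}")
```

A local exponent near 0.5 corresponds to the square-root regime and an exponent near 1 to the linear regime; with zero drift the crossover horizon is infinite, consistent with square-root scaling throughout.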

Original abstract

Large reasoning models (LRMs) have heterogeneous inference energy costs based on which model is used and how much it reasons. To reduce energy, it is important to choose the right LRM and operate it in the right way. As a result, the performance of systems that dispatch tasks to different individual LRMs depends on the balance between mean energy provisioning and stochastic fluctuations. The critical regime is the unique operating point at which neither auxiliary energy nor baseline energy is systematically wasted. Increasing baseline supply shifts the system toward persistent over-supply and baseline-energy waste, while reducing supply induces persistent reliance on auxiliary energy. Yet in this regime, performance remains volatility-limited and so a second-order characterization provides further insights that we develop. Here, performance is governed by how variability is absorbed across time, models, and execution choices. This perspective highlights variance-aware routing and dispatch as a principled design axis, and provides a theoretical basis for developing energy-aware model routing policies. Routing behavior is characterized when dispatch policies are based on training-compute and inference-compute scaling laws for LRMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper sketches a critical-regime view of energy routing for LRMs but stays descriptive and does not derive the claimed policies or prove regime uniqueness.

The main point is that this paper frames energy-aware dispatch for large reasoning models around a single critical regime where performance becomes volatility-limited, then suggests variance-aware routing based on training and inference scaling laws. It correctly flags that inference energy fluctuates with model choice and reasoning depth, so mean provisioning alone will not be enough at scale. That observation is practical and worth stating. The framing of a balance point that avoids both baseline waste and auxiliary overuse is a clean way to organize the problem. Beyond that, the work mostly restates established scaling-law ideas in a new setting without adding a closed-form policy, simulation results, or a proof that the regime is unique. The abstract and stress-test note both indicate the paper supplies no explicit energy-balance equation or derivation showing how second-order variability absorption produces concrete routing rules. If the full text follows the same pattern, the central claim reduces to an assumption rather than a derived result. This is a minor-to-moderate gap for an ideas paper but becomes load-bearing if the authors want to claim actionable design guidance. The work is aimed at researchers who already work on efficient inference systems and scaling laws; they might borrow the regime language for discussion but would still need to supply the missing math themselves. It is coherent enough on its own terms to warrant referee time, though any review should focus on whether the authors can add derivations or small-scale experiments to make the variance-aware claim falsifiable.

Referee Report

3 major / 1 minor

Summary. The paper claims that energy-aware dispatching to large reasoning models (LRMs) is governed by a unique critical regime of energy provisioning in which mean supply exactly balances expected stochastic demand without systematic waste of baseline or auxiliary energy. In this regime performance is volatility-limited, so that a second-order characterization of variability absorption across time, models, and execution choices yields variance-aware routing policies; these policies are characterized via training-compute and inference-compute scaling laws for LRMs.

Significance. If the critical regime can be rigorously defined and the scaling-law mapping to concrete dispatch rules derived and validated, the work could supply a principled theoretical axis for sustainable LRM deployment that moves beyond mean-energy optimization. The emphasis on variance absorption as a design lever is potentially novel, but the absence of any supporting equations, fixed-point analysis, or empirical results in the manuscript renders the significance speculative at present.

major comments (3)

[Abstract] The uniqueness of the critical regime and the claim that 'neither auxiliary energy nor baseline energy is systematically wasted' are asserted without an explicit energy-balance equation, fixed-point derivation, or proof that the operating point is unique and independent of fitted scaling-law parameters.
[Abstract] The mapping from second-order variability absorption to actionable variance-aware routing policies is described at a high level but never derived; no explicit policy rule, objective function, or translation from training/inference scaling laws to dispatch decisions is supplied.
[Abstract] The central assertion that 'performance remains volatility-limited' in the critical regime is unsupported by any data, error analysis, or even a toy model; the text supplies no quantitative evidence that variability absorption governs performance once the mean-balance condition holds.

minor comments (1)

[Abstract] The phrase 'we develop' implies further technical development later in the manuscript, yet the provided text remains entirely descriptive and contains no equations, algorithms, or experimental sections to fulfill that promise.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive report and the clear identification of gaps in the current manuscript. We address each major comment below and will incorporate the requested formalizations, derivations, and supporting analysis in a revised version.

Referee: [Abstract] The uniqueness of the critical regime and the claim that 'neither auxiliary energy nor baseline energy is systematically wasted' are asserted without an explicit energy-balance equation, fixed-point derivation, or proof that the operating point is unique and independent of fitted scaling-law parameters.

Authors: We agree the abstract states the critical regime at a high level. The revised manuscript will add an explicit energy-balance equation defining the regime as the fixed point where mean supply equals expected stochastic demand, together with a short derivation establishing uniqueness under the scaling-law assumptions and independence from specific parameter values.
Revision: yes

Referee: [Abstract] The mapping from second-order variability absorption to actionable variance-aware routing policies is described at a high level but never derived; no explicit policy rule, objective function, or translation from training/inference scaling laws to dispatch decisions is supplied.

Authors: The current text outlines the perspective without the explicit mapping. We will insert a derivation section that translates the second-order variability absorption into a concrete objective function for variance-aware routing and shows how the training-compute and inference-compute scaling laws determine the dispatch thresholds.
Revision: yes

Referee: [Abstract] The central assertion that 'performance remains volatility-limited' in the critical regime is unsupported by any data, error analysis, or even a toy model; the text supplies no quantitative evidence that variability absorption governs performance once the mean-balance condition holds.

Authors: We acknowledge the absence of quantitative support. The revision will include a minimal toy model and accompanying simulation that isolates the critical regime and demonstrates that performance metrics become governed by variability absorption once mean balance is achieved.
Revision: yes

Circularity Check

2 steps flagged

Critical regime defined by energy-balance condition with asserted uniqueness; scaling-law routing characterization reduces to fitted inputs

specific steps

self-definitional [Abstract]
"The critical regime is the unique operating point at which neither auxiliary energy nor baseline energy is systematically wasted. Increasing baseline supply shifts the system toward persistent over-supply and baseline-energy waste, while reducing supply induces persistent reliance on auxiliary energy."

The regime is introduced by definition as the point of exact balance with no systematic waste; uniqueness is asserted as part of that definition rather than derived from an energy-balance equation or fixed-point argument.

fitted input called prediction [Abstract]
"Routing behavior is characterized when dispatch policies are based on training-compute and inference-compute scaling laws for LRMs."

Scaling laws are obtained by fitting parameters to data; stating that routing behavior is characterized by basing dispatch policies on those laws makes the characterization a direct re-expression of the fitted quantities.

full rationale

The paper's central chain defines the critical regime exactly as the operating point with no systematic waste of baseline or auxiliary energy and asserts uniqueness without an explicit balance equation or fixed-point proof. It then states that routing policies are characterized directly from training- and inference-compute scaling laws. Because scaling-law parameters are obtained by fitting and the regime is introduced definitionally, both the uniqueness claim and the dispatch-policy characterization reduce to the paper's own inputs by construction. The second-order volatility analysis may contain independent content, but the load-bearing steps do not.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract alone supplies no explicit free parameters, axioms, or invented entities; full text would be required to audit any scaling-law constants, regime-balance definitions, or variability-absorption assumptions.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · 3 internal anchors

[1] A. Plaat, A. Wong, S. Verberne, J. Broekens, N. Van Stein, and T. Bäck, “Multi-step reasoning with large language models, a survey,” ACM Computing Surveys, vol. 58, no. 6, pp. 1–35, 2025.
[2] A. R. Ellis-Mohr, A. K. Nayak, and L. R. Varshney, “A theory of inference compute scaling: Reasoning through directed stochastic skill search,” Philosophical Transactions of the Royal Society A, 2026, to appear.
[3] A. Agarwal, J. Sun, S. Noghabi, S. Iyengar, A. Badam, R. Chandra, S. Seshan, and S. Kalyanaraman, “Redesigning data centers for renewable energy,” in Proceedings of the 20th ACM Workshop on Hot Topics in Networks (HotNets ’21), 2021, pp. 45–52.
[4] P. P. Varaiya, F. F. Wu, and J. W. Bialek, “Smart operation of smart grid: Risk-limiting dispatch,” Proceedings of the IEEE, vol. 99, no. 1, pp. 40–57, 2011.
[5] S. Agarwal, Y.-M. Chee, J. Lee, R. R. Sindhgatta, and L. R. Varshney, “Risk-limited dispatch of knowledge work,” Oct. 2014, US Patent App. 13/870,422.
[6] T. Shnitzer, A. Ou, M. Silva, K. Soule, Y. Sun, J. Solomon, N. Thompson, and M. Yurochkin, “Large language model routing with benchmark datasets,” in Proceedings of the Conference on Language Modeling (COLM 2024), 2024.
[7] A. S. Luccioni, S. Viguier, and A.-L. Ligozat, “Estimating the carbon footprint of BLOOM, a 176B parameter language model,” Journal of Machine Learning Research, vol. 24, no. 253, pp. 1–15, 2023.
[8] N. Jegham, M. Abdelatti, C. Y. Koh, L. Elmoubarki, and A. Hendawi, “How hungry is AI? benchmarking energy, water, and carbon footprint of LLM inference,” arXiv:2505.09598 [cs.CY], 2025.
[9] C. E. Tripp, J. Perr-Sauer, J. Gafur, A. Nag, A. Purkayastha, S. Zisman, and E. A. Bensen, “Measuring the energy consumption and efficiency of deep neural networks: An empirical analysis and design recommendations,” arXiv:2403.08151 [cs.LG], 2024.
[10] E. J. Husom, A. Goknil, L. K. Shar, and S. Sen, “The price of prompting: Profiling energy use in large language models inference,” arXiv:2407.16893 [cs.CY], 2024.
[11] A. K. Nayak and L. R. Varshney, “An information theory of compute-optimal size scaling, emergence, and plateaus in language models,” IEEE Journal of Selected Topics in Signal Processing, 2026, to appear.
[12] B. Li, S. Samsi, V. Gadepally, and D. Tiwari, “Clover: Toward sustainable AI with carbon-aware machine learning inference service,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2023, pp. 1–15.
[13] Y. Li, Z. Hu, E. Choukse, R. Fonseca, G. E. Suh, and U. Gupta, “EcoServe: Designing carbon-aware AI inference systems,” arXiv:2502.05043 [cs.DC], 2025.
[14] L. Chen, M. Zaharia, and J. Zou, “FrugalGPT: How to use large language models while reducing cost and improving performance,” Transactions on Machine Learning Research, 2024.
[15] M. Sun, Z. Liu, A. Bair, and J. Z. Kolter, “A simple and effective pruning approach for large language models,” in Proceedings of the Twelfth International Conference on Learning Representations, 2024.
[16] S. Muralidharan, S. T. Sreenivas, R. B. Joshi, M. Chochowski, M. Patwary, M. Shoeybi, B. Catanzaro, J. Kautz, and P. Molchanov, “Compact language models via pruning and knowledge distillation,” in Advances in Neural Information Processing Systems, 2024, vol. 37, pp. 41076–41102.
[17] C.-Y. Hsieh, C.-L. Li, C.-K. Yeh, H. Nakhost, Y. Fujii, A. Ratner, R. Krishna, C.-Y. Lee, and T. Pfister, “Distilling step-by-step! outperforming larger language models with less training data and smaller model sizes,” arXiv:2305.02301, 2023.
[18] J. Yang and S. Ulukus, “Optimal packet scheduling in an energy harvesting communication system,” IEEE Transactions on Communications, vol. 60, no. 1, pp. 220–230, Jan. 2012.
[19] S. Ulukus, A. Yener, E. Erkip, O. Simeone, M. Zorzi, P. Grover, and K. Huang, “Energy harvesting wireless communications: A review of recent advances,” IEEE Journal on Selected Areas in Communications, vol. 33, no. 3, pp. 360–381, Mar. 2015.
[20] E. M. A. Yener, “Low-latency communications over zero-battery energy harvesting channels,” in Proceedings of the 2015 IEEE Global Communications Conference (GLOBECOM), Dec. 2015.
[21] Y. Bahri, E. Dyer, J. Kaplan, J. Lee, and U. Sharma, “Explaining neural scaling laws,” Proceedings of the National Academy of Sciences, vol. 121, no. 27, p. e2311878121, Jun. 2024.
[22] A. Wadell, A. Bhutani, V. Azumah, A. R. Ellis-Mohr, C. Kelly, H. Zhao, A. K. Nayak, K. Hegazy, A. Brace, H. Lin, M. Emani, K. G. V. Vishwanath, M. Alkan, T. Gibbs, J. Wells, L. R. Varshney, B. Ramsundar, K. Duraisamy, A. Ramanathan, M. Mahoney, and V. Viswanathan, “Foundation models for discovery and exploration in chemical space,” arXiv preprint arXi...
[23] M. D. Donsker, An Invariance Principle for Certain Probability Limit Theorems, ser. Memoirs of the American Mathematical Society. American Mathematical Society, 1951, no. 6.
[24] J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei, “Scaling laws for neural language models,” arXiv:2001.08361 [cs.LG], 2020.
[25] J. Hoffmann, S. Borgeaud, A. Mensch, E. Buchatskaya, T. Cai, E. Rutherford, D. de Las Casas, L. A. Hendricks, J. Welbl, A. Clark, T. Hennigan, E. Noland, K. Millican, G. van den Driessche, B. Damoc, A. Guy, S. Osindero, K. Simonyan, E. Elsen, J. W. Rae, O. Vinyals, and L. Sifre, “Training compute-optimal large language models,” arXiv:2203.15556 [cs.CL], 2022.
[26] M. Calvo-Fullana, C. Antón-Haro, J. Matamoros, and A. Ribeiro, “Stochastic routing and scheduling policies for energy harvesting communication networks,” IEEE Transactions on Signal Processing, vol. 66, no. 13, pp. 3363–3376, Jul. 2018.
[27] L. Tassiulas and A. Ephremides, “Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks,” in Proceedings of the 29th IEEE Conference on Decision and Control, 1990, pp. 2130–2132.
[28] A. Lechowicz, R. Shenoy, N. Bashir, M. Hajiesmaili, A. Wierman, and C. Delimitrou, “Carbon- and precedence-aware scheduling for data processing clusters,” arXiv:2502.09717 [cs.DC], 2025.
[29] A. N. Borodin and P. Salminen, Handbook of Brownian Motion - Facts and Formulae. Birkhäuser, 2002.

Appendix A: Supporting Results and Proofs

Proof of Theorem 1: Define the cumulative injected energy up to time $t$ by $S_t := \sum_{s=0}^{t-1} G_s$, $t = 0, 1, \ldots, T$, with the convention $S_0 = 0$. Summing the controlled recursion and comparing to the uncontrolled one yields the ...
