arxiv: 2605.23348 · v1 · pith:LVRXPRKFnew · submitted 2026-05-22 · 💻 cs.DC · cs.AI · cs.NI

XWind: A Cross-site Router for Large Language Model Inference Serving at Renewable Energy Farms

Pith reviewed 2026-05-25 03:21 UTC · model grok-4.3

classification 💻 cs.DC · cs.AI · cs.NI
keywords XWind · AI inference routing · renewable energy · wind power · LLM serving · cross-site distribution · green computing · latency optimization

The pith

XWind reduces P99 end-to-end latency for LLM inference by up to 52% over the strongest alternative router on a testbed emulating wind-powered sites.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes deploying AI compute directly at wind farms to match growing power demand with local renewable supply. It introduces XWind, a router that copes with variable wind power by distributing inference requests dynamically, using only real-time signals such as latency and queue depth. Separately, the paper's feasibility analysis finds that site-wise right-sizing, combined with the spatial complementarity of wind resources, keeps aggregate fleet utilization on par with traditional deployments. A reader would care because this suggests a way to expand AI infrastructure without additional grid strain or long-distance transmission losses.

Core claim

On a 64-GPU A100 testbed that emulates three wind-powered sites with Azure traces, XWind achieves up to 52% lower P99 end-to-end latency than the strongest alternative router and up to 98% lower than power-capping or GPU idling baselines, with gains consistent across workloads and GPU generations.
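To make the headline metric concrete, here is a minimal sketch of how a P99 latency and a relative reduction might be computed from per-request latencies. The nearest-rank percentile definition, the function names, and the sample latencies are all assumptions for illustration; the paper's percentile method and measurements are not reproduced here.

```python
import math


def percentile_nearest_rank(samples, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


def relative_reduction(baseline, candidate):
    """Fractional reduction of candidate relative to baseline (0.52 means 52% lower)."""
    return (baseline - candidate) / baseline


# Made-up end-to-end latencies in seconds, purely illustrative.
baseline_latencies = [1.0] * 98 + [10.0, 12.0]
candidate_latencies = [1.0] * 98 + [4.8, 6.0]

p99_base = percentile_nearest_rank(baseline_latencies, 99)
p99_cand = percentile_nearest_rank(candidate_latencies, 99)
reduction = relative_reduction(p99_base, p99_cand)
print(f"P99 baseline={p99_base}s candidate={p99_cand}s reduction={reduction:.0%}")
```

Note that a tail percentile is sensitive to a handful of slow requests, which is one reason the referee report below asks for error bars across runs.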

What carries the argument

XWind, the lightweight reactive router that configures sites and distributes requests based solely on inference latency, KV-cache utilization, and queue depth.
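As a rough illustration of what routing on these three signals could look like, the sketch below scores each site and sends the next request to the least-loaded one. This is not the paper's algorithm: the `SiteState` type, the weighted-sum score, the reference values used for normalization, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SiteState:
    name: str
    latency_s: float      # recent inference latency observed at the site
    kv_cache_util: float  # KV-cache utilization as a fraction in [0, 1]
    queue_depth: int      # requests currently waiting


def load_score(site, latency_ref_s=5.0, queue_ref=32, weights=(1.0, 1.0, 1.0)):
    """Hypothetical score: weighted sum of the three signals, each roughly normalized."""
    w_lat, w_kv, w_queue = weights
    return (
        w_lat * site.latency_s / latency_ref_s
        + w_kv * site.kv_cache_util
        + w_queue * site.queue_depth / queue_ref
    )


def pick_site(sites):
    """Route the next request to the site with the lowest load score."""
    return min(sites, key=load_score)


sites = [
    SiteState("site-0", latency_s=8.0, kv_cache_util=0.95, queue_depth=40),
    SiteState("site-1", latency_s=2.0, kv_cache_util=0.40, queue_depth=4),
    SiteState("site-2", latency_s=3.0, kv_cache_util=0.70, queue_depth=10),
]
chosen = pick_site(sites)
print(chosen.name)
```

The point of the sketch is only that such a policy needs no power forecast: a site that loses power slows down, its latency and queue depth rise, and its score worsens on its own.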

If this is right

  • AI compute can expand at renewable sources while generating local demand.
  • Site-wise right-sizing with spatial complementarity maintains fleet utilization comparable to traditional setups.
  • 890+ GW of wind capacity lies within 50 ms network RTT of Azure data centers.
  • The router works workload-agnostically without needing power predictions.
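The right-sizing step is described in the Figure 4 excerpt further down: compute the x-th percentile of each site's generation time series, cap output at that percentile, and sum the capped power across sites. A minimal sketch of that calculation follows, assuming a nearest-rank percentile and time-aligned series of equal length; the function names and the power values are made up.

```python
import math


def nearest_rank_percentile(series, x):
    """x-th percentile (0 < x <= 100) of a series using the nearest-rank method."""
    ordered = sorted(series)
    rank = max(1, math.ceil(x * len(ordered) / 100))
    return ordered[rank - 1]


def capped_fleet_power(site_series, x):
    """Cap each site at its own x-th percentile, then sum capped power per time step.

    site_series maps a site name to power samples (e.g. MW), aligned in time.
    """
    caps = {name: nearest_rank_percentile(s, x) for name, s in site_series.items()}
    n_steps = len(next(iter(site_series.values())))
    fleet = [
        sum(min(series[t], caps[name]) for name, series in site_series.items())
        for t in range(n_steps)
    ]
    return caps, fleet


# Illustrative, anti-correlated sites: one is windy when the other is calm.
site_series = {
    "site-a": [10, 40, 80, 100],
    "site-b": [90, 60, 20, 10],
}
caps, fleet = capped_fleet_power(site_series, 50)
print(caps, fleet)
```

A lower percentile means less provisioned compute per site but power that is available more of the time, which is the availability versus provisioning tradeoff the figure caption refers to.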

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The model could extend to other variable renewables like solar if similar complementarity applies.
  • Deployments beyond 50 ms RTT might require additional network optimizations to preserve latency benefits.

Load-bearing premise

The testbed emulation of three sites with production traces accurately captures real-world wind variability, network latency, and right-sizing effects for the 50 ms RTT model.

What would settle it

Running the system on actual geographically distributed wind farms and measuring P99 latency under observed wind patterns would confirm or refute the reported reductions.

Figures

Figures reproduced from arXiv: 2605.23348 by Ajay Manchepalli, Anjaly Parayil, Atharva Deshmukh, Chaojie Zhang, Debopam Bhattacherjee, Liangcheng Yu, Mike Shepperd, Rohan Gandhi, Srinivasan Iyengar, Tella Rajashekhar Reddy.

Figure 1
Figure 1: AI Greenferencing with XWind. a win-for-all: (1) users gain access to sustainable AI services; (2) AI providers unlock additional compute capacity, user reach, and revenue; (3) wind farms can monetize output locally; and (4) power grids benefit from reduced load and a breather for expansion. Note that Greenferencing is designed not to obviate but to co-exist with traditional data centers as a complementary…
Figure 3
Figure 3: Azure coding trace: average prefill length varies significantly over a week. …for wind power prediction with very high accuracy. The broad availability of such predictors helps us treat wind power generation as an oracle (variable yet predictable with high accuracy) in Greenferencing systems design. Spatial complementarity smooths variability. Wind generation across geographically dispersed sites exhibit…
Figure 4
Figure 4: Availability vs. provisioning tradeoff for 3 Azure DCs at 20 ms RTT. …site within 20 ms fiber RTT of an Azure DC, we compute the x-th percentile of its generation time series, cap output at that percentile, and sum capped power across all sites
Figure 5
Figure 5: Profiling results of Llama 3.1 8B on A100 40 GB for Azure conversation workload. (a) Power consumption vs Frequency (P99) (b) TBT vs Frequency (P99) (c) KV Usage vs Frequency (P99)
Figure 6
Figure 6: Profiling results of Llama 3.1 8B on H100 80 GB for Azure conversation workload. …across workloads at low frequencies but exhibits slight RPS-dependence at mid-range frequencies like 810 MHz. To remain conservative, we build the frequency-to-peak-power lookup table from the peak-load envelope, generated using gpu_burn [76] at high utilization. Design implication: XW-Slcs use these lookup tables to pick GPU…
Figure 7
Figure 7: A Greenferencing site of
Figure 8
Figure 8: Power availability across 3 sites scaled from real US wind farm data. Site-0 experiences a sustained ∼50% power drop mid-trace. Note: y-axis does not start at 0. …(queue length, KV-cache utilization) metrics. A lightweight Instance Telemetry process computes sliding-window averages (window=15 s, step=1 s) and exposes state to the XW-Slc, which polls every 15 s to construct site-wide aggregations for XWind…
Figure 9
Figure 9: P99 E2E latency (top) and P99 queue time (bottom) across site-local logic variants for coding and conversation workloads. Note: y-axis is log-scale
Figure 10
Figure 10: CDF (tail) of E2E latency for conversation trace, 175 RPS, for different site-local logic
Figure 11
Figure 11: CDF of E2E latency for different routing strategies using the same reactive XW-Slc (coding, 150 RPS). …with the same reactive XW-Slc at each site, on the coding workload at 150 RPS (…
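The Figure 6 excerpt mentions a frequency-to-peak-power lookup table that the site-local logic consults, but the excerpt is cut off before saying exactly how. One plausible use, assumed here rather than taken from the paper, is to pick the highest GPU frequency whose peak power fits a per-GPU power budget. The table values below are invented for illustration (only 810 MHz appears in the excerpt), and the function name is hypothetical.

```python
import bisect

# Hypothetical lookup table: (GPU frequency in MHz, peak power in W), sorted by frequency.
# Peak power is assumed to increase monotonically with frequency.
FREQ_TO_PEAK_POWER = [(510, 150), (810, 200), (1110, 270), (1410, 400)]


def highest_frequency_within(budget_w, table=FREQ_TO_PEAK_POWER):
    """Return the highest frequency whose peak power is <= budget_w, or None if none fits."""
    powers = [power for _, power in table]
    idx = bisect.bisect_right(powers, budget_w)  # number of entries that fit the budget
    if idx == 0:
        return None  # even the lowest frequency would exceed the budget
    return table[idx - 1][0]


print(highest_frequency_within(250))  # a 250 W budget admits the 810 MHz entry
```

Building the table from the peak-load envelope, as the excerpt describes, makes a choice like this conservative: actual draw at a given frequency should not exceed the tabulated value.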
Original abstract

AI power demand is growing at an unprecedented rate while power grids are often ailing and struggle to keep up. Grid expansion comes with high capital expenditure and long-distance transmission losses, yet there is abundant renewable energy at the source, just not matched to demand. This paper proposes a complementary AI infrastructure deployment model, AI Greenferencing, that brings modular AI compute to renewable energy sources, focusing on wind, allowing AI footprint expansion, generating local behind-the-meter demand for renewable sites, and helping ease the growing strain on power utilities. Our feasibility analysis shows that 890+ GW of wind capacity lies within 50 ms network round trip time of Azure data centers, and that site-wise right-sizing combined with spatial complementarity of wind energy keeps aggregate fleet utilization on par with traditional deployments. To serve inference requests under variable wind power, we build XWind, a lightweight, reactive, and workload-agnostic AI inference router that uses only real-time signals: inference latency, KV-cache utilization, and queue depth, to dynamically configure sites and distribute requests. Evaluated on a real 64-GPU A100 testbed emulating three wind-powered sites with Azure production traces, XWind reduces P99 end-to-end latency by up to 52% over the strongest contender (also our idea) and by up to 98% over baselines such as power-capping and GPU idling, with consistent gains across workload types, load levels, and GPU generations.
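The real-time signals named in the abstract have to be smoothed before a router can act on them. The Figure 8 excerpt above mentions sliding-window averages with a 15 s window and a 1 s step; the sketch below shows one simple way to maintain such an average. The class name and the synthetic samples are assumptions, and the paper's telemetry process may be implemented differently.

```python
from collections import deque


class SlidingWindowAverage:
    """Average over the most recent window_s samples, assuming one sample per 1 s step."""

    def __init__(self, window_s=15):
        # A bounded deque silently drops the oldest sample once the window is full.
        self.samples = deque(maxlen=window_s)

    def add(self, value):
        self.samples.append(value)

    def average(self):
        if not self.samples:
            return None  # no data yet
        return sum(self.samples) / len(self.samples)


# Synthetic queue-length samples 0, 1, ..., 19, one per second.
queue_len = SlidingWindowAverage(window_s=15)
for second in range(20):
    queue_len.add(second)

avg = queue_len.average()  # mean of the last 15 samples, i.e. 5..19
print(avg)
```

A consumer that polls every 15 s, as the excerpt says the site-local controller does, would then read a value covering exactly the interval since its previous poll.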

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes an 'AI Greenferencing' deployment model that colocates modular LLM inference compute at wind energy farms to create behind-the-meter demand for renewables. It reports a feasibility analysis showing 890+ GW of wind capacity within 50 ms RTT of Azure data centers, with site right-sizing and spatial complementarity preserving fleet utilization. It introduces XWind, a lightweight reactive router that uses only real-time signals (inference latency, KV-cache utilization, queue depth) to configure sites and route requests. On a 64-GPU A100 testbed emulating three wind-powered sites driven by Azure production traces, XWind is claimed to reduce P99 end-to-end latency by up to 52% versus the strongest alternative and up to 98% versus baselines such as power-capping and GPU idling, with gains holding across workload types, load levels, and GPU generations.

Significance. If the reported latency reductions prove robust under realistic wind variability and network conditions, the work could enable more sustainable scaling of inference serving by directly coupling compute to renewable sources. The use of a real 64-GPU hardware testbed with production traces and the workload-agnostic design of the router are concrete strengths that distinguish the contribution from purely simulation-based studies.

major comments (2)
  1. [Abstract and Evaluation section] The headline claims of 52% and 98% P99 latency reduction rest entirely on the 64-GPU A100 testbed emulation of three wind sites; however, the manuscript provides no quantitative validation (e.g., statistical comparison of injected power traces to measured wind-farm output, measured inter-site RTT distributions, or sensitivity to site right-sizing assumptions) that the emulation faithfully reproduces real wind variability or the 50 ms RTT deployment model. This is load-bearing for the central empirical result.
  2. [Evaluation section] The abstract states that gains are 'consistent across workload types, load levels, and GPU generations' yet supplies no error bars, exact workload definitions, exclusion criteria, or per-configuration raw data; without these, it is impossible to determine whether the reported percentages are robust or sensitive to post-hoc choices in the experimental setup.
minor comments (1)
  1. [Abstract] The phrase 'also our idea' for the strongest contender is unclear without a citation or prior reference; a brief pointer to the relevant prior work or section would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The two major comments highlight areas where additional detail on our emulation methodology and experimental reporting would strengthen the manuscript. We address each point below and will incorporate revisions to improve clarity and robustness.

Point-by-point responses
  1. Referee: [Abstract and Evaluation section] The headline claims of 52% and 98% P99 latency reduction rest entirely on the 64-GPU A100 testbed emulation of three wind sites; however, the manuscript provides no quantitative validation (e.g., statistical comparison of injected power traces to measured wind-farm output, measured inter-site RTT distributions, or sensitivity to site right-sizing assumptions) that the emulation faithfully reproduces real wind variability or the 50 ms RTT deployment model. This is load-bearing for the central empirical result.

    Authors: We agree that the current manuscript lacks explicit quantitative validation of the emulation fidelity. The power traces were derived from public wind resource datasets and scaled to match the spatial complementarity analysis in Section 3, while RTT values were obtained from Azure region-to-wind-farm latency measurements. In the revised version we will add (1) a new subsection detailing the trace generation pipeline with Kolmogorov-Smirnov statistics comparing synthetic versus measured wind output distributions, (2) measured RTT histograms from the three emulated sites, and (3) sensitivity plots showing P99 latency under ±20% perturbations to site right-sizing and RTT. These additions will be placed in the Evaluation section and referenced from the abstract. revision: yes

  2. Referee: [Evaluation section] The abstract states that gains are 'consistent across workload types, load levels, and GPU generations' yet supplies no error bars, exact workload definitions, exclusion criteria, or per-configuration raw data; without these, it is impossible to determine whether the reported percentages are robust or sensitive to post-hoc choices in the experimental setup.

    Authors: The manuscript currently reports only aggregate P99 numbers without per-run statistics or full workload specifications. We will revise the Evaluation section to include: (a) exact definitions of the three workload classes (with token-length distributions and arrival-rate parameters drawn from the Azure traces), (b) error bars showing standard deviation across five independent runs per configuration, (c) a table of all tested load levels and GPU generations with the corresponding P99 values, and (d) a brief statement of exclusion criteria (e.g., warm-up period and outlier removal). Raw per-run data will be released in the artifact repository. These changes directly address the concern about post-hoc sensitivity. revision: yes
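The first response promises Kolmogorov-Smirnov statistics comparing synthetic and measured wind output. For readers unfamiliar with the test, the sketch below computes the two-sample KS statistic, the largest gap between two empirical CDFs, in plain Python. The function name and sample values are illustrative only, and the sketch does not compute a p-value.

```python
import bisect


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max over x of |F_a(x) - F_b(x)|."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    # Empirical CDFs are step functions that only change at sample points,
    # so checking every sample point is enough to find the largest gap.
    for x in sa + sb:
        fa = bisect.bisect_right(sa, x) / len(sa)  # fraction of a that is <= x
        fb = bisect.bisect_right(sb, x) / len(sb)  # fraction of b that is <= x
        d = max(d, abs(fa - fb))
    return d


# Made-up normalized power samples standing in for measured and synthetic traces.
measured = [1, 2, 3, 4]
synthetic = [3, 4, 5, 6]
print(ks_statistic(measured, synthetic))
```

A statistic near 0 indicates the two distributions are close, while a value near 1 means they barely overlap. It compares marginal distributions only, so it would not by itself show that the synthetic traces reproduce the timing of events such as the sustained power drop described for Site-0.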

Circularity Check

0 steps flagged

No circularity: empirical testbed evaluation with no derivations or self-referential fitting

full rationale

The paper contains no equations, derivations, or mathematical models. All claims rest on direct measurements from a 64-GPU A100 testbed that emulates three wind-powered sites using Azure traces. The router logic is described as reactive to observable signals (latency, KV-cache, queue depth) without any fitted parameters or predictions that reduce to inputs by construction. Feasibility numbers on wind capacity and RTT are presented as analysis results rather than derived quantities. This is a standard empirical systems paper whose central results are externally falsifiable via the described testbed.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The paper introduces a new router and deployment concept; the main additions are the XWind design and the feasibility numbers. No free parameters are visible from the abstract. The 890 GW figure and 50 ms RTT bound rest on external data sources not detailed here.

axioms (1)
  • domain assumption: 890+ GW of wind capacity lies within 50 ms network round trip time of Azure data centers
    Stated directly in the abstract as the outcome of a feasibility analysis.
invented entities (1)
  • XWind router (no independent evidence)
    purpose: Dynamically configure sites and distribute inference requests using only real-time signals
    New system proposed and evaluated in the paper

pith-pipeline@v0.9.0 · 5847 in / 1374 out tokens · 45369 ms · 2026-05-25T03:21:46.396812+00:00 · methodology

Reference graph

Works this paper leans on

85 extracted references · 85 canonical work pages

  1. [1] 2025. DCGMI. https://microsoft.github.io/VirtualClient/docs/workloads/dcgmi/

  2. [2] 2025. Microsoft & Constellation’s Bid to Restart Three Mile Island. https://datacentremagazine.com/critical-environments/microsoft-constellation-restarting-a-nuclear-reactor. 20-year, $16B deal for 835 MW dedicated to Microsoft AI DCs. Accessed: 2026-04-02

  3. [3] 2025. Nuclear power for AI: inside the data center energy deals. https://introl.com/blog/nuclear-power-ai-data-centers-microsoft-google-amazon-2025. Amazon investing >$20B in nuclear-adjacent DC sites. Accessed: 2026-04-02

  4. [4] 2025. Soluna. https://www.solunacomputing.com/

  5. [5] 2025. Starcloud Launches Orbital AI Data Center With NVIDIA H100 GPU. https://www.datacenterfrontier.com/site-selection/article/55337494/starcloud-launches-orbital-ai-data-center-with-nvidia-h100-gpu. First orbital AI DC, Nov 2025, LLM inference in orbit. Accessed: 2026-04-02

  6. [6] 2025. US Approves $1B Loan to Restart Three Mile Island for Microsoft Data Centers. https://gizmodo.com/us-approves-1b-loan-to-restart-three-mile-island-as-microsoft-data-centers-drive-demand-2000688138. DOE $1B loan, targeting 2027 restart. Accessed: 2026-04-02

  7. [7] 2025. WestfalenWIND. https://www.westfalenwind.de/

  8. [8] 2025. windCORES. https://www.windcores.de/en/

  9. [9] 2026. Crusoe Announces New 900 MW AI Factory Campus in Abilene, Texas for Microsoft. https://www.cxodigitalpulse.com/crusoe-announces-new-900-mw-ai-factory-campus-in-abilene-texas-to-support-microsoft-ai-infrastructure/. Behind-the-meter natural gas, 1.2 GW campus. Accessed: 2026-04-02

  10. [10] 2026. Powering the Intelligence Age: Bloom Energy and Wyoming 1.8 GW AI Data Center Project. https://markets.chroniclejournal.com/chroniclejournal/article/marketminute-2026-1-8-powering-the-intelligence-age-bloom-energy-shares-surge-as-wyoming-approves-massive-18-gw-ai-data-center-project. 1 GW solid-oxide fuel cells for behind-the-meter DC power. Acces...

  11. [11] 2026. vLLM. https://docs.vllm.ai/en/latest/

  12. [12] Bilge Acun, Benjamin Lee, Fiodar Kazhamiaka, Kiwan Maeng, Manoj Chakkaravarthy, Udit Gupta, David Brooks, and Carole-Jean Wu. 2023. Carbon Explorer: A Holistic Framework for Designing Carbon Aware Datacenters. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). ACM

  13. [13] Anup Agarwal, Jinghan Sun, Shadi Noghabi, Srinivasan Iyengar, Anirudh Badam, Ranveer Chandra, Srinivasan Seshan, and Shivkumar Kalyanaraman. 2021. Redesigning data centers for renewable energy. In ACM HotNets

  14. [15] Ghazanfar Ali, Mert Side, Sridutt Bhalachandra, Nicholas J. Wright, and Yong Chen. 2023. Performance-Aware Energy-Efficient GPU Frequency Selection using DNN-based Models. In Proceedings of the 52nd International Conference on Parallel Processing (ICPP). 433–442. https://doi.org/10.1145/3605573.3605600

  15. [16] Sam Altman. 2025. “it’s super fun seeing people love images in chatgpt. but our GPUs are melting”. https://x.com/sama/status/1905296867145154688. Accessed: 2025-06-15

  16. [17] Amperon. 2024. US Solar and Wind Curtailment Is Exploding. https://www.amperon.co/blog/us-solar-and-wind-curtailment-is-exploding. Estimated 20 TWh curtailed in US in 2024. Accessed: 2026-04-02

  17. [18] Azure. 2025. Azure Modular Data Center (MDC) Operator and User Documentation. https://learn.microsoft.com/en-us/azure-stack/mdc/

  18. [19] AzurePublicDataset. 2025. Azure LLM inference trace 2024. https://github.com/Azure/AzurePublicDataset/blob/master/AzureLLMInferenceDataset2024.md

  19. [20] Alexander Bick, Adam Blandin, and David J Deming. 2024. The rapid adoption of generative AI. Technical Report. National Bureau of Economic Research

  20. [21] David Chernicoff. 2024. How Data Centers Are Harnessing AI Workloads for Enhanced Cloud, LLM, and Inference Capabilities. Data Center Frontier (2024). https://rb.gy/31u0ni Accessed: 2025-06-15

  21. [22] CoreSite. 2024. AI and the Data Center: Driving Greater Power Density. https://www.coresite.com/blog/ai-and-the-data-center-driving-greater-power-density

  22. [23] Casey Crownhart. 2024. Why Microsoft made a deal to help restart Three Mile Island. https://www.technologyreview.com/2024/09/26/1104516/three-mile-island-microsoft/

  23. [24] Databank. 2024. Exploring Modular Data Centers: Benefits, Design, And Deployment. https://www.databank.com/resources/blogs/exploring-modular-data-centers

  24. [25] DCSMI. 2024. Data Center Workloads, Hyperscale Utilization Rates, and AI GPU Impact. https://www.dcsmi.com/blog/data-center-workloads-hyperscale-utilization-rates-and-ai-gpu-impact. GPU clusters 70–90% utilization. Accessed: 2026-04-04

  25. [26] Tim De Chant. 2024. Google kicks off $20B renewable energy building spree to power AI. https://techcrunch.com/2024/12/10/google-kicks-off-20b-renewable-energy-building-spree-to-power-ai/

  26. [27] Delta Power Solutions. 2024. Modular Data Centers: The Rise and the Advantages. https://www.deltapowersolutions.com/en-in/mcis/technical-article-modular-data-centers-the-rise-and-the-advantages.php

  27. [28] Diana DiGangi. 2023. Dominion Energy projects adding up to 9 GW of gas-fired capacity in Virginia to bolster reliability. https://tinyurl.com/4fbcdx24

  28. [29] Edison Electric Institute. 2025. Electric Companies to Invest Nearly $208B in 2025 to Strengthen Grid and Drive Economic Growth. https://www.eei.org/en/news/news/all/electric-companies-to-invest-nearly-208b-in-2025-to-strengthen-grid-and-drive-economic-growth. Record annual grid investment. Accessed: 2026-04-02

  29. [30] EIA. 2024. Why are Midwest grid operators turning away wind power? https://www.eia.gov/todayinenergy/detail.php?id=62406

  30. [31] Elia. 2025. Wind power generation. https://www.elia.be/en/grid-data/generation-data/wind-power-generation

  31. [32] Robert L Fares and Carey W King. 2017. Trends in transmission, distribution, and administration costs for US investor-owned electric utilities. Energy Policy (2017)

  32. [33] Financial Times. 2026. Microsoft vows to ‘pay its way’ as it seeks to defuse data centre backlash. Financial Times (Jan. 2026). https://www.ft.com/content/3f392c9b-c07d-42f5-b000-0a7347ad1ec0 Accessed: 2026-03-09

  33. [34] GE Vernova. 2025. Going Big: To Support Data Center Growth and Rising Renewables, Crusoe Ordering Flexible Gas Turbines. https://www.gevernova.com/news/articles/going-big-support-data-center-growth-rising-renewables-crusoe-ordering-flexible-gas. 29 LM2500XPRESS turbines, 1 GW. Accessed: 2026-04-02

  34. [35] GitHub. 2025. GitHub Copilot. https://github.com/features/copilot

  35. [36] Global Energy Monitor. 2026. Global Wind Power Tracker. https://globalenergymonitor.org/projects/global-wind-power-tracker/

  36. [37] Íñigo Goiri, William Katsak, Kien Le, Thu D Nguyen, and Ricardo Bianchini. 2013. Parasol and greenswitch: Managing datacenters powered by renewable energy. ACM SIGPLAN Notices (2013)

  37. [38] Google. 2024. New nuclear clean energy agreement with Kairos Power. https://blog.google/company-news/outreach-and-initiatives/sustainability/google-kairos-power-nuclear-energy-agreement/. First corporate SMR fleet deal, up to 500 MW by 2030–2035. Accessed: 2026-04-02

  38. [39] Diana Goovaerts and Matt Hamblen. 2024. Could GPU power levels break the data center ecosystem? https://www.fierce-network.com/cloud/could-gpu-power-levels-break-data-center-ecosystem

  39. [40] Sriram Govindan, Anand Sivasubramaniam, and Bhuvan Urgaonkar

  40. [41] Benefits and Limitations of Tapping into Stored Energy for Datacenters. In Proceedings of the 38th Annual International Symposium on Computer Architecture (ISCA). ACM, 341–352

  41. [42] Walid A. Hanafy, Qianlin Liang, Noman Bashir, David Irwin, and Prashant Shenoy. 2023. CarbonScaler: Leveraging Cloud Workload Elasticity for Optimizing Carbon-Efficiency. Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS) 7, 3 (2023)

  42. [43] Md E Haque, Íñigo Goiri, Ricardo Bianchini, and Thu D Nguyen. 2015. Greenpar: Scheduling parallel high performance applications in green datacenters. In ACM ICS

  43. [44] Astrid Hennevogl-Kaulhausen and Ulrike Ostler. 2024. Modular data centers: Faster, more flexible and more energy-efficient in the data centers. https://www.deltapowersolutions.com/en-in/mcis/technical-article-modular-data-centers-faster-more-flexible-and-more-energy-efficient-in-the-data-centers.php

  44. [45] Javier C. Hernandez. 2017. It Can Power a Small Nation. But This Wind Farm in China Is Mostly Idle. https://www.nytimes.com/2017/01/15/world/asia/china-gansu-wind-farm.html

  45. [46] Bobby Hollis. 2024. Accelerating the addition of carbon-free energy: An update on progress. https://www.microsoft.com/en-us/microsoft-cloud/blog/2024/09/20/accelerating-the-addition-of-carbon-free-energy-an-update-on-progress/

  46. [47] International Energy Agency. 2025. Energy and AI: Energy Demand from AI. https://www.iea.org/reports/energy-and-ai/energy-demand-from-ai. Global DC demand 415 TWh (2024), projected 945 TWh by

  47. [48] Accessed: 2026-04-02

  48. [49] Peeyush Kumar, Ranveer Chandra, Chetan Bansal, Shivkumar Kalyanaraman, Tanuja Ganu, and Michael Grant. 2021. Micro-climate prediction-multi scale encoder-decoder based deep learning framework. In ACM SIGKDD

  49. [50] Lawrence Berkeley National Laboratory. 2025. Queued Up: 2025 Edition — Characteristics of Power Plants Seeking Transmission Interconnection. https://emp.lbl.gov/queues. 2,300 GW in queue (end 2024), 13–19% completion rate, 4.5–5 yr median wait. Accessed: 2026-04-02

  50. [51] Lazard. 2025. Levelized Cost of Energy+ (LCOE+). https://www.lazard.com/media/eijnqja3/lazards-lcoeplus-june-2025.pdf. June 2025 edition. Accessed: 2026-04-02

  51. [52] Vivian Lee. 2024. U.S. Data Center Power Outlook: Balancing competing power consumption needs. https://www.linkedin.com/pulse/us-data-center-power-outlook-balancing-competing-consumption-lee-iz4pe/

  52. [53] Bryan Lim, Nicolas Loeff, Sercan Arik, and Tomas Pfister. 2021. Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting

  53. [54] McKinsey & Company. 2024. The role of power in unlocking the European AI revolution. https://tinyurl.com/bdf952sr

  54. [55] McKinsey & Company. 2025. The next big shifts in AI workloads and hyperscaler strategies. https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/the-next-big-shifts-in-ai-workloads-and-hyperscaler-strategies. Accessed: 2026-04-04

  55. [56] Meta. 2024. Llama 3.1. https://ai.meta.com/blog/meta-llama-3-1/

  56. [57] Modo Energy. 2024. The Curtailment Crisis: Saving Wind and Solar Investments in ERCOT. https://modoenergy.com/research/en/ercot-curtailment-crisis-solar-wind-data-battery-colocated-trends-maps-texas. Over 8 TWh curtailed in ERCOT in 2024. Accessed: 2026-04-02

  57. [58] National Renewable Energy Laboratory. 2025. Cost of Wind Energy Review: 2024 Edition. https://docs.nrel.gov/docs/fy25osti/91775.pdf. Onshore wind LCOE 2.6–5.4 cents/kWh, PPA 2.3–4.5 cents/kWh. Accessed: 2026-04-02

  58. [59] NVIDIA. 2023. NVIDIA DGX SuperPOD Data Center Design. https://docs.nvidia.com/nvidia-dgx-superpod-data-center-design-dgx-h100.pdf

  59. [60] NVIDIA. 2025. NVIDIA Data Center GPUs. https://www.nvidia.com/en-in/data-center/data-center-gpus/

  60. [61] Dylan Patel, Daniel Nishball, and Jeremie Eliahou Ontiveros

  61. [62] AI Datacenter Energy Dilemma – Race for AI Datacenter Space. https://semianalysis.com/2024/03/13/ai-datacenter-energy-dilemma-race/#datacenter-math

  62. [63] Ana Radovanovic, Ross Koningstein, Ian Schneider, Bokan Chen, Alexandre Duarte, Binz Roy, Diyue Xiao, Maya Haridasan, Patrick Hung, Nick Care, Saurav Talukdar, Eric Mullen, Kendal Smith, MariEllen Cottman, and Walfredo Cirne. 2021. Carbon-Aware Computing for Datacenters. arXiv:2106.11750 [cs.DC]

  63. [64] Chuangang Ren, Di Wang, Bhuvan Urgaonkar, and Anand Sivasubramaniam. 2012. Carbon-Aware Energy Capacity Planning for Datacenters. In Proceedings of the 2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS). IEEE, 391–400. https://doi.org/10.1109/MASCOTS.2012.51

  64. [65] Reuters. 2025. Amazon CEO sets out AI investment mission in annual shareholder letter. https://rb.gy/emuya0 Accessed: 2025-06-16

  65. [66] Reuters. 2025. Ghibli effect: ChatGPT usage hits record after rollout of viral feature. https://rb.gy/vx2m0n Accessed: 2025-06-16

  66. [67] Martin Rosenberg. 2024. Evergy Struggles to Cut Carbon as Energy Demand Soars. https://tinyurl.com/mtw7rmj6

  67. [68] James Sanders. 2025. Tech Billionaires Race to Build AI Data Centers in Space. https://www.techrepublic.com/article/news-ai-data-centers-space-race/. SpaceX Starlink V3, Google Suncatcher. Accessed: 2026-04-02

  68. [69] Abel Souza, Noman Bashir, Jorge Murillo, Walid Hanafy, Qianlin Liang, David Irwin, and Prashant Shenoy. 2023. Ecovisor: A Virtual Energy System for Carbon-Efficient Applications. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). ACM

  69. [70] Stanford Institute for Human-Centered AI. 2025. Artificial Intelligence Index Report 2025. arXiv preprint arXiv:2504.07139 (2025)

  70. [71] Jovan Stojkovic, Chaojie Zhang, Íñigo Goiri, Josep Torrellas, and Esha Choukse. 2025. Dynamollm: Designing llm inference clusters for performance and energy efficiency. In IEEE HPCA

  71. [72] Jinghan Sun, Zibo Gong, Anup Agarwal, Shadi Noghabi, Ranveer Chandra, Marc Snir, and Jian Huang. 2024. Exploring the Efficiency of Renewable Energy-based Modular Data Centers at Scale. In ACM SoCC

  72. [73] Dan Swinhoe. 2026. Nvidia, Prologis, EPRI, InfraPartners target prefab data centers at substation sites. https://www.datacenterdynamics.com/en/news/nvidia-prologis-epri%2Dinfrapartners-target-prefab-data-centers-at-substation-sites/

  73. [74] TechInAsia. 2026. China turns to underwater data centers to fuel AI boom. https://www.techinasia.com/news/china-turns-underwater-data-centers-ai

  74. [75] Thunder Said Energy. 2024. US electric utilities: transmission and distribution costs? https://thundersaidenergy.com/downloads/us-electric-utilities-transmission-and-distribution-costs/

  75. [76] US Dept. of Energy. 2022. Advantages and Challenges of Wind Energy. https://www.energy.gov/eere/wind/advantages-and-challenges-wind-energy

  76. [77] U.S. Energy Information Administration. 2024. Solar and wind power curtailments are increasing in California. https://www.eia.gov/todayinenergy/detail.php?id=65364. California curtailment up 29% YoY to 3,400 GWh. Accessed: 2026-04-02

  77. [78] U.S. Energy Information Administration. 2026. Electric Power Monthly — Average Retail Price of Electricity. https://www.eia.gov/electricity/monthly/epm_table_grapher.php?t=epmt_5_3. Industrial: 9.3 cents/kWh, Commercial: 13.6 cents/kWh (Jan 2026). Accessed: 2026-04-02

  78. [79] Ville-Pekka Vainio. 2024. gpu-burn: Multi-GPU CUDA stress test. https://github.com/wilicc/gpu-burn. Accessed: 2026-02-03

  79. [80] Wikipedia. 2025. Wind power in the United Kingdom: Constraint payments. https://en.wikipedia.org/wiki/Wind_power_in_the_United_Kingdom?#Constraint_payments

  80. [81] Weihang Xian, Phuong Nguyen, and Lieven Eeckhout. 2025. Using Analytical Performance/Power Model and Fine-Grained DVFS to Enhance AI Accelerator Energy Efficiency. In Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). https://doi.org/10.1145/3669940.3707231

Showing first 80 references.