LaissezCloud: Continuous Resource Renegotiation for the Public Cloud

Antoine Kaufmann; Tejas Harith

arxiv: 2604.22509 · v1 · submitted 2026-04-24 · 💻 cs.DC


Pith reviewed 2026-05-08 09:44 UTC · model grok-4.3

classification 💻 cs.DC

keywords: cloud resource management, continuous bidding, dynamic allocation, spot instances, incentive alignment, heterogeneous hardware, oversubscribed clusters


The pith

LaissezCloud keeps cloud resource allocations continuously contestable through online bids so tenants retain them only while outbidding others.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces LaissezCloud as a management platform that replaces rigid on-demand and spot categories with ongoing renegotiation of running allocations. Tenants and operators update bids in real time, and a tenant holds a resource only as long as its bid exceeds competing demand. Pricing functions as the single interface that lets tenants signal workload value while operators encode constraints such as power or carbon limits. This setup matters for oversubscribed clusters with heterogeneous accelerators because current models force either fixed commitments or sudden preemption, leading to unnecessary performance loss when demands shift. If the approach succeeds, clouds could handle time-varying objectives and operator goals without any party revealing internal application or infrastructure details.

Core claim

LaissezCloud enables continuous re-negotiation of running allocations by having tenants and operators update bids online during execution. A tenant retains a resource only as long as its bid exceeds competing demand. The pricing mechanism serves as a narrow waist that aligns incentives between untrusted parties: tenants signal utility via bids, operators encode constraints like power or carbon without exposing telemetry. Across accelerator workloads the approach reduces performance degradation under contention by 8-23 percent versus on-demand and spot baselines and scales to clusters of at least 10,000 nodes.
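The retention rule in the core claim can be sketched in a few lines of Go. This is an illustrative sketch only, not the paper's implementation: the `Bid` type, the `Holder` function, and the choice to let the incumbent keep the resource on an exact tie are all assumptions made for the example.

```go
package main

import "fmt"

// Bid is a hypothetical record: who is bidding and the price offered.
type Bid struct {
	Tenant string
	Price  float64
}

// Holder returns the tenant that holds the resource after a bid update.
// Assumption: the incumbent keeps the resource unless some competing bid
// is strictly higher; tie handling is not specified in the text above.
func Holder(incumbent Bid, competing []Bid) string {
	best := incumbent
	for _, b := range competing {
		if b.Price > best.Price {
			best = b
		}
	}
	return best.Tenant
}

func main() {
	a := Bid{"A", 1.00}
	fmt.Println(Holder(a, []Bid{{"B", 0.80}})) // A
	fmt.Println(Holder(a, []Bid{{"B", 1.20}})) // B
}
```

Re-running `Holder` on every bid update is what makes the allocation contestable during execution rather than only at launch.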

What carries the argument

Continuous online bidding with pricing as the narrow waist for incentive alignment between tenants and operators.

Load-bearing premise

Continuous online bid updates can be performed efficiently and the pricing mechanism aligns incentives between untrusted tenants and operators without exposing internal states.

What would settle it

An experiment on a contended multi-tenant cluster showing either that bid-update overhead exceeds 5 percent of runtime or that performance degradation does not drop below the on-demand and spot baselines.
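The two conditions above can be written as a single predicate. The sketch below is one possible reading, not a procedure from the paper: the function name `Refuted`, its parameters, and the choice to compare against the better (lower) of the two baselines are assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// Refuted applies the two conditions from the paragraph above.
// overhead and runtime share one time unit; the degradation values are
// fractions of lost performance (lower is better). All names are hypothetical.
func Refuted(overhead, runtime, degradation, onDemand, spot float64) bool {
	tooCostly := overhead/runtime > 0.05              // bid updates exceed 5% of runtime
	noGain := degradation >= math.Min(onDemand, spot) // not below both baselines
	return tooCostly || noGain
}

func main() {
	fmt.Println(Refuted(30, 1000, 0.20, 0.30, 0.25)) // false
	fmt.Println(Refuted(60, 1000, 0.20, 0.30, 0.25)) // true
	fmt.Println(Refuted(30, 1000, 0.25, 0.30, 0.25)) // true
}
```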

Figures

Figures reproduced from arXiv: 2604.22509 by Antoine Kaufmann, Tejas Harith.

**Figure 1.** FCFS: App A acquires HW1 first, forcing App B, dispatched at 0.1, to take its second-ranked hardware. FCFS-P: App A acquires HW1 first, but App B, dispatched at 0.1, preempts App A, interrupting an epoch between checkpoints.

Source excerpt (Section 2.2, An Illustrative Cloud Example): To make this concrete, consider two applications running on a small cloud with three machines of different hardware types. App A is a checkpointed t…

**Figure 2.** App A acquires HW1 first and App B acquires HW2 on dispatch, but B raises its willingness to pay for HW1 over time. The higher price eventually induces A to migrate after reaching a checkpoint, avoiding wasted work before moving the resource to the tenant that values it more.

Source excerpt: … during execution. On-demand instances freeze that decision too early, while spot-style preemption makes allocations reclaimable but…
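The Figure 2 caption describes a tenant that gives up a resource only at a checkpoint, so no partial epoch is lost. A minimal sketch of that timing follows, assuming checkpoints fall at regular multiples of a fixed interval starting at time 0; `MigrateAt` and its parameters are hypothetical names, not part of the paper.

```go
package main

import (
	"fmt"
	"math"
)

// MigrateAt returns the time at which a checkpointed tenant would release the
// resource: the first checkpoint at or after the moment the price exceeded
// what the tenant is willing to pay. Checkpoints are assumed to fall at
// multiples of checkpointInterval.
func MigrateAt(priceCrossed, checkpointInterval float64) float64 {
	return math.Ceil(priceCrossed/checkpointInterval) * checkpointInterval
}

func main() {
	// The price overtakes the tenant's value at t=0.37; checkpoints occur every 0.25.
	fmt.Println(MigrateAt(0.37, 0.25)) // 0.5
}
```

Preemption at t=0.37 would discard the work done since the checkpoint at 0.25; waiting until 0.5 trades a short delay for no wasted work.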

**Figure 3.** LaissezCloud overview (components highlighted).

**Figure 5.** Each resource instance’s order book is a leaf in one of the type-specific trees for the hierarchical topology. Order books on inner nodes aggregate orders in the books below.

Source excerpt: LaissezCloud exposes prices through the hierarchy at the level relevant to a tenant’s current decisions. The instance API in…
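The hierarchy in the Figure 5 caption can be illustrated with a small tree in Go. This sketch assumes that "aggregate" means an inner node reports the highest standing bid among the books below it; the paper may aggregate differently, and the `Book` type and `BestBid` method are hypothetical.

```go
package main

import "fmt"

// Book is a hypothetical node in a type-specific tree. Leaves hold the
// standing bids for one resource instance; inner nodes only have children.
type Book struct {
	Bids     []float64
	Children []*Book
}

// BestBid returns the highest standing bid in this subtree, and false if the
// subtree holds no bids at all.
func (b *Book) BestBid() (float64, bool) {
	best, found := 0.0, false
	for _, p := range b.Bids {
		if !found || p > best {
			best, found = p, true
		}
	}
	for _, c := range b.Children {
		if p, ok := c.BestBid(); ok && (!found || p > best) {
			best, found = p, true
		}
	}
	return best, found
}

func main() {
	rack := &Book{Children: []*Book{
		{Bids: []float64{1.0, 1.4}},
		{Bids: []float64{0.9}},
		{}, // idle instance with no bids
	}}
	fmt.Println(rack.BestBid()) // 1.4 true
}
```

A tenant deciding between racks would query the inner node, while a tenant deciding between instances would query the leaves.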

**Figure 6.** Performance retained for clusters of various composition. We use these normalized metrics only to compare cloud interfaces across heterogeneous workloads, not to claim that the underlying application objectives are identical.

Source excerpt (Section 5.2, Allocation Efficiency Under Contention): LaissezCloud improves allocation quality across all contention regimes in…

**Figure 7.** LaissezCloud lets tenants react to live prices and shift to cheaper allocations when urgency is low.

[Plot residue following this caption: axes "Norm Perf Retention" and "Tenant Cost ($)"; series Spot, On Demand, ours.]

**Figure 8.** LaissezCloud lets tenants navigate a broad cost-performance frontier between spot-like and on-demand-like behavior. Budgets provided to the app are provided next to LaissezCloud points.

Source excerpt (Section 5.3, Tenant Outcomes): Beyond improving average allocation quality, LaissezCloud improves tenant outcomes in two ways. It lets tenants adjust cost-performance trade-offs online as prices and urgency change, and it lets topolog…

**Figure 11.** LaissezCloud lets an InfraMaps policy steer load away from a power-constrained row using prices alone.

Source excerpt (Section 5.4, Operator-Side Control): InfraMaps give operators a soft control lever for steering demand without exposing raw infrastructure telemetry or selecting victims directly. This matters because operator-side constraints such as power, cooling, maintenance, or congestion often change faster than instance lif…

**Figure 14.** Excess volatility induces churn, while overly constrained prices approach FCFS-like inefficiency; a middle ground performs best.

Source excerpt: … this trade-off. LaissezCloud can regulate upward volatility by clipping incoming bids relative to the current price and regulate downward volatility by bounding how quickly the operator’s floor price falls. These controls also prevent a common failure mode around newly freed nod…
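The two volatility controls mentioned with Figure 14 (clipping incoming bids relative to the current price, and bounding how fast the operator's floor price falls) might look like the following. The function names, the fractional `maxUp` and `maxDrop` parameters, and the per-update granularity are assumptions made for illustration; the source does not give the actual formulas.

```go
package main

import "fmt"

// ClipBid caps an incoming bid at (1+maxUp) times the current price,
// limiting how far a single bid can push the price upward.
func ClipBid(bid, current, maxUp float64) float64 {
	if limit := current * (1 + maxUp); bid > limit {
		return limit
	}
	return bid
}

// NextFloor moves the operator's floor price to target, but lets it fall by
// at most the fraction maxDrop of the current floor in one update.
func NextFloor(floor, target, maxDrop float64) float64 {
	if lowest := floor * (1 - maxDrop); target < lowest {
		return lowest
	}
	return target
}

func main() {
	fmt.Println(ClipBid(5.0, 2.0, 0.5))    // 3
	fmt.Println(NextFloor(2.0, 0.5, 0.25)) // 1.5
}
```

Setting both bounds very tight freezes prices, which corresponds to the FCFS-like end of the trade-off in the caption; leaving them very loose permits the churn at the other end.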

**Figure 13.** Lower reconfiguration overhead enables more beneficial exchanges, while high overhead pushes LaissezCloud back toward FCFS-like behavior.

Source excerpt (Reconfiguration Overhead): Reconfiguration cost is the main counterforce to continuous renegotiation, so we vary it by applying a uniform multiplier to all tenant overheads. Our baseline overheads come from the representative systems in…
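The role of the uniform overhead multiplier can be seen in a one-line decision rule. This is a simplified sketch under the assumption that a tenant moves only when the time saved on the new allocation exceeds the scaled reconfiguration cost; `WorthMoving` and its parameters are hypothetical and not taken from the paper.

```go
package main

import "fmt"

// WorthMoving reports whether an exchange pays off for a tenant: the time
// saved by running on the new allocation must exceed the reconfiguration
// overhead after the uniform multiplier is applied.
func WorthMoving(savedSeconds, overheadSeconds, multiplier float64) bool {
	return savedSeconds > overheadSeconds*multiplier
}

func main() {
	fmt.Println(WorthMoving(120, 60, 1)) // true: 120s saved vs 60s overhead
	fmt.Println(WorthMoving(120, 60, 3)) // false: overhead scaled to 180s
}
```

As the multiplier grows, fewer exchanges clear this bar and allocations stay where they first landed, which matches the FCFS-like behavior the caption describes.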

**Figure 15.** Underestimating reconfiguration overhead hurts more than overestimating it, although LaissezCloud tolerates small errors of about ±5%.

Source excerpt (code listing, truncated): `type NodeSpec struct { NodeType string; Locality string; RelTo string; Meta map[string]any }`, commented as "Describes the desired node to add or remove", followed by `func Price (n EA.NodeSpec, b EA.OrderBo…`, commented as "Pricing logic called by EconAdapter on every add, remove and market-update".
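The listing fragment in the Figure 15 entry is cut off before the body of `Price`, so the following is a hypothetical completion rather than the authors' code. Only the `NodeSpec` fields come from the source; the local `OrderBook` type, the extra `floor` parameter, the return type, and the `surcharge` table (standing in for an operator policy such as a power-constrained row) are assumptions.

```go
package main

import "fmt"

// NodeSpec mirrors the fields visible in the source listing.
type NodeSpec struct {
	NodeType string
	Locality string
	RelTo    string
	Meta     map[string]any
}

// OrderBook is a hypothetical stand-in; the source is truncated before
// the real type is shown.
type OrderBook struct {
	BestBid float64
}

// surcharge is an assumed operator policy: localities under a power cap
// are priced higher so that demand moves elsewhere.
var surcharge = map[string]float64{"row-7": 1.5}

// Price returns an asking price for the requested node: the higher of the
// operator floor and the best standing bid, scaled by any locality surcharge.
func Price(n NodeSpec, b OrderBook, floor float64) float64 {
	p := floor
	if b.BestBid > p {
		p = b.BestBid
	}
	if m, ok := surcharge[n.Locality]; ok {
		p *= m
	}
	return p
}

func main() {
	book := OrderBook{BestBid: 2.0}
	fmt.Println(Price(NodeSpec{NodeType: "gpu", Locality: "row-1"}, book, 1.0)) // 2
	fmt.Println(Price(NodeSpec{NodeType: "gpu", Locality: "row-7"}, book, 1.0)) // 3
}
```

The tenant sees only the resulting price; the reason for the surcharge stays on the operator side, in line with the paper's stated goal of not exposing internal telemetry.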

Original abstract

Public clouds increasingly expose heterogeneous hardware, but their allocation interface remains built around rigid on-demand and spot service classes. This makes it hard to satisfy time-varying tenant objectives and operator constraints in oversubscribed, heterogeneous clusters without exposing internal application or infrastructure state. We present LaissezCloud, a cloud resource management platform for continuous re-negotiation of running allocations. Unlike spot instances, which use launch-time bids and unilateral preemption, LaissezCloud keeps allocations continuously contestable during execution: tenants and operators update bids online, and a running tenant keeps a resource only as long as its bid exceeds competing demand. Pricing serves both as a narrow waist and as an incentive-alignment mechanism between mutually untrusted participants: tenants express utility through bids, while operators price in power, cooling, or carbon constraints without exposing internal telemetry. Across a diverse set of accelerator workloads, LaissezCloud reduces performance degradation under contention by 8-23% versus on-demand and spot baselines, and scales to clusters of at least 10,000 nodes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

LaissezCloud adds continuous bidding to cloud allocation but the unmeasured update overhead is the key open question.


The paper's main point is a resource system that keeps allocations open to renegotiation while jobs run. Tenants and operators send updated bids online, and a tenant holds a resource only while its bid beats the competition. Pricing acts as the narrow interface that lets both sides express goals without sharing internal details like application state or power telemetry. This differs from spot instances, which fix the bid at launch and let the operator preempt unilaterally. The reported outcome is an 8-23% drop in performance loss under contention across accelerator workloads, plus a scaling claim to at least 10,000 nodes. Those numbers are the concrete evidence the authors put forward.

The framing of the problem is clear: current on-demand and spot classes are too rigid for time-varying objectives in heterogeneous, oversubscribed clusters. The incentive-alignment angle through prices is a reasonable way to avoid exposing internals. The evaluation covers a range of workloads, which helps show the idea is not tied to one narrow case.

The soft spot is the missing data on bid-update cost. The abstract and claims give no figures for update frequency, message size, auction latency, or per-node overhead in the large-cluster regime. If those costs are non-trivial, they could shrink or remove the 8-23% net gain once bidding traffic replaces simple contention. The scaling statement to 10k nodes rests on the same unquantified premise.

This paper is aimed at researchers who work on cloud schedulers and market mechanisms. A reader who already thinks about incentive-compatible allocation will see a new concrete mechanism and can judge whether the overhead gap matters for their setting. It is coherent on its own terms and the central idea is falsifiable, so it deserves a serious referee. I would send it to review and ask specifically for overhead measurements and a clearer breakdown of how the auction and state are kept lightweight at scale.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces LaissezCloud, a cloud resource management platform enabling continuous renegotiation of running allocations via online bid updates from tenants and operators. Pricing functions as a narrow waist for allocation decisions and incentive alignment between untrusted parties without exposing internal application or infrastructure state. The central empirical claims are an 8-23% reduction in performance degradation under contention versus on-demand and spot baselines across diverse accelerator workloads, together with demonstrated scaling to clusters of at least 10,000 nodes.

Significance. If the reported gains prove robust once bid-update overhead is quantified and the evaluation methodology is fully documented, the work would offer a substantive contribution to distributed systems and cloud computing by replacing rigid service classes with a continuously contestable, incentive-compatible allocation model. The empirical evaluation across workloads and the scaling result to 10k nodes are strengths that, if substantiated, would support broader adoption of pricing-mediated renegotiation.

major comments (2)

[Abstract] The performance claims (8-23% reduction in degradation and scaling to 10,000 nodes) are presented without any description of the evaluation methodology, workload characteristics, contention levels, measurement of bid-update frequency/latency, or per-node overhead. This absence directly undermines assessment of whether the data support the headline results, especially since continuous bidding traffic could offset the reported gains.

[Abstract] The central assumption that repeated online bid updates incur negligible cost relative to the workloads is load-bearing for both the performance improvement and the 10k-node scaling claim, yet the manuscript provides no measurements of message size, auction latency, consensus cost, or aggregate communication overhead in the contended regime.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that the abstract requires more context on methodology and overhead to substantiate the claims, and we have revised it accordingly while adding explicit overhead measurements to the evaluation section.


Referee: [Abstract] The performance claims (8-23% reduction in degradation and scaling to 10,000 nodes) are presented without any description of the evaluation methodology, workload characteristics, contention levels, measurement of bid-update frequency/latency, or per-node overhead. This absence directly undermines assessment of whether the data support the headline results, especially since continuous bidding traffic could offset the reported gains.

Authors: We agree that the abstract is too terse and should briefly outline the evaluation to allow assessment of the claims. In the revised manuscript we have expanded the abstract to note the diverse accelerator workloads, contention scenarios, and that bid-update overhead was measured and remains low relative to gains (with pointers to Sections 4-6 for full methodology). Revision: yes.

Referee: [Abstract] The central assumption that repeated online bid updates incur negligible cost relative to the workloads is load-bearing for both the performance improvement and the 10k-node scaling claim, yet the manuscript provides no measurements of message size, auction latency, consensus cost, or aggregate communication overhead in the contended regime.

Authors: We accept that dedicated measurements of bid-update overhead were insufficiently documented. While the scaling experiments implicitly incorporate communication costs, we have added an explicit subsection (5.3) with measurements of message sizes, auction latency, consensus costs, and aggregate overhead under contention. These confirm the overhead is small enough not to offset the reported gains or the 10k-node scaling. Revision: yes.

Circularity Check

0 steps flagged

No circularity: empirical system claims rest on evaluation, not self-referential derivations


The paper describes a systems platform for continuous bid-based renegotiation and reports measured improvements (8-23% lower degradation, scaling to 10k nodes) from workload experiments. No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. Performance numbers are presented as direct experimental outcomes rather than outputs derived from the inputs by construction. The design uses pricing as an incentive mechanism, but this is an architectural choice justified by stated goals, not a tautological reduction. The evaluation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The paper's claims rely on domain assumptions about the feasibility of continuous bidding and the effectiveness of pricing for coordination in untrusted settings, without providing independent evidence for these in the abstract.

axioms (2)

domain assumption: Participants are mutually untrusted and will not expose internal state.
Stated in abstract as key to the design.
domain assumption: Bids can be updated online without disrupting running allocations.
Core to continuous renegotiation.

invented entities (1)

LaissezCloud platform (no independent evidence)
purpose: To enable continuous resource renegotiation
The proposed system itself.

pith-pipeline@v0.9.0 · 5473 in / 1410 out tokens · 72253 ms · 2026-05-08T09:44:06.273365+00:00 · methodology


Reference graph

Works this paper leans on

59 extracted references · 59 canonical work pages

[1]

AIConfigurator: Offline optimization of your disaggregated dynamo graph, 2025

ai-dynamo contributors. AIConfigurator: Offline optimization of your disaggregated dynamo graph, 2025

work page 2025
[2]

Praneet Arshi and Joel Miller. Our approach to carbon- aware data centers: Central data center fleet manage- ment.https://cloud.google.com/blog/topics/sustainability/ googles-approach-to-carbon-aware-data-center, September

work page
[3]

Google Cloud Blog post

work page
[4]

scalable-hw-agnostic-inference.https://github.com/ aws-samples/scalable-hw-agnostic-inference

AWS Samples. scalable-hw-agnostic-inference.https://github.com/ aws-samples/scalable-hw-agnostic-inference. GitHub repository for hardware-agnostic inference on mixed accelerators

work page
[5]

Preventing network bottlenecks: Accelerat- ing datacenter services with Hotspot-Aware placement for compute and storage

Hamid Hajabdolali Bazzaz, Yingjie Bi, Weiwu Pang, Minlan Yu, Ramesh Govindan, Neal Cardwell, Nandita Dukkipati, Meng-Jung Tsai, Chris DeForeest, Yuxue Jin, Charles Carver, Jan Kopański, Liqun Cheng, and Amin Vahdat. Preventing network bottlenecks: Accelerat- ing datacenter services with Hotspot-Aware placement for compute and storage. In22nd USENIX Sympos...

work page
[6]

Cilantro: Performance-Aware resource allocation for general objectives via online feedback

Romil Bhardwaj, Kirthevasan Kandasamy, Asim Biswal, Wenshuo Guo, Benjamin Hindman, Joseph Gonzalez, Michael Jordan, and Ion Stoica. Cilantro: Performance-Aware resource allocation for general objectives via online feedback. In17th USENIX Symposium on Oper- ating Systems Design and Implementation (OSDI 23), pages 623–643, Boston, MA, July 2023. USENIX Association

work page 2023
[7]

Eva: Cost-efficient cloud- based cluster scheduling

Tzu-Tao Chang and Shivaram Venkataraman. Eva: Cost-efficient cloud- based cluster scheduling. InProceedings of the Twentieth European Conference on Computer Systems, EuroSys ’25, page 1399–1416, New York, NY, USA, 2025. Association for Computing Machinery

work page 2025
[8]

Balancing efficiency and fair- ness in heterogeneous gpu clusters for deep learning

Shubham Chaudhary, Ramachandran Ramjee, Muthian Sivathanu, Nipun Kwatra, and Srinidhi Viswanatha. Balancing efficiency and fair- ness in heterogeneous gpu clusters for deep learning. InProceedings of the Fifteenth European Conference on Computer Systems, EuroSys ’20, New York, NY, USA, 2020. Association for Computing Machinery

work page 2020
[9]

Parabricks Benchmarks: Benchmarking guide and scripts for NVIDIA Parabricks workflows,

clara-parabricks-workflows contributors. Parabricks Benchmarks: Benchmarking guide and scripts for NVIDIA Parabricks workflows,

work page
[10]

README notes cloud instance prices as of July 2024

work page 2024
[11]

Sf compute documentation.https://docs.sfcompute.com/ docs/on-demand-and-spot

Company. Sf compute documentation.https://docs.sfcompute.com/ docs/on-demand-and-spot. Accessed: 2025-02-14

work page 2025
[12]

Parcae: proactive, liveput-optimized dnn training on preemptible instances

Jiangfei Duan, Ziang Song, Xupeng Miao, Xiaoli Xi, Dahua Lin, Harry Xu, Minjia Zhang, and Zhihao Jia. Parcae: proactive, liveput-optimized dnn training on preemptible instances. InProceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, NSDI’24, USA, 2024. USENIX Association

work page 2024
[13]

Pursuant to Rule 608 of Regulation NMS under the Securities Exchange Act of 1934; effective April 21, 2016

Financial Industry Regulatory Authority (FINRA).Plan to Address Extraordinary Market Volatility, April 2016. Pursuant to Rule 608 of Regulation NMS under the Securities Exchange Act of 1934; effective April 21, 2016. PDF

work page 2016
[14]

ServerlessLLM: Low-Latency serverless inference for large language models

Yao Fu, Leyang Xue, Yeqi Huang, Andrei-Octavian Brabete, Dmitrii Ustiugov, Yuvraj Patel, and Luo Mai. ServerlessLLM: Low-Latency serverless inference for large language models. In18th USENIX Sym- posium on Operating Systems Design and Implementation (OSDI 24), pages 135–153, Santa Clara, CA, July 2024. USENIX Association

work page 2024
[15]

Dominant resource fairness: fair allocation of multiple resource types

Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker, and Ion Stoica. Dominant resource fairness: fair allocation of multiple resource types. InProceedings of the 8th USENIX Conference on Networked Systems Design and Implementation, NSDI’11, page 323–336, USA, 2011. USENIX Association

work page 2011
[16]

Altruistic scheduling in Multi-Resource clusters

Robert Grandl, Mosharaf Chowdhury, Aditya Akella, and Ganesh Ananthanarayanan. Altruistic scheduling in Multi-Resource clusters. In12th USENIX Symposium on Operating Systems Design and Imple- mentation (OSDI 16), pages 65–80, Savannah, GA, November 2016. USENIX Association

work page 2016
[17]

Hacc: extreme scaling and performance across diverse architectures

Salman Habib, Vitali Morozov, Nicholas Frontiere, Hal Finkel, Adrian Pope, and Katrin Heitmann. Hacc: extreme scaling and performance across diverse architectures. InProceedings of the International Confer- ence on High Performance Computing, Networking, Storage and Analysis, SC ’13, New York, NY, USA, 2013. Association for Computing Machin- ery

work page 2013
[18]

Ganger, and Phillip B

Aaron Harlap, Alexey Tumanov, Andrew Chung, Gregory R. Ganger, and Phillip B. Gibbons. Proteus: agile ml elasticity through tiered reliability in dynamic resource markets. InProceedings of the Twelfth European Conference on Computer Systems, EuroSys ’17, page 589–604. ACM, April 2017

work page 2017
[19]

Joseph, Randy Katz, Scott Shenker, and Ion Stoica

Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, An- thony D. Joseph, Randy Katz, Scott Shenker, and Ion Stoica. Mesos: a platform for fine-grained resource sharing in the data center. In Proceedings of the 8th USENIX Conference on Networked Systems De- sign and Implementation, NSDI’11, page 295–308, USA, 2011. USENIX Association

work page 2011
[20]

Iqbal, Haley Li, Shane Bergsma, Ivan Beschastnikh, and Alan J

Syed M. Iqbal, Haley Li, Shane Bergsma, Ivan Beschastnikh, and Alan J. Hu. Cospot: a cooperative vm allocation framework for increased revenue from spot instances. InProceedings of the 13th Symposium on Cloud Computing, SoCC ’22, page 540–556, New York, NY, USA, 2022. Association for Computing Machinery

work page 2022
[21]

The price is (not) right: Reflec- tions on pricing for transient cloud servers

David Irwin, Prashant Shenoy, Pradeep Ambati, Prateek Sharma, Supreeth Shastri, and Ahmed Ali-Eldin. The price is (not) right: Reflec- tions on pricing for transient cloud servers. In2019 28th International Conference on Computer Communication and Networks (ICCCN), pages 1–9, 2019

work page 2019
[22]

Suhas Jayaram Subramanya, Daiyaan Arfeen, Shouxu Lin, Aurick Qiao, Zhihao Jia, and Gregory R. Ganger. Sia: Heterogeneity-aware, goodput-optimized ml-cluster scheduling. InProceedings of the 29th Symposium on Operating Systems Principles, SOSP ’23, page 642–657, New York, NY, USA, 2023. Association for Computing Machinery

work page 2023
[23]

A house united within itself: Slo-awareness for on-premises containerized ml inference clusters via faro

Beomyeol Jeon, Chen Wang, Diana Arroyo, Alaa Youssef, and Indranil Gupta. A house united within itself: Slo-awareness for on-premises containerized ml inference clusters via faro. InProceedings of the Twentieth European Conference on Computer Systems, EuroSys ’25, page 524–540. ACM, March 2025

work page 2025
[24]

Lambda: The deep learning company.https://www

Lambda Labs. Lambda: The deep learning company.https://www. lambdalabs.com

work page
[25]

Flux: Unifying heterogeneous infrastructure for alibaba analyticdb

Wei Li, Jiachi Zhang, Ye Yin, Yan Li, Zhanyang Zhu, Yuhao Li, Zhen- can Peng, Lan Lu, Wenchao Zhou, Liang Lin, and Feifei Li. Flux: Unifying heterogeneous infrastructure for alibaba analyticdb. InCom- panion of the 2025 International Conference on Management of Data, SIGMOD/PODS ’25, page 539–552. ACM, June 2025

work page 2025
[26]

Universal checkpointing: a flexible and efficient distributed checkpointing system for large-scale dnn training with reconfigurable parallelism

Xinyu Lian, Sam Ade Jacobs, Lev Kurilenko, Masahiro Tanaka, Stas Bekman, Olatunji Ruwase, and Minjia Zhang. Universal checkpointing: a flexible and efficient distributed checkpointing system for large-scale dnn training with reconfigurable parallelism. InProceedings of the 2025 USENIX Conference on Usenix Annual Technical Conference, USENIX ATC ’25, USA, ...

work page 2025
[27]

Themis: Fair and efficient GPU cluster scheduling

Kshiteej Mahajan, Arjun Balasubramanian, Arjun Singhvi, Shivaram Venkataraman, Aditya Akella, Amar Phanishayee, and Shuchi Chawla. Themis: Fair and efficient GPU cluster scheduling. In17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20), pages 289–304, Santa Clara, CA, February 2020. USENIX Associa- tion

work page 2020
[28]

Microsoft, January 2026

Microsoft.Microsoft Volume Licensing Service Level Agreement for Mi- crosoft Online Services (Worldwide English, January 1, 2026). Microsoft, January 2026. PDF; filename indicates document ID SLA5280

work page 2026
[29]

Heet: Accelerating elastic training in heterogeneous deep learning clusters

Zizhao Mo, Huanle Xu, and Chengzhong Xu. Heet: Accelerating elastic training in heterogeneous deep learning clusters. InProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, ASPLOS ’24, page 499–513, New York, NY, USA, 2024. Association for Computing Machinery. 13

work page 2024
[30]

Ras: Continuously optimized region-wide dat- acenter resource allocation

Andrew Newell, Dimitrios Skarlatos, Jingyuan Fan, Pavan Kumar, Maxim Khutornenko, Mayank Pundir, Yirui Zhang, Mingjun Zhang, Yuanlai Liu, Linh Le, Brendon Daugherty, Apurva Samudra, Prashasti Baid, James Kneeland, Igor Kabiljo, Dmitry Shchukin, Andre Ro- drigues, Scott Michelson, Ben Christensen, Kaushik Veeraraghavan, and Chunqiang Tang. Ras: Continuousl...

work page 2021
[31]

Nvidia dynamo documentation.https://docs

NVIDIA Corporation. Nvidia dynamo documentation.https://docs. nvidia.com/dynamo/index.html, 2025. Accessed: 2025-12-11

work page 2025
[32]

NVIDIA Parabricks: GPU-accelerated genomics pipelines, 2025

NVIDIA Corporation. NVIDIA Parabricks: GPU-accelerated genomics pipelines, 2025

work page 2025
[33]

Spar- row: distributed, low latency scheduling

Kay Ousterhout, Patrick Wendell, Matei Zaharia, and Ion Stoica. Spar- row: distributed, low latency scheduling. InProceedings of the Twenty- Fourth ACM Symposium on Operating Systems Principles, SOSP ’13, page 69–84. ACM, November 2013

work page 2013
[34]

Modserve: Modality- and stage-aware resource disaggregation for scalable multimodal model serving

Haoran Qiu, Anish Biswas, Zihan Zhao, Jayashree Mohan, Alind Khare, Esha Choukse, Íñigo Goiri, Zeyu Zhang, Haiying Shen, Chetan Bansal, Ramachandran Ramjee, and Rodrigo Fonseca. Modserve: Modality- and stage-aware resource disaggregation for scalable multimodal model serving. InProceedings of the 2025 ACM Symposium on Cloud Computing (SoCC 2025), New York...

work page 2025
[35]

Stratus: Clouds with microar- chitectural resource management

Kaveh Razavi and Animesh Trivedi. Stratus: Clouds with microar- chitectural resource management. In12th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 20). USENIX Association, July 2020

work page 2020
[36]

Aws pricing calculator.https://calculator.aws/ #/, 2025

Amazon Web Services. Aws pricing calculator.https://calculator.aws/ #/, 2025. Web-based cost estimation tool for Amazon Web Services

work page 2025
[37]

Parkes, Alvin AuYoung, Alex C

Jeffrey Shneidman, Chaki Ng, David C. Parkes, Alvin AuYoung, Alex C. Snoeren, Amin Vahdat, and Brent Chun. Why markets could (but don’t currently) solve resource allocation problems in systems. In Proceedings of the 10th Conference on Hot Topics in Operating Systems - Volume 10, HOTOS’05, page 7, USA, 2005. USENIX Association

work page 2005
[38]

Ecovisor: A virtual energy sys- tem for carbon-efficient applications

Abel Souza, Noman Bashir, Jorge Murillo, Walid Hanafy, Qianlin Liang, David Irwin, and Prashant Shenoy. Ecovisor: A virtual energy sys- tem for carbon-efficient applications. InProceedings of the 28th ACM International Conference on Architectural Support for Programming Lan- guages and Operating Systems, Volume 2, ASPLOS 2023, page 252–265, New York, NY, ...

work page 2023
[39]

Tapas: Thermal- and power-aware scheduling for llm inference in cloud platforms

Jovan Stojkovic, Chaojie Zhang, Íñigo Goiri, Esha Choukse, Haoran Qiu, Rodrigo Fonseca, Josep Torrellas, and Ricardo Bianchini. Tapas: Thermal- and power-aware scheduling for llm inference in cloud platforms. InProceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, ASPLOS ’2...

work page
[40]

Association for Computing Machinery

work page
[41]

Dynamollm: Designing llm inference clusters for perfor- mance and energy efficiency, 2024

Jovan Stojkovic, Chaojie Zhang, Íñigo Goiri, Josep Torrellas, and Esha Choukse. Dynamollm: Designing llm inference clusters for perfor- mance and energy efficiency, 2024

work page 2024
[42]

Orion: Interference- aware, fine-grained gpu sharing for ml applications

Foteini Strati, Xianzhe Ma, and Ana Klimovic. Orion: Interference- aware, fine-grained gpu sharing for ml applications. InProceedings of the Nineteenth European Conference on Computer Systems, EuroSys ’24, page 1075–1092, New York, NY, USA, 2024. Association for Computing Machinery

work page 2024
[43]

Sailor: Automating distributed training over dynamic, heterogeneous, and geo-distributed clusters

Foteini Strati, Zhendong Zhang, George Manos, Ixeia Sánchez Périz, Qinghao Hu, Tiancheng Chen, Berk Buzcu, Song Han, Pamela Del- gado, and Ana Klimovic. Sailor: Automating distributed training over dynamic, heterogeneous, and geo-distributed clusters. InProceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles, SOSP ’25, page 204–220. A...

work page 2025
[44]

Crusoe cloud.https://crusoe.ai

Crusoe Energy Systems. Crusoe cloud.https://crusoe.ai

work page
[45]

Twine: A unified cluster management system for shared infrastructure

Chunqiang Tang, Kenny Yu, Kaushik Veeraraghavan, Jonathan Kaldor, Scott Michelson, Thawan Kooburat, Aravind Anbudurai, Matthew Clark, Kabir Gogia, Long Cheng, Ben Christensen, Alex Gartrell, Maxim Khutornenko, Sachin Kulkarni, Marcin Pawlowski, Tuomas Pelkonen, Andre Rodrigues, Rounak Tibrewal, Vaishnavi Venkatesan, and Peter Zhang. Twine: A unified clust...

work page 2020
[46]

Korupolu, David Oppen- heimer, Eric Tune, and John Wilkes

Abhishek Verma, Luis Pedrosa, Madhukar R. Korupolu, David Oppen- heimer, Eric Tune, and John Wilkes. Large-scale cluster management at google with borg. InProceedings of the European Conference on Computer Systems (EuroSys), Bordeaux, France, 2015

work page 2015
[47]

Midhul Vuppalapati, Giannis Fikioris, Rachit Agarwal, Asaf Cidon, Anurag Khandelwal, and Éva Tardos. Karma: Resource allocation for dynamic demands. In 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23), pages 645–662, Boston, MA, July

[48]

Marcel Wagenländer, Guo Li, Bo Zhao, Luo Mai, and Peter Pietzuch. Tenplex: Dynamic parallelism for deep learning using parallelizable tensor collections. In Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, SOSP ’24, page 195–210. ACM, November 2024

[49]

Xiaodong Wang and José F. Martínez. Xchange: A market-based approach to scalable dynamic multi-resource allocation in multicore architectures. In 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), pages 113–125, 2015

[50]

Zhanghao Wu, Wei-Lin Chiang, Ziming Mao, Zongheng Yang, Eric Friedman, Scott Shenker, and Ion Stoica. Can’t be late: Optimizing spot instance savings under deadlines. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24), pages 185–203, Santa Clara, CA, April 2024. USENIX Association

[51]

Wu Xiang, Yakun Li, Yuquan Ren, Fan Jiang, Chaohui Xin, Varun Gupta, Chao Xiang, Xinyi Song, Meng Liu, Bing Li, Kaiyang Shao, Chen Xu, Wei Shao, Yuqi Fu, Wilson Wang, Cong Xu, Wei Xu, Caixue Lin, Rui Shi, and Yuming Liang. Gödel: Unified large-scale resource management and scheduling at bytedance. In Proceedings of the 2023 ACM Symposium on Cloud Computing...

[52]

Wencong Xiao, Romil Bhardwaj, Ramachandran Ramjee, Muthian Sivathanu, Nipun Kwatra, Zhenhua Han, Pratyush Patel, Xuan Peng, Hanyu Zhao, Quanlu Zhang, Fan Yang, and Lidong Zhou. Gandiva: Introspective cluster scheduling for deep learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pages 595–610, Carlsbad, CA, October ...

[53]

Jiali Xing, Bilge Acun, Aditya Sundarrajan, David Brooks, Manoj Chakkaravarthy, Nikky Avila, Carole-Jean Wu, and Benjamin C. Lee. Carbon responder: Coordinating demand response for the datacenter fleet, 2023

[54]

Zongheng Yang, Zhanghao Wu, Michael Luo, Wei-Lin Chiang, Romil Bhardwaj, Woosuk Kwon, Siyuan Zhuang, Frank Sifei Luan, Gautam Mittal, Scott Shenker, and Ion Stoica. SkyPilot: An intercloud broker for sky computing. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), pages 437–455, Boston, MA, April 2023. USENIX Association

[55]

Jie You, Jae-Won Chung, and Mosharaf Chowdhury. Zeus: Understanding and optimizing GPU energy consumption of DNN training. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), pages 119–139, Boston, MA, April 2023. USENIX Association

[56]

Hong Zhang, Yupeng Tang, Anurag Khandelwal, and Ion Stoica. SHEPHERD: Serving DNNs in the wild. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), pages 787–808, Boston, MA, April 2023. USENIX Association

[57]

Pengfei Zheng, Rui Pan, Tarannum Khan, Shivaram Venkataraman, and Aditya Akella. Shockwave: Fair and efficient cluster scheduling for dynamic adaptation in machine learning. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), pages 703–723, Boston, MA, April 2023. USENIX Association

[58]

Zhiheng Zhong and Rajkumar Buyya. A cost-efficient container orchestration strategy in kubernetes-based cloud computing infrastructures with heterogeneous resources. ACM Transactions on Internet Technology, 20(2):1–24, April 2020

[59]

Çınar Kilcioglu, Justin M. Rao, Aadharsh Kannan, and R. Preston McAfee. Usage patterns and the economics of the public cloud. In Proceedings of the 26th International Conference on World Wide Web (WWW ’17). International World Wide Web Conferences Steering Committee, 2017. Includes empirical analysis of utilization in public cloud systems.
