
arxiv: 2604.22509 · v1 · submitted 2026-04-24 · 💻 cs.DC

LaissezCloud: Continuous Resource Renegotiation for the Public Cloud

Pith reviewed 2026-05-08 09:44 UTC · model grok-4.3

classification 💻 cs.DC
keywords cloud resource management · continuous bidding · dynamic allocation · spot instances · incentive alignment · heterogeneous hardware · oversubscribed clusters

The pith

LaissezCloud keeps cloud resource allocations continuously contestable through online bids so tenants retain them only while outbidding others.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces LaissezCloud as a management platform that replaces rigid on-demand and spot categories with ongoing renegotiation of running allocations. Tenants and operators update bids in real time, and a tenant holds a resource only as long as its bid exceeds competing demand. Pricing functions as the single interface that lets tenants signal workload value while operators encode constraints such as power or carbon limits. This setup matters for oversubscribed clusters with heterogeneous accelerators because current models force either fixed commitments or sudden preemption, leading to unnecessary performance loss when demands shift. If the approach succeeds, clouds could handle time-varying objectives and operator goals without any party revealing internal application or infrastructure details.

Core claim

LaissezCloud enables continuous re-negotiation of running allocations by having tenants and operators update bids online during execution. A tenant retains a resource only as long as its bid exceeds competing demand. The pricing mechanism serves as a narrow waist that aligns incentives between untrusted parties: tenants signal utility via bids, operators encode constraints like power or carbon without exposing telemetry. Across accelerator workloads the approach reduces performance degradation under contention by 8-23 percent versus on-demand and spot baselines and scales to clusters of at least 10,000 nodes.
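The retention rule in the core claim can be sketched in a few lines of Go. This is a minimal illustration for a single resource, not the paper's order-book implementation, which this summary does not describe. The names `Resource`, `NewResource`, and `UpdateBid` are hypothetical, and the tie-breaking choices (ties favor the incumbent; ties among challengers go to the lexicographically smaller tenant name) are assumptions made only to keep the example deterministic.

```go
package main

import "fmt"

// Resource models one contestable resource instance.
// All names here are illustrative, not LaissezCloud's API.
type Resource struct {
	bids   map[string]float64 // latest bid per tenant
	holder string             // tenant currently holding the resource ("" if free)
}

func NewResource() *Resource {
	return &Resource{bids: make(map[string]float64)}
}

// UpdateBid records a tenant's new bid, re-evaluates the allocation, and
// returns the tenant that holds the resource afterwards.
// Assumption: a challenger takes over only with a strictly higher bid,
// so a tie leaves the incumbent in place.
func (r *Resource) UpdateBid(tenant string, price float64) string {
	r.bids[tenant] = price

	best := r.holder
	bestBid := 0.0 // a free resource is treated as having a zero floor
	if best != "" {
		bestBid = r.bids[best]
	}
	for t, b := range r.bids {
		// Strictly higher bids win; equal challengers are ordered by name
		// so the result does not depend on map iteration order.
		if b > bestBid || (b == bestBid && best != r.holder && t < best) {
			best, bestBid = t, b
		}
	}
	r.holder = best
	return r.holder
}

func main() {
	r := NewResource()
	fmt.Println(r.UpdateBid("A", 1.0)) // A acquires the free resource
	fmt.Println(r.UpdateBid("B", 0.8)) // B bids below A, so A retains it
	fmt.Println(r.UpdateBid("B", 1.5)) // B now outbids A and takes over
	fmt.Println(r.UpdateBid("A", 1.5)) // tie: the incumbent B retains it
}
```

The sketch moves the resource immediately once the holder is outbid. According to the Figure 2 caption, the paper instead lets the outbid tenant migrate after reaching a checkpoint, which this example does not model.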

What carries the argument

Continuous online bidding with pricing as the narrow waist for incentive alignment between tenants and operators.

Load-bearing premise

Continuous online bid updates can be performed efficiently and the pricing mechanism aligns incentives between untrusted tenants and operators without exposing internal states.

What would settle it

An experiment on a contended multi-tenant cluster showing either that bid-update overhead exceeds 5 percent of runtime or that performance degradation does not drop below the on-demand and spot baselines.
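The two refutation conditions above reduce to a small check. The Go sketch below encodes them under stated assumptions: the `Trial` type and its field names are hypothetical, degradations are expressed as fractions, and "does not drop below the baselines" is read as "is not lower than both baselines".

```go
package main

import "fmt"

// Trial holds hypothetical measurements from one contended multi-tenant run.
// Field names are illustrative; degradations are fractions (0.10 = 10%).
type Trial struct {
	RuntimeSec     float64 // total workload runtime
	BidOverheadSec float64 // time spent on bid updates
	DegOurs        float64 // degradation with continuous bidding
	DegOnDemand    float64 // degradation with the on-demand baseline
	DegSpot        float64 // degradation with the spot baseline
}

// Refutes reports whether the trial meets either refutation condition:
// bid-update overhead above 5% of runtime, or degradation that is not
// below both baselines.
func Refutes(t Trial) bool {
	overheadFrac := t.BidOverheadSec / t.RuntimeSec
	bestBaseline := t.DegOnDemand
	if t.DegSpot < bestBaseline {
		bestBaseline = t.DegSpot
	}
	return overheadFrac > 0.05 || t.DegOurs >= bestBaseline
}

func main() {
	ok := Trial{RuntimeSec: 1000, BidOverheadSec: 20, DegOurs: 0.10, DegOnDemand: 0.25, DegSpot: 0.20}
	slow := Trial{RuntimeSec: 1000, BidOverheadSec: 80, DegOurs: 0.10, DegOnDemand: 0.25, DegSpot: 0.20}
	fmt.Println(Refutes(ok), Refutes(slow))
}
```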

Figures

Figures reproduced from arXiv: 2604.22509 by Antoine Kaufmann, Tejas Harith.

Figure 1: FCFS: App A acquires HW1 first, forcing App B, dispatched at 0.1, to take its second-ranked hardware. FCFS-P: App A acquires HW1 first, but App B, dispatched at 0.1, preempts App A, interrupting an epoch between checkpoints.
Surrounding text: 2.2 An Illustrative Cloud Example. To make this concrete, consider two applications running on a small cloud with three machines of different hardware types. App A is a checkpointed t…
Figure 2: App A acquires HW1 first and App B acquires HW2 on dispatch, but B raises its willingness to pay for HW1 over time. The higher price eventually induces A to migrate after reaching a checkpoint, avoiding wasted work before moving the resource to the tenant that values it more.
Surrounding text: … during execution. On-demand instances freeze that decision too early, while spot-style preemption makes allocations reclaimable but…
Figure 3: LaissezCloud overview (components highlighted).
Surrounding text: 4.1 Overview. LaissezCloud (…
Figure 5: Each resource instance’s order book is a leaf in one of the type-specific trees for the hierarchical topology. Order books on inner nodes aggregate orders in the books below.
Surrounding text: LaissezCloud exposes prices through the hierarchy at the level relevant to a tenant’s current decisions. The instance API in…
Figure 6: Performance retained for clusters of various composition.
Surrounding text: We use these normalized metrics only to compare cloud interfaces across heterogeneous workloads, not to claim that the underlying application objectives are identical. 5.2 Allocation Efficiency Under Contention. LaissezCloud improves allocation quality across all contention regimes in…
Figure 7: LaissezCloud lets tenants react to live prices and shift to cheaper allocations when urgency is low.
Plot labels: Norm Perf Retention; Tenant Cost ($); Spot; On Demand; ours.
Figure 8: LaissezCloud lets tenants navigate a broad cost-performance frontier between spot-like and on-demand-like behavior. Budgets provided to the app are provided next to LaissezCloud points.
Surrounding text: 5.3 Tenant Outcomes. Beyond improving average allocation quality, LaissezCloud improves tenant outcomes in two ways. It lets tenants adjust cost-performance trade-offs online as prices and urgency change, and it lets topolog…
Figure 11: LaissezCloud lets an InfraMaps policy steer load away from a power-constrained row using prices alone.
Surrounding text: 5.4 Operator-Side Control. InfraMaps give operators a soft control lever for steering demand without exposing raw infrastructure telemetry or selecting victims directly. This matters because operator-side constraints such as power, cooling, maintenance, or congestion often change faster than instance lif…
Figure 14: Excess volatility induces churn, while overly constrained prices approach FCFS-like inefficiency; a middle ground performs best.
Surrounding text: … this trade-off. LaissezCloud can regulate upward volatility by clipping incoming bids relative to the current price and regulate downward volatility by bounding how quickly the operator’s floor price falls. These controls also prevent a common failure mode around newly freed nod…
Figure 13: Lower reconfiguration overhead enables more beneficial exchanges, while high overhead pushes LaissezCloud back toward FCFS-like behavior.
Surrounding text: Reconfiguration Overhead. Reconfiguration cost is the main counterforce to continuous renegotiation, so we vary it by applying a uniform multiplier to all tenant overheads. Our baseline overheads come from the representative systems in…
Figure 15: Underestimating reconfiguration overhead hurts more than overestimating it, although LaissezCloud tolerates small errors of about ±5%.
Surrounding text (code listing, truncated):
    type NodeSpec struct { // Describes the desired node to add or remove.
        NodeType string
        Locality string
        RelTo    string
        Meta     map[string]any
    }

    // Pricing logic called by EconAdapter on every add, remove and market-update
    func Price (n EA.NodeSpec, b EA.OrderBo…
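The text accompanying Figure 14 names two volatility controls: clipping incoming bids relative to the current price, and bounding how quickly the operator's floor price falls. A rough Go sketch of both follows. The function names `ClipBid` and `NextFloor`, the fractional parameters `maxUpFrac` and `maxDownFrac`, and the per-step update model are all assumptions for illustration; the excerpt does not give the paper's actual parameterization.

```go
package main

import "fmt"

// ClipBid limits how far an incoming bid may exceed the current price.
// maxUpFrac is a hypothetical knob: 0.5 allows bids up to 50% above
// the current price. Bids at or below the ceiling pass through unchanged.
func ClipBid(bid, current, maxUpFrac float64) float64 {
	ceiling := current * (1 + maxUpFrac)
	if bid > ceiling {
		return ceiling
	}
	return bid
}

// NextFloor moves the operator's floor price toward target, but lets it
// fall by at most maxDownFrac of its current value per update step.
// Increases are not bounded in this sketch.
func NextFloor(floor, target, maxDownFrac float64) float64 {
	lowest := floor * (1 - maxDownFrac)
	if target < lowest {
		return lowest
	}
	return target
}

func main() {
	// An aggressive bid of 20 against a current price of 10 is clipped.
	fmt.Println(ClipBid(20, 10, 0.5))

	// The operator wants the floor to drop from 8 to 2; it decays gradually.
	floor := 8.0
	for step := 0; step < 4; step++ {
		floor = NextFloor(floor, 2, 0.25)
		fmt.Println(floor)
	}
}
```

Bounding only the downward movement of the floor reflects the excerpt's wording. A slowly decaying floor presumably keeps a newly freed node from being grabbed at a near-zero price, though the truncated excerpt does not spell out that failure mode.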
Original abstract

Public clouds increasingly expose heterogeneous hardware, but their allocation interface remains built around rigid on-demand and spot service classes. This makes it hard to satisfy time-varying tenant objectives and operator constraints in oversubscribed, heterogeneous clusters without exposing internal application or infrastructure state. We present LaissezCloud, a cloud resource management platform for continuous re-negotiation of running allocations. Unlike spot instances, which use launch-time bids and unilateral preemption, LaissezCloud keeps allocations continuously contestable during execution: tenants and operators update bids online, and a running tenant keeps a resource only as long as its bid exceeds competing demand. Pricing serves both as a narrow waist and as an incentive-alignment mechanism between mutually untrusted participants: tenants express utility through bids, while operators price in power, cooling, or carbon constraints without exposing internal telemetry. Across a diverse set of accelerator workloads, LaissezCloud reduces performance degradation under contention by 8-23% versus on-demand and spot baselines, and scales to clusters of at least 10,000 nodes.
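The abstract's claim that operators can price in power constraints without exposing telemetry can be illustrated with a small Go sketch. It is not the paper's InfraMaps interface: the `rowState` type, the `FloorPrice` function, and the linear markup between a soft limit and the power cap are hypothetical choices. The point is only that tenants see a price per row and never see the power readings behind it.

```go
package main

import "fmt"

// rowState is operator-internal telemetry; tenants never see it.
type rowState struct {
	powerDrawKW float64
	powerCapKW  float64
}

// FloorPrice maps internal state to a public floor price.
// Below softLimit utilization the price is base; above it the price rises
// linearly, reaching base*(1+maxMarkup) at the power cap.
// Assumes powerCapKW > 0 and 0 <= softLimit < 1.
func FloorPrice(s rowState, base, softLimit, maxMarkup float64) float64 {
	util := s.powerDrawKW / s.powerCapKW
	if util <= softLimit {
		return base
	}
	if util > 1 {
		util = 1
	}
	frac := (util - softLimit) / (1 - softLimit)
	return base * (1 + maxMarkup*frac)
}

// CheapestRow is the tenant-side view: it picks a row using prices alone.
func CheapestRow(prices map[string]float64) string {
	best := ""
	for row, p := range prices {
		if best == "" || p < prices[best] || (p == prices[best] && row < best) {
			best = row
		}
	}
	return best
}

func main() {
	prices := map[string]float64{
		"row-1": FloorPrice(rowState{powerDrawKW: 75, powerCapKW: 100}, 1, 0.5, 2),
		"row-2": FloorPrice(rowState{powerDrawKW: 25, powerCapKW: 100}, 1, 0.5, 2),
	}
	fmt.Println(prices["row-1"], prices["row-2"], CheapestRow(prices))
}
```

In this toy setup the power-constrained row becomes more expensive, so a price-sensitive tenant moves to the other row without the operator selecting a victim or publishing its telemetry.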

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces LaissezCloud, a cloud resource management platform enabling continuous renegotiation of running allocations via online bid updates from tenants and operators. Pricing functions as a narrow waist for allocation decisions and incentive alignment between untrusted parties without exposing internal application or infrastructure state. The central empirical claims are an 8-23% reduction in performance degradation under contention versus on-demand and spot baselines across diverse accelerator workloads, together with demonstrated scaling to clusters of at least 10,000 nodes.

Significance. If the reported gains prove robust once bid-update overhead is quantified and the evaluation methodology is fully documented, the work would offer a substantive contribution to distributed systems and cloud computing by replacing rigid service classes with a continuously contestable, incentive-compatible allocation model. The empirical evaluation across workloads and the scaling result to 10k nodes are strengths that, if substantiated, would support broader adoption of pricing-mediated renegotiation.

major comments (2)
  1. [Abstract] The performance claims (8-23% reduction in degradation and scaling to 10,000 nodes) are presented without any description of the evaluation methodology, workload characteristics, contention levels, measurement of bid-update frequency/latency, or per-node overhead. This absence directly undermines assessment of whether the data support the headline results, especially since continuous bidding traffic could offset the reported gains.
  2. [Abstract] The central assumption that repeated online bid updates incur negligible cost relative to the workloads is load-bearing for both the performance improvement and the 10k-node scaling claim, yet the manuscript provides no measurements of message size, auction latency, consensus cost, or aggregate communication overhead in the contended regime.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that the abstract requires more context on methodology and overhead to substantiate the claims, and we have revised it accordingly while adding explicit overhead measurements to the evaluation section.

Point-by-point responses
  1. Referee: [Abstract] The performance claims (8-23% reduction in degradation and scaling to 10,000 nodes) are presented without any description of the evaluation methodology, workload characteristics, contention levels, measurement of bid-update frequency/latency, or per-node overhead. This absence directly undermines assessment of whether the data support the headline results, especially since continuous bidding traffic could offset the reported gains.

    Authors: We agree that the abstract is too terse and should briefly outline the evaluation to allow assessment of the claims. In the revised manuscript we have expanded the abstract to note the diverse accelerator workloads, contention scenarios, and that bid-update overhead was measured and remains low relative to gains (with pointers to Sections 4-6 for full methodology). revision: yes

  2. Referee: [Abstract] The central assumption that repeated online bid updates incur negligible cost relative to the workloads is load-bearing for both the performance improvement and the 10k-node scaling claim, yet the manuscript provides no measurements of message size, auction latency, consensus cost, or aggregate communication overhead in the contended regime.

    Authors: We accept that dedicated measurements of bid-update overhead were insufficiently documented. While the scaling experiments implicitly incorporate communication costs, we have added an explicit subsection (5.3) with measurements of message sizes, auction latency, consensus costs, and aggregate overhead under contention. These confirm the overhead is small enough not to offset the reported gains or the 10k-node scaling. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical system claims rest on evaluation, not self-referential derivations

full rationale

The paper describes a systems platform for continuous bid-based renegotiation and reports measured improvements (8-23% lower degradation, scaling to 10k nodes) from workload experiments. No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. Performance numbers are presented as direct experimental outcomes rather than outputs derived from the inputs by construction. The design uses pricing as an incentive mechanism, but this is an architectural choice justified by stated goals, not a tautological reduction. The evaluation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The paper's claims rely on domain assumptions about the feasibility of continuous bidding and the effectiveness of pricing for coordination in untrusted settings, without providing independent evidence for these in the abstract.

axioms (2)
  • domain assumption: Participants are mutually untrusted and will not expose internal state.
    Stated in abstract as key to the design.
  • domain assumption: Bids can be updated online without disrupting running allocations.
    Core to continuous renegotiation.
invented entities (1)
  • LaissezCloud platform (no independent evidence)
    purpose: To enable continuous resource renegotiation
    The proposed system itself.

pith-pipeline@v0.9.0 · 5473 in / 1410 out tokens · 72253 ms · 2026-05-08T09:44:06.273365+00:00 · methodology

