Throughput-Optimal Multiresource-Job Scheduling with Continuous Requirement Distribution
Pith reviewed 2026-05-22 08:30 UTC · model grok-4.3
The pith
Throughput-optimal scheduling policies exist for multiresource jobs with continuously distributed resource requirements.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce the first throughput-optimal family of scheduling policies for the continuous MRJ model, with both preemptive and nonpreemptive variants. We use a discretization approach, where we choose the discretization granularity based on the system load and the distribution of resource requirements. We further introduce several efficient policy families, which remain throughput-optimal while considerably improving computational efficiency, under some distributional assumptions.
What carries the argument
Discretization of the continuous resource-requirement distribution into a finite number of classes, with granularity chosen from system load and the requirement distribution to preserve throughput optimality.
If this is right
- Throughput optimality holds for both preemptive and non-preemptive policies in the continuous MRJ model.
- More computationally efficient policy families remain throughput-optimal when the requirement distribution satisfies additional assumptions.
- The policies achieve state-of-the-art performance on parametrized continuous distributions and on Google Borg datacenter traces.
- The discretization technique directly bridges class-based theoretical models to the continuous distributions observed in production systems.
Where Pith is reading between the lines
- The same discretization idea could be tested on other resource-allocation problems that currently assume only a few discrete classes.
- Adaptive versions that adjust bin width on the fly from observed load might further reduce overhead while keeping optimality.
- The approach suggests that many existing index policies could be made optimal for continuous settings by applying a similar load-dependent discretization step.
Load-bearing premise
The chosen discretization granularity is fine enough to preserve throughput optimality when the underlying model is truly continuous rather than already discrete.
What would settle it
A workload whose resource requirements follow a continuous distribution in which the proposed policy's long-run throughput falls measurably below the maximum feasible rate that would be achieved by an ideal fluid scheduler.
read the original abstract
Modern computing systems process jobs with resource requirements such as CPU and memory, which are described by multiresource jobs (MRJ) queueing models. In practice, job resource requirements are spread out over so many values, that it is rare to see the same value twice. This pattern is best modeled by a continuous distribution of requirement values. However, the existing theoretical work on stability or throughput-optimality focuses on queueing models with class-based resource requirements. In class-based models, the number of distinct resource requirements must be small to demonstrate strong empirical performance, making them a poor match for these practical systems. We introduce the first throughput-optimal family of scheduling policies for the continuous MRJ model, with both preemptive and nonpreemptive variants. We further introduce several efficient policy families, which remain throughput-optimal while considerably improving computational efficiency, under some distributional assumptions. We use a discretization approach, where we choose the discretization granularity based on the system load and the distribution of resource requirements. We validate the real-world applicability of our policies by comparing them against existing index-based policies on parametrized distributions and on datacenter trace data from the Google Borg scheduler, demonstrating state-of-the-art performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the first throughput-optimal family of scheduling policies for the continuous multiresource-job (MRJ) model, using a discretization approach whose granularity is selected from the system load and the resource-requirement distribution. Both preemptive and nonpreemptive variants are presented, together with computationally efficient sub-families that remain optimal under additional distributional assumptions. The policies are validated against index-based baselines on parametrized distributions and on Google Borg trace data.
Significance. If the stability argument extends rigorously to arbitrary continuous distributions, the result would close an important gap between existing class-based throughput-optimality theory and the continuous requirement distributions observed in production systems. The load- and distribution-dependent discretization rule, the provision of both preemptive and nonpreemptive policies, and the empirical evaluation on real datacenter traces are concrete strengths that would make the work influential for both theory and practice.
major comments (2)
- [Section on policy construction and stability analysis] Section on policy construction and stability analysis: the central claim that a policy optimal on the finite discretization remains throughput-optimal for the original continuous model requires explicit uniform error bounds showing that the mapping-induced residual drift vanishes in the fluid limit. The current argument appears to establish stability only for the discretized system and then invokes a continuity step; without these bounds the exact optimality result for arbitrary continuous distributions is not yet established.
- [Theorem or proposition establishing throughput optimality] Theorem or proposition establishing throughput optimality: the proof must demonstrate that the chosen discretization granularity (derived from load and the requirement distribution) preserves the capacity region boundary for every distribution in the considered class; a concrete counter-example or a uniform convergence argument is needed to rule out distributions for which the mapping error leaves a positive drift outside the region.
minor comments (2)
- [Policy construction] Clarify in the policy-construction section exactly how the discretization points are chosen as a function of the load vector and the cumulative distribution function; the current description leaves the selection rule implicit.
- [Trace validation] In the trace-validation experiments, report the number of independent runs or bootstrap confidence intervals for the throughput and delay metrics so that the claimed state-of-the-art performance can be assessed statistically.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. The comments correctly identify that the current stability argument would benefit from more explicit bounds to rigorously close the gap between the discretized and continuous models. We address each point below and will revise the manuscript to incorporate the requested strengthening of the proofs.
read point-by-point responses
-
Referee: Section on policy construction and stability analysis: the central claim that a policy optimal on the finite discretization remains throughput-optimal for the original continuous model requires explicit uniform error bounds showing that the mapping-induced residual drift vanishes in the fluid limit. The current argument appears to establish stability only for the discretized system and then invokes a continuity step; without these bounds the exact optimality result for arbitrary continuous distributions is not yet established.
Authors: We agree that the continuity step requires additional justification to be fully rigorous. The existing argument establishes fluid stability for the finite-class discretized system via standard Lyapunov-drift techniques and then appeals to continuity of the mapping. In the revision we will insert a new lemma that derives explicit uniform error bounds on the residual drift term induced by the discretization. These bounds will be expressed in terms of the chosen granularity (itself a function of load and the requirement distribution) and will show that the error is o(1) under fluid scaling, ensuring the drift remains strictly negative outside the capacity region for the original continuous model. This addition will confirm throughput optimality for arbitrary continuous distributions satisfying the paper's moment assumptions. revision: yes
-
Referee: Theorem or proposition establishing throughput optimality: the proof must demonstrate that the chosen discretization granularity (derived from load and the requirement distribution) preserves the capacity region boundary for every distribution in the considered class; a concrete counter-example or a uniform convergence argument is needed to rule out distributions for which the mapping error leaves a positive drift outside the region.
Authors: We acknowledge the need for an explicit demonstration that the capacity-region boundary is preserved. The granularity rule is constructed precisely so that the discretization error in the service-rate vectors is controlled uniformly by the load and the variation of the requirement distribution. In the revised version we will add a uniform convergence argument (based on the continuity of the expectation operator under the chosen partition) showing that the approximated capacity region converges to the true region at a rate sufficient to keep any residual drift negative outside the boundary. This argument applies uniformly over the class of continuous distributions with finite second moments, thereby ruling out distributions that could produce a persistent positive drift. Should the referee have a specific counter-example distribution in mind, we would be grateful for the details so that we may verify the bound directly. revision: yes
Circularity Check
No significant circularity detected in the derivation chain.
full rationale
The paper derives throughput-optimal policies for the continuous MRJ model via a discretization approach that selects granularity from system load and the resource-requirement distribution. The central claim of optimality for the continuous model is supported by stability analysis that maps the continuous system to a discrete one while preserving the capacity region boundary. No quoted equations or self-citations reduce the optimality result to a fitted parameter, self-definition, or prior author work by construction. The discretization rule is presented as derived from load and distributional properties rather than assumed, and the proof chain remains independent of the target result. This is the expected self-contained case for a paper introducing a new policy family with explicit construction and analysis steps.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Resource requirements follow a continuous distribution that can be discretized at a granularity chosen from system load and the distribution itself.
Reference graph
Works this paper leans on
-
[1]
Operations research 28(6), 1335–1346 (1980)
Green, L.: A queueing system in which customers require a random number of servers. Operations research 28(6), 1335–1346 (1980)
work page 1980
-
[2]
Mana gement Science 30(1), 51–68 (1984)
Brill, P.H., Green, L.: Queues in which customers receive simultaneou s service from a random number of servers: a system point approach. Mana gement Science 30(1), 51–68 (1984)
work page 1984
-
[3]
IEEE Transactions on Information theory 39(2), 466–478 (1993)
Tassiulas, L., Ephremides, A.: Dynamic server allocation to parallel queues with randomly varying connectivity. IEEE Transactions on Information theory 39(2), 466–478 (1993)
work page 1993
-
[4]
Algorithmica 29(1), 70–88 (2001) 45
Coffman, E., Stolyar, A.L.: Bandwidth packing. Algorithmica 29(1), 70–88 (2001) 45
work page 2001
-
[5]
The Annals of Applied Probab ility 14(1), 1–53 (2004)
Stolyar, A.L.: Maxweight scheduling in a generalized switch: State s pace collapse and workload minimization in heavy traffic. The Annals of Applied Probab ility 14(1), 1–53 (2004)
work page 2004
-
[6]
In: 2012 Proceedings IEEE I nfocom, pp
Maguluri, S.T., Srikant, R., Ying, L.: Stochastic models of load balanc ing and scheduling in cloud computing clusters. In: 2012 Proceedings IEEE I nfocom, pp. 702–710 (2012). IEEE
work page 2012
-
[7]
Annals of Operations Research 252(1), 29–39 (2017)
Rumyantsev, A., Morozov, E.: Stability criterion of a multiserver m odel with simultaneous service. Annals of Operations Research 252(1), 29–39 (2017)
work page 2017
-
[8]
IEEE/ACM Transactions on Networking 26(5), 2202–2215 (2018)
Psychas, K., Ghaderi, J.: Randomized algorithms for scheduling mu lti-resource jobs in the cloud. IEEE/ACM Transactions on Networking 26(5), 2202–2215 (2018)
work page 2018
-
[9]
Queueing Systems 102(1), 143–174 (2022)
Grosof, I., Harchol-Balter, M., Scheller-Wolf, A.: WCFS: A new fra mework for analyzing multiserver systems. Queueing Systems 102(1), 143–174 (2022)
work page 2022
-
[10]
Proceedings of t he ACM on Measurement and Analysis of Computing Systems 6(3), 1–32 (2022)
Grosof, I., Scully, Z., Harchol-Balter, M., Scheller-Wolf, A.: Optim al scheduling in the multiserver-job model under heavy traffic. Proceedings of t he ACM on Measurement and Analysis of Computing Systems 6(3), 1–32 (2022)
work page 2022
-
[11]
Perf ormance Evaluation 162, 102378 (2023)
Grosof, I., Hong, Y., Harchol-Balter, M., Scheller-Wolf, A.: The R ESET and MARC techniques, with application to multiserver-job analysis. Perf ormance Evaluation 162, 102378 (2023)
work page 2023
-
[12]
Hong, Y., Wang, W.: Sharp waiting-time bounds for multiserver jo bs. In: Pro- ceedings of the Twenty-Third International Symposium on Theory , Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Co mputing, pp. 161–170 (2022)
work page 2022
-
[13]
Performance Evaluation 171, 102525 (2026) https://doi.org/10.1016/j.peva.2025.102525
Chen, Z., Anggraito, A., Olliaro, D., Marin, A., Marsan, M.A., Berg, B., Grosof, I.: Improving nonpreemptive multiserver job schedul- ing with quickswap. Performance Evaluation 171, 102525 (2026) https://doi.org/10.1016/j.peva.2025.102525
-
[14]
Pro- ceedings of the ACM on Measurement and Analysis of Computing Syst ems 1(2), 1–29 (2017)
Psychas, K., Ghaderi, J.: On non-preemptive VM scheduling in the cloud. Pro- ceedings of the ACM on Measurement and Analysis of Computing Syst ems 1(2), 1–29 (2017)
work page 2017
-
[15]
Proceedings of the ACM on Measurement a nd Analysis of Computing Systems 9(2), 1–36 (2025)
Chen, Z., Grosof, I., Berg, B.: Improving multiresource job sch eduling with marko- vian service rate policies. Proceedings of the ACM on Measurement a nd Analysis of Computing Systems 9(2), 1–36 (2025)
work page 2025
-
[16]
In: Proceedings o f the Fifteenth European Conference on Computer Systems, pp
Tirmazi, M., Barker, A., Deng, N., Haque, M.E., Qin, Z.G., Hand, S., Ha rchol- Balter, M., Wilkes, J.: Borg: the next generation. In: Proceedings o f the Fifteenth European Conference on Computer Systems, pp. 1–14 (2020) 46
work page 2020
-
[17]
In: 2024 ACM/IEEE 51st Annual International Symposium on Computer Arc hitecture (ISCA), pp
Patel, P., Choukse, E., Zhang, C., Shah, A., Goiri, ´I., Maleki, S., Bianchini, R.: Splitwise: Efficient generative LLM inference using phase splitting. In: 2024 ACM/IEEE 51st Annual International Symposium on Computer Arc hitecture (ISCA), pp. 118–132 (2024). IEEE
work page 2024
-
[18]
In: Procee dings of the Third ACM Symposium on Cloud Computing, pp
Reiss, C., Tumanov, A., Ganger, G.R., Katz, R.H., Kozuch, M.A.: Het erogeneity and dynamicity of clouds at scale: Google trace analysis. In: Procee dings of the Third ACM Symposium on Cloud Computing, pp. 1–13 (2012)
work page 2012
-
[19]
Verma, A., Pedrosa, L., Korupolu, M.R., Oppenheimer, D., Tune, E ., Wilkes, J.: Large-scale cluster management at Google with Borg. In: Pro- ceedings of the European Conference on Computer Systems (Eur oSys’15), Bordeaux, France (2015). https://doi.org/10.1145/2741948.2741964 . https://dl.acm.org/doi/10.1145/2741948.2741964
-
[20]
Google research blog, Mountain View, CA, USA
Wilkes, J.: Yet more Google compute cluster trace data. Google research blog, Mountain View, CA, USA. Posted at https://ai.googleblog.com/2020/04/yet-more-google-compute- cluster-trace.html. (2020)
work page 2020
-
[21]
Technical report, Google Inc., Mountain View, CA, USA (April 2020)
Wilkes, J.: Google cluster-usage traces v3. Technical report, Google Inc., Mountain View, CA, USA (April 2020). Posted at https://github.com/google/cluster-data/blob/master/Cluster Data2019.md
work page 2020
-
[22]
Google research blog, Mountain View, CA, USA
Wilkes, J.: More Google cluster data. Google research blog, Mountain View, CA, USA. Posted at http://googleresearch.blogspot.com/2011/11/more-google-clu ster-data.html. (2011)
work page 2011
-
[23]
Technical report, Google Inc., Mountain View, CA, USA (November 2011)
Reiss, C., Wilkes, J., Hellerstein, J.L.: Google cluster-usage trace s: for- mat + schema. Technical report, Google Inc., Mountain View, CA, USA (November 2011). Revised 2014-11-17 for version 2.1. Poste d at https://github.com/google/cluster-data
work page 2011
-
[24]
In: Proceedings o f the 8th ACM European Conference on Computer Systems, pp
Schwarzkopf, M., Konwinski, A., Abd-El-Malek, M., Wilkes, J.: Omeg a: flexible, scalable schedulers for large compute clusters. In: Proceedings o f the 8th ACM European Conference on Computer Systems, pp. 351–364 (2013 )
work page 2013
-
[25]
In: Proceedings of the International Symposium on Quality of Serv ice, pp
Guo, J., Chang, Z., Wang, S., Ding, H., Feng, Y., Mao, L., Bao, Y.: Wh o limits the resource efficiency of my datacenter: An analysis of Alibaba dat acenter traces. In: Proceedings of the International Symposium on Quality of Serv ice, pp. 1–10 (2019)
work page 2019
-
[26]
In: Proceedings of the 4th International Con ference on Big Data and Computing, pp
Li, F., Hu, B.: Deepjs: Job scheduling based on deep reinforceme nt learning in cloud data center. In: Proceedings of the 4th International Con ference on Big Data and Computing, pp. 48–53 (2019) 47
work page 2019
-
[27]
In: 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp
Wu, H., Zhang, W., Xu, Y., Xiang, H., Huang, T., Ding, H., Zhang, Z.: A laddin: Optimized maximum flow management for shared production clusters . In: 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 696–707 (2019). IEEE
work page 2019
-
[28]
arXiv preprint arXiv:2509.08207 (2025)
Allen, B.S., Anchell, J., Anisimov, V., Applencourt, T., Bagusetty, A ., Balakr- ishnan, R., Balin, R., Bekele, S., Bertoni, C., Blackworth, C., et al.: Auro ra: Architecting argonne’s first exascale supercomputer for acceler ated scientific discovery. arXiv preprint arXiv:2509.08207 (2025)
-
[29]
In: In ternational Conference on High Performance Computing, pp
Budiardja, R.D., Berrill, M., Eisenbach, M., Jansen, G.R., Joubert, W., Nichols, S., Rogers, D.M., Tharrington, A., Bronson Messer, O.: Ready for th e frontier: Preparing applications for the world’s first exascale system. In: In ternational Conference on High Performance Computing, pp. 182–201 (2023) . Springer
work page 2023
-
[30]
: Fron- tier: exploring exascale
Atchley, S., Zimmer, C., Lange, J., Bernholdt, D., Melesse Vergar a, V., Beck, T., Brim, M., Budiardja, R., Chandrasekaran, S., Eisenbach, M., et al. : Fron- tier: exploring exascale. In: Proceedings of the International Co nference for High Performance Computing, Networking, Storage and Analysis, pp. 1 –16 (2023)
work page 2023
-
[31]
In: Proc eedings of the Fifteenth European Conference on Computer Systems (Eu roSys’20)
Tirmazi, M., Barker, A., Deng, N., Haque, M.E., Qin, Z.G., Hand, S., Harchol-Balter, M., Wilkes, J.: Borg: the Next Generation. In: Proc eedings of the Fifteenth European Conference on Computer Systems (Eu roSys’20). ACM, Heraklion, Greece (2020). https://doi.org/10.1145/3342195.3387517 . https://doi.org/10.1145/3342195.3387517
-
[32]
Proceedings of the London Mathematical Society 2(1), 75–115 (1918)
Hardy, G.H., Ramanujan, S.: Asymptotic formulae in combinatory analysis. Proceedings of the London Mathematical Society 2(1), 75–115 (1918)
work page 1918
-
[33]
Experimental Mathematics 32(3), 457–466 (2023)
Shlyk, V.A.: Number of vertices of the polytope of integer partit ions and fac- torization of the partitioned number. Experimental Mathematics 32(3), 457–466 (2023)
work page 2023
-
[34]
Advances in Applied Proba bility 25(3), 518–548 (1993)
Meyn, S.P., Tweedie, R.L.: Stability of markovian processes III: F oster–Lyapunov criteria for continuous-time processes. Advances in Applied Proba bility 25(3), 518–548 (1993)
work page 1993
-
[35]
Meyn, S.P., Tweedie, R.: A survey of Foster-Lyapunov techniqu es for general state space markov processes. In: Proceedings of the Workshop on St ochastic Stability and Stochastic Stabilization, Metz, France (1993)
work page 1993
-
[36]
In: Markov Processes and Related Topics: a Festschrift for Thom as G
Glynn, P.W., Zeevi, A.: Bounding stationary expectations of mark ov processes. In: Markov Processes and Related Topics: a Festschrift for Thom as G. Kurtz vol. 4, pp. 195–215. Institute of Mathematical Statistics, ??? (2008 )
work page 2008
-
[37]
In: 2024 IEEE 25th International Confe rence on High 48 Performance Switching and Routing (HPSR), pp
Yildiz, M., Baiocchi, A.: Data-driven workload generation based on google data center measurements. In: 2024 IEEE 25th International Confe rence on High 48 Performance Switching and Routing (HPSR), pp. 143–148 (2024). IEEE Appendix A List of Notation General Setting: Symbol Description λ Arrival rate d The number of resources in the MRJ setting V Resource...
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.