pith. sign in

arxiv: 2605.21715 · v1 · pith:WEXLN2WTnew · submitted 2026-05-20 · 💻 cs.PF

Throughput-Optimal Multiresource-Job Scheduling with Continuous Requirement Distribution

Pith reviewed 2026-05-22 08:30 UTC · model grok-4.3

classification 💻 cs.PF
keywords multiresource job schedulingthroughput optimalitycontinuous distributiondiscretizationpreemptive schedulingnonpreemptive schedulingqueueing modelsdatacenter scheduling
0
0 comments X

The pith

Throughput-optimal scheduling policies exist for multiresource jobs with continuously distributed resource requirements.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the first family of scheduling policies proven to achieve throughput optimality in the continuous multiresource job model, covering both preemptive and nonpreemptive cases. Practical systems rarely repeat exact resource values such as CPU and memory, so continuous distributions match reality far better than the small number of classes assumed in prior theory. The approach works by discretizing the continuous requirements into a finite set of classes, with the bin width selected from the system load and the underlying distribution to keep the optimality guarantee intact. Several faster-to-compute variants are also given that stay optimal under mild extra assumptions on the distribution. The policies match or exceed existing index-based methods on both synthetic continuous distributions and real Google Borg scheduler traces.

Core claim

We introduce the first throughput-optimal family of scheduling policies for the continuous MRJ model, with both preemptive and nonpreemptive variants. We use a discretization approach, where we choose the discretization granularity based on the system load and the distribution of resource requirements. We further introduce several efficient policy families, which remain throughput-optimal while considerably improving computational efficiency, under some distributional assumptions.

What carries the argument

Discretization of the continuous resource-requirement distribution into a finite number of classes, with granularity chosen from system load and the requirement distribution to preserve throughput optimality.

If this is right

  • Throughput optimality holds for both preemptive and non-preemptive policies in the continuous MRJ model.
  • More computationally efficient policy families remain throughput-optimal when the requirement distribution satisfies additional assumptions.
  • The policies achieve state-of-the-art performance on parametrized continuous distributions and on Google Borg datacenter traces.
  • The discretization technique directly bridges class-based theoretical models to the continuous distributions observed in production systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same discretization idea could be tested on other resource-allocation problems that currently assume only a few discrete classes.
  • Adaptive versions that adjust bin width on the fly from observed load might further reduce overhead while keeping optimality.
  • The approach suggests that many existing index policies could be made optimal for continuous settings by applying a similar load-dependent discretization step.

Load-bearing premise

The chosen discretization granularity is fine enough to preserve throughput optimality when the underlying model is truly continuous rather than already discrete.

What would settle it

A workload whose resource requirements follow a continuous distribution in which the proposed policy's long-run throughput falls measurably below the maximum feasible rate that would be achieved by an ideal fluid scheduler.

read the original abstract

Modern computing systems process jobs with resource requirements such as CPU and memory, which are described by multiresource jobs (MRJ) queueing models. In practice, job resource requirements are spread out over so many values, that it is rare to see the same value twice. This pattern is best modeled by a continuous distribution of requirement values. However, the existing theoretical work on stability or throughput-optimality focuses on queueing models with class-based resource requirements. In class-based models, the number of distinct resource requirements must be small to demonstrate strong empirical performance, making them a poor match for these practical systems. We introduce the first throughput-optimal family of scheduling policies for the continuous MRJ model, with both preemptive and nonpreemptive variants. We further introduce several efficient policy families, which remain throughput-optimal while considerably improving computational efficiency, under some distributional assumptions. We use a discretization approach, where we choose the discretization granularity based on the system load and the distribution of resource requirements. We validate the real-world applicability of our policies by comparing them against existing index-based policies on parametrized distributions and on datacenter trace data from the Google Borg scheduler, demonstrating state-of-the-art performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces the first throughput-optimal family of scheduling policies for the continuous multiresource-job (MRJ) model, using a discretization approach whose granularity is selected from the system load and the resource-requirement distribution. Both preemptive and nonpreemptive variants are presented, together with computationally efficient sub-families that remain optimal under additional distributional assumptions. The policies are validated against index-based baselines on parametrized distributions and on Google Borg trace data.

Significance. If the stability argument extends rigorously to arbitrary continuous distributions, the result would close an important gap between existing class-based throughput-optimality theory and the continuous requirement distributions observed in production systems. The load- and distribution-dependent discretization rule, the provision of both preemptive and nonpreemptive policies, and the empirical evaluation on real datacenter traces are concrete strengths that would make the work influential for both theory and practice.

major comments (2)
  1. [Section on policy construction and stability analysis] Section on policy construction and stability analysis: the central claim that a policy optimal on the finite discretization remains throughput-optimal for the original continuous model requires explicit uniform error bounds showing that the mapping-induced residual drift vanishes in the fluid limit. The current argument appears to establish stability only for the discretized system and then invokes a continuity step; without these bounds the exact optimality result for arbitrary continuous distributions is not yet established.
  2. [Theorem or proposition establishing throughput optimality] Theorem or proposition establishing throughput optimality: the proof must demonstrate that the chosen discretization granularity (derived from load and the requirement distribution) preserves the capacity region boundary for every distribution in the considered class; a concrete counter-example or a uniform convergence argument is needed to rule out distributions for which the mapping error leaves a positive drift outside the region.
minor comments (2)
  1. [Policy construction] Clarify in the policy-construction section exactly how the discretization points are chosen as a function of the load vector and the cumulative distribution function; the current description leaves the selection rule implicit.
  2. [Trace validation] In the trace-validation experiments, report the number of independent runs or bootstrap confidence intervals for the throughput and delay metrics so that the claimed state-of-the-art performance can be assessed statistically.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments correctly identify that the current stability argument would benefit from more explicit bounds to rigorously close the gap between the discretized and continuous models. We address each point below and will revise the manuscript to incorporate the requested strengthening of the proofs.

read point-by-point responses
  1. Referee: Section on policy construction and stability analysis: the central claim that a policy optimal on the finite discretization remains throughput-optimal for the original continuous model requires explicit uniform error bounds showing that the mapping-induced residual drift vanishes in the fluid limit. The current argument appears to establish stability only for the discretized system and then invokes a continuity step; without these bounds the exact optimality result for arbitrary continuous distributions is not yet established.

    Authors: We agree that the continuity step requires additional justification to be fully rigorous. The existing argument establishes fluid stability for the finite-class discretized system via standard Lyapunov-drift techniques and then appeals to continuity of the mapping. In the revision we will insert a new lemma that derives explicit uniform error bounds on the residual drift term induced by the discretization. These bounds will be expressed in terms of the chosen granularity (itself a function of load and the requirement distribution) and will show that the error is o(1) under fluid scaling, ensuring the drift remains strictly negative outside the capacity region for the original continuous model. This addition will confirm throughput optimality for arbitrary continuous distributions satisfying the paper's moment assumptions. revision: yes

  2. Referee: Theorem or proposition establishing throughput optimality: the proof must demonstrate that the chosen discretization granularity (derived from load and the requirement distribution) preserves the capacity region boundary for every distribution in the considered class; a concrete counter-example or a uniform convergence argument is needed to rule out distributions for which the mapping error leaves a positive drift outside the region.

    Authors: We acknowledge the need for an explicit demonstration that the capacity-region boundary is preserved. The granularity rule is constructed precisely so that the discretization error in the service-rate vectors is controlled uniformly by the load and the variation of the requirement distribution. In the revised version we will add a uniform convergence argument (based on the continuity of the expectation operator under the chosen partition) showing that the approximated capacity region converges to the true region at a rate sufficient to keep any residual drift negative outside the boundary. This argument applies uniformly over the class of continuous distributions with finite second moments, thereby ruling out distributions that could produce a persistent positive drift. Should the referee have a specific counter-example distribution in mind, we would be grateful for the details so that we may verify the bound directly. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in the derivation chain.

full rationale

The paper derives throughput-optimal policies for the continuous MRJ model via a discretization approach that selects granularity from system load and the resource-requirement distribution. The central claim of optimality for the continuous model is supported by stability analysis that maps the continuous system to a discrete one while preserving the capacity region boundary. No quoted equations or self-citations reduce the optimality result to a fitted parameter, self-definition, or prior author work by construction. The discretization rule is presented as derived from load and distributional properties rather than assumed, and the proof chain remains independent of the target result. This is the expected self-contained case for a paper introducing a new policy family with explicit construction and analysis steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the modeling choice that resource requirements are drawn from a continuous distribution and on the claim that a load-dependent discretization preserves throughput optimality.

axioms (1)
  • domain assumption Resource requirements follow a continuous distribution that can be discretized at a granularity chosen from system load and the distribution itself.
    Invoked to justify moving from class-based to continuous MRJ models and to select discretization.

pith-pipeline@v0.9.0 · 5743 in / 1161 out tokens · 55365 ms · 2026-05-22T08:30:26.295130+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages

  1. [1]

    Operations research 28(6), 1335–1346 (1980)

    Green, L.: A queueing system in which customers require a random number of servers. Operations research 28(6), 1335–1346 (1980)

  2. [2]

    Mana gement Science 30(1), 51–68 (1984)

    Brill, P.H., Green, L.: Queues in which customers receive simultaneou s service from a random number of servers: a system point approach. Mana gement Science 30(1), 51–68 (1984)

  3. [3]

    IEEE Transactions on Information theory 39(2), 466–478 (1993)

    Tassiulas, L., Ephremides, A.: Dynamic server allocation to parallel queues with randomly varying connectivity. IEEE Transactions on Information theory 39(2), 466–478 (1993)

  4. [4]

    Algorithmica 29(1), 70–88 (2001) 45

    Coffman, E., Stolyar, A.L.: Bandwidth packing. Algorithmica 29(1), 70–88 (2001) 45

  5. [5]

    The Annals of Applied Probab ility 14(1), 1–53 (2004)

    Stolyar, A.L.: Maxweight scheduling in a generalized switch: State s pace collapse and workload minimization in heavy traffic. The Annals of Applied Probab ility 14(1), 1–53 (2004)

  6. [6]

    In: 2012 Proceedings IEEE I nfocom, pp

    Maguluri, S.T., Srikant, R., Ying, L.: Stochastic models of load balanc ing and scheduling in cloud computing clusters. In: 2012 Proceedings IEEE I nfocom, pp. 702–710 (2012). IEEE

  7. [7]

    Annals of Operations Research 252(1), 29–39 (2017)

    Rumyantsev, A., Morozov, E.: Stability criterion of a multiserver m odel with simultaneous service. Annals of Operations Research 252(1), 29–39 (2017)

  8. [8]

    IEEE/ACM Transactions on Networking 26(5), 2202–2215 (2018)

    Psychas, K., Ghaderi, J.: Randomized algorithms for scheduling mu lti-resource jobs in the cloud. IEEE/ACM Transactions on Networking 26(5), 2202–2215 (2018)

  9. [9]

    Queueing Systems 102(1), 143–174 (2022)

    Grosof, I., Harchol-Balter, M., Scheller-Wolf, A.: WCFS: A new fra mework for analyzing multiserver systems. Queueing Systems 102(1), 143–174 (2022)

  10. [10]

    Proceedings of t he ACM on Measurement and Analysis of Computing Systems 6(3), 1–32 (2022)

    Grosof, I., Scully, Z., Harchol-Balter, M., Scheller-Wolf, A.: Optim al scheduling in the multiserver-job model under heavy traffic. Proceedings of t he ACM on Measurement and Analysis of Computing Systems 6(3), 1–32 (2022)

  11. [11]

    Perf ormance Evaluation 162, 102378 (2023)

    Grosof, I., Hong, Y., Harchol-Balter, M., Scheller-Wolf, A.: The R ESET and MARC techniques, with application to multiserver-job analysis. Perf ormance Evaluation 162, 102378 (2023)

  12. [12]

    In: Pro- ceedings of the Twenty-Third International Symposium on Theory , Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Co mputing, pp

    Hong, Y., Wang, W.: Sharp waiting-time bounds for multiserver jo bs. In: Pro- ceedings of the Twenty-Third International Symposium on Theory , Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Co mputing, pp. 161–170 (2022)

  13. [13]

    Performance Evaluation 171, 102525 (2026) https://doi.org/10.1016/j.peva.2025.102525

    Chen, Z., Anggraito, A., Olliaro, D., Marin, A., Marsan, M.A., Berg, B., Grosof, I.: Improving nonpreemptive multiserver job schedul- ing with quickswap. Performance Evaluation 171, 102525 (2026) https://doi.org/10.1016/j.peva.2025.102525

  14. [14]

    Pro- ceedings of the ACM on Measurement and Analysis of Computing Syst ems 1(2), 1–29 (2017)

    Psychas, K., Ghaderi, J.: On non-preemptive VM scheduling in the cloud. Pro- ceedings of the ACM on Measurement and Analysis of Computing Syst ems 1(2), 1–29 (2017)

  15. [15]

    Proceedings of the ACM on Measurement a nd Analysis of Computing Systems 9(2), 1–36 (2025)

    Chen, Z., Grosof, I., Berg, B.: Improving multiresource job sch eduling with marko- vian service rate policies. Proceedings of the ACM on Measurement a nd Analysis of Computing Systems 9(2), 1–36 (2025)

  16. [16]

    In: Proceedings o f the Fifteenth European Conference on Computer Systems, pp

    Tirmazi, M., Barker, A., Deng, N., Haque, M.E., Qin, Z.G., Hand, S., Ha rchol- Balter, M., Wilkes, J.: Borg: the next generation. In: Proceedings o f the Fifteenth European Conference on Computer Systems, pp. 1–14 (2020) 46

  17. [17]

    In: 2024 ACM/IEEE 51st Annual International Symposium on Computer Arc hitecture (ISCA), pp

    Patel, P., Choukse, E., Zhang, C., Shah, A., Goiri, ´I., Maleki, S., Bianchini, R.: Splitwise: Efficient generative LLM inference using phase splitting. In: 2024 ACM/IEEE 51st Annual International Symposium on Computer Arc hitecture (ISCA), pp. 118–132 (2024). IEEE

  18. [18]

    In: Procee dings of the Third ACM Symposium on Cloud Computing, pp

    Reiss, C., Tumanov, A., Ganger, G.R., Katz, R.H., Kozuch, M.A.: Het erogeneity and dynamicity of clouds at scale: Google trace analysis. In: Procee dings of the Third ACM Symposium on Cloud Computing, pp. 1–13 (2012)

  19. [19]

    In: Pro- ceedings of the European Conference on Computer Systems (Eur oSys’15), Bordeaux, France (2015)

    Verma, A., Pedrosa, L., Korupolu, M.R., Oppenheimer, D., Tune, E ., Wilkes, J.: Large-scale cluster management at Google with Borg. In: Pro- ceedings of the European Conference on Computer Systems (Eur oSys’15), Bordeaux, France (2015). https://doi.org/10.1145/2741948.2741964 . https://dl.acm.org/doi/10.1145/2741948.2741964

  20. [20]

    Google research blog, Mountain View, CA, USA

    Wilkes, J.: Yet more Google compute cluster trace data. Google research blog, Mountain View, CA, USA. Posted at https://ai.googleblog.com/2020/04/yet-more-google-compute- cluster-trace.html. (2020)

  21. [21]

    Technical report, Google Inc., Mountain View, CA, USA (April 2020)

    Wilkes, J.: Google cluster-usage traces v3. Technical report, Google Inc., Mountain View, CA, USA (April 2020). Posted at https://github.com/google/cluster-data/blob/master/Cluster Data2019.md

  22. [22]

    Google research blog, Mountain View, CA, USA

    Wilkes, J.: More Google cluster data. Google research blog, Mountain View, CA, USA. Posted at http://googleresearch.blogspot.com/2011/11/more-google-clu ster-data.html. (2011)

  23. [23]

    Technical report, Google Inc., Mountain View, CA, USA (November 2011)

    Reiss, C., Wilkes, J., Hellerstein, J.L.: Google cluster-usage trace s: for- mat + schema. Technical report, Google Inc., Mountain View, CA, USA (November 2011). Revised 2014-11-17 for version 2.1. Poste d at https://github.com/google/cluster-data

  24. [24]

    In: Proceedings o f the 8th ACM European Conference on Computer Systems, pp

    Schwarzkopf, M., Konwinski, A., Abd-El-Malek, M., Wilkes, J.: Omeg a: flexible, scalable schedulers for large compute clusters. In: Proceedings o f the 8th ACM European Conference on Computer Systems, pp. 351–364 (2013 )

  25. [25]

    In: Proceedings of the International Symposium on Quality of Serv ice, pp

    Guo, J., Chang, Z., Wang, S., Ding, H., Feng, Y., Mao, L., Bao, Y.: Wh o limits the resource efficiency of my datacenter: An analysis of Alibaba dat acenter traces. In: Proceedings of the International Symposium on Quality of Serv ice, pp. 1–10 (2019)

  26. [26]

    In: Proceedings of the 4th International Con ference on Big Data and Computing, pp

    Li, F., Hu, B.: Deepjs: Job scheduling based on deep reinforceme nt learning in cloud data center. In: Proceedings of the 4th International Con ference on Big Data and Computing, pp. 48–53 (2019) 47

  27. [27]

    In: 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp

    Wu, H., Zhang, W., Xu, Y., Xiang, H., Huang, T., Ding, H., Zhang, Z.: A laddin: Optimized maximum flow management for shared production clusters . In: 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 696–707 (2019). IEEE

  28. [28]

    arXiv preprint arXiv:2509.08207 (2025)

    Allen, B.S., Anchell, J., Anisimov, V., Applencourt, T., Bagusetty, A ., Balakr- ishnan, R., Balin, R., Bekele, S., Bertoni, C., Blackworth, C., et al.: Auro ra: Architecting argonne’s first exascale supercomputer for acceler ated scientific discovery. arXiv preprint arXiv:2509.08207 (2025)

  29. [29]

    In: In ternational Conference on High Performance Computing, pp

    Budiardja, R.D., Berrill, M., Eisenbach, M., Jansen, G.R., Joubert, W., Nichols, S., Rogers, D.M., Tharrington, A., Bronson Messer, O.: Ready for th e frontier: Preparing applications for the world’s first exascale system. In: In ternational Conference on High Performance Computing, pp. 182–201 (2023) . Springer

  30. [30]

    : Fron- tier: exploring exascale

    Atchley, S., Zimmer, C., Lange, J., Bernholdt, D., Melesse Vergar a, V., Beck, T., Brim, M., Budiardja, R., Chandrasekaran, S., Eisenbach, M., et al. : Fron- tier: exploring exascale. In: Proceedings of the International Co nference for High Performance Computing, Networking, Storage and Analysis, pp. 1 –16 (2023)

  31. [31]

    In: Proc eedings of the Fifteenth European Conference on Computer Systems (Eu roSys’20)

    Tirmazi, M., Barker, A., Deng, N., Haque, M.E., Qin, Z.G., Hand, S., Harchol-Balter, M., Wilkes, J.: Borg: the Next Generation. In: Proc eedings of the Fifteenth European Conference on Computer Systems (Eu roSys’20). ACM, Heraklion, Greece (2020). https://doi.org/10.1145/3342195.3387517 . https://doi.org/10.1145/3342195.3387517

  32. [32]

    Proceedings of the London Mathematical Society 2(1), 75–115 (1918)

    Hardy, G.H., Ramanujan, S.: Asymptotic formulae in combinatory analysis. Proceedings of the London Mathematical Society 2(1), 75–115 (1918)

  33. [33]

    Experimental Mathematics 32(3), 457–466 (2023)

    Shlyk, V.A.: Number of vertices of the polytope of integer partit ions and fac- torization of the partitioned number. Experimental Mathematics 32(3), 457–466 (2023)

  34. [34]

    Advances in Applied Proba bility 25(3), 518–548 (1993)

    Meyn, S.P., Tweedie, R.L.: Stability of markovian processes III: F oster–Lyapunov criteria for continuous-time processes. Advances in Applied Proba bility 25(3), 518–548 (1993)

  35. [35]

    In: Proceedings of the Workshop on St ochastic Stability and Stochastic Stabilization, Metz, France (1993)

    Meyn, S.P., Tweedie, R.: A survey of Foster-Lyapunov techniqu es for general state space markov processes. In: Proceedings of the Workshop on St ochastic Stability and Stochastic Stabilization, Metz, France (1993)

  36. [36]

    In: Markov Processes and Related Topics: a Festschrift for Thom as G

    Glynn, P.W., Zeevi, A.: Bounding stationary expectations of mark ov processes. In: Markov Processes and Related Topics: a Festschrift for Thom as G. Kurtz vol. 4, pp. 195–215. Institute of Mathematical Statistics, ??? (2008 )

  37. [37]

    In: 2024 IEEE 25th International Confe rence on High 48 Performance Switching and Routing (HPSR), pp

    Yildiz, M., Baiocchi, A.: Data-driven workload generation based on google data center measurements. In: 2024 IEEE 25th International Confe rence on High 48 Performance Switching and Routing (HPSR), pp. 143–148 (2024). IEEE Appendix A List of Notation General Setting: Symbol Description λ Arrival rate d The number of resources in the MRJ setting V Resource...