The Merit of Simple Policies: Buying Performance With Parallelism and System Architecture
Pith reviewed 2026-05-22 23:15 UTC · model grok-4.3
The pith
Mean job response time reaches a minimum at certain server counts under fixed computational budget
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The main finding is that mean job response time attains a minimum as the number of servers of the computing cluster is varied, under the constraint that the overall computational budget is kept constant. Moreover, simple policies, such as Join Idle Queue, appear to attain the same performance as more complex, size-based policies for suitably high degrees of parallelism. Further, better performance, definitely outperforming size-based dispatching policies, is obtained by using multi-stage server clusters, even using very simple policies such as Round Robin.
What carries the argument
Varying server count under fixed total computational budget in simulations of dispatching policies on Google-derived workloads, comparing simple policies like Join Idle Queue and Round Robin against size-based ones and multi-stage architectures.
Load-bearing premise
The traffic workloads derived from Google's measurements accurately represent the conditions under which the dispatching policies are evaluated, and the simulation does not miss important real-world factors such as communication overheads or server heterogeneity.
What would settle it
Repeating the simulations with an independent workload trace or after adding modeled communication overheads and checking whether a response-time minimum still appears at comparable server counts.
Figures
read the original abstract
While scheduling and dispatching of computational workloads is a well-investigated subject, only recently has Google provided publicly a vast high-resolution measurement dataset of its cloud workloads. We revisit dispatching and scheduling algorithms fed by traffic workloads derived from those measurements. The main finding is that mean job response time attains a minimum as the number of servers of the computing cluster is varied, under the constraint that the overall computational budget is kept constant. Moreover, simple policies, such as Join Idle Queue, appear to attain the same performance as more complex, size-based policies for suitably high degrees of parallelism. Further, better performance, definitely outperforming size-based dispatching policies, is obtained by using multi-stage server clusters, even using very simple policies such as Round Robin. The takeaway is that parallelism and architecture of computing systems might be powerful knobs to control performance, even more than policies, under realistic workload traffic.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper uses discrete-event simulations driven by job-size traces derived from Google's public cloud workload measurements to evaluate dispatching policies in single- and multi-stage server clusters. Its central claims are that mean job response time exhibits a minimum when the number of servers is varied while holding total computational budget fixed; that simple policies such as Join Idle Queue match the performance of size-based policies once parallelism is sufficiently high; and that multi-stage architectures yield strictly better performance than single-stage ones even when both use elementary policies such as Round Robin.
Significance. If the reported minimum and the architecture/policy rankings survive validation against real-system overheads, the work would usefully shift emphasis from policy sophistication toward parallelism and cluster topology as performance levers under realistic traffic. The reliance on an external, publicly released trace dataset is a clear strength for reproducibility.
major comments (2)
- [Simulation methodology section] Simulation methodology section: the mapping from trace job sizes to service times, the precise definition of the constant computational budget (total capacity versus per-server speed scaling), and the absence of communication, synchronization, or heterogeneity costs are not described or validated. Because the location of the reported minimum and the relative ranking of policies and multi-stage architectures are obtained exclusively from these simulations, any deviation in these modeling choices can alter or eliminate the claimed minimum and the architecture comparisons.
- [Results section] Results section (figures showing response time versus server count): no statistical significance, confidence intervals, or sensitivity analysis is reported for the location of the minimum or for the claimed equivalence between JIQ and size-based policies. Without these, it is impossible to determine whether the minimum is robust or an artifact of the particular trace realization and parameter settings.
minor comments (2)
- [Model section] Notation for the constant computational budget should be introduced explicitly in the model section rather than only in the simulation description.
- [Figures] Figure captions should state the exact number of simulation runs and the random-seed handling used to generate each curve.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our simulation study of dispatching policies using Google cloud traces. The feedback highlights important aspects of methodology description and statistical robustness that we will address in revision.
read point-by-point responses
-
Referee: [Simulation methodology section] Simulation methodology section: the mapping from trace job sizes to service times, the precise definition of the constant computational budget (total capacity versus per-server speed scaling), and the absence of communication, synchronization, or heterogeneity costs are not described or validated. Because the location of the reported minimum and the relative ranking of policies and multi-stage architectures are obtained exclusively from these simulations, any deviation in these modeling choices can alter or eliminate the claimed minimum and the architecture comparisons.
Authors: We agree that these modeling choices must be described explicitly. In the revised manuscript we will expand the Simulation Methodology section to detail: (1) the mapping of trace job sizes to service times (service time = job size / base server speed), (2) the constant computational budget defined as fixed total capacity (per-server speed scaled as 1/N where N is the number of servers), and (3) the modeling assumptions that omit communication, synchronization, and heterogeneity costs, with a brief justification that the focus is on high-level policy and architecture comparisons under the given workload. We will also note that these simplifications are standard in dispatching studies but could be relaxed in future work. revision: yes
-
Referee: [Results section] Results section (figures showing response time versus server count): no statistical significance, confidence intervals, or sensitivity analysis is reported for the location of the minimum or for the claimed equivalence between JIQ and size-based policies. Without these, it is impossible to determine whether the minimum is robust or an artifact of the particular trace realization and parameter settings.
Authors: We accept that the results would benefit from statistical support. In revision we will add confidence intervals obtained from multiple independent simulation runs (different random seeds) for each data point in the figures. We will also include a sensitivity analysis subsection that varies the trace subsample and arrival process parameters to confirm that the location of the minimum and the observed equivalence between Join Idle Queue and size-based policies remain consistent across realizations. revision: yes
Circularity Check
No significant circularity; results from external trace-driven simulations
full rationale
The paper reports simulation outcomes for mean response time minima, policy comparisons, and multi-stage architecture benefits, all driven by workloads derived from Google's public measurement dataset. No equations, fitted parameters, or predictions reduce by construction to the paper's own inputs. No self-citation chains or uniqueness theorems are invoked to justify the central claims. The derivation chain is self-contained against the external traces and discrete-event model, with no self-definitional, fitted-input, or ansatz-smuggling patterns present.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Google's cloud workload dataset provides representative traffic patterns for general cloud computing scenarios
Forward citations
Cited by 2 Pith papers
-
"Two-Stagification": Job Dispatching in Large-Scale Clusters via a Two-Stage Architecture
A two-stage dispatching architecture with an optimized job-size threshold improves mean response time over single-stage classical policies in large clusters.
-
Dispatching Odyssey: Exploring Performance in Computing Clusters under Real-world Workloads
Simulations on real Google traces show JIQ and LWL dispatching have optimal cluster sizes for fixed utilization while RR worsens monotonically; task decomposition reveals simpler JIQ outperforming size-aware LWL.
Reference graph
Works this paper leans on
-
[1]
R. Srikant and L. Ying, Communication Networks – An Optimization, Control, and Stochastic Networks Perspective . Cambridge, UK: Cam- bridge University Press, 2014
work page 2014
-
[2]
Designing low- complexity heavy-traffic delay-optimal load balancing schemes:theory to algorithms,
X. Zhou, F. Wu, J. Tan, Y . Sun, and N. Shroff, “Designing low- complexity heavy-traffic delay-optimal load balancing schemes:theory to algorithms,” Proc. ACM Meas. Anal. Comput. Syst. , vol. 1, dec 2017
work page 2017
-
[3]
Surprising re- sults on task assignment in server farms with high-variability workloads,
M. Harchol-Balter, A. Scheller-Wolf, and A. R. Young, “Surprising re- sults on task assignment in server farms with high-variability workloads,” SIGMETRICS Perform. Eval. Rev. , vol. 37, p. 287?298, jun 2009
work page 2009
-
[4]
Heavy-Traffic Optimal Size- and State-Aware Dispatching,
R. Xie, I. Grosof, and Z. Scully, “Heavy-Traffic Optimal Size- and State-Aware Dispatching,” Proceedings of the ACM on Measurement and Analysis of Computing Systems , vol. 8, no. 1, pp. 1–36, 2024
work page 2024
-
[5]
M. Tirmazi, A. Barker, N. Deng, M. E. Haque, Z. G. Qin, S. Hand, M. Harchol-Balter, and J. Wilkes, “Borg: the next generation,” in Proceedings of the Fifteenth European Conference on Computer Systems , EuroSys ’20, (New York, NY , USA), Association for Computing Machinery, 2020
work page 2020
-
[6]
Large-scale cluster management at google with borg,
A. Verma, L. Pedrosa, M. Korupolu, D. Oppenheimer, E. Tune, and J. Wilkes, “Large-scale cluster management at google with borg,” in Proceedings of the Tenth European Conference on Computer Systems , EuroSys ’15, (New York, NY , USA), Association for Computing Machinery, 2015
work page 2015
-
[7]
A novel dynamic multi- objective task scheduling optimization based on dueling dqn and per,
A. Chraibi, S. Ben Alla, A. Touhafi, et al. , “A novel dynamic multi- objective task scheduling optimization based on dueling dqn and per,” The Journal of Supercomputing , vol. 79, pp. 21368–21423, 2023
work page 2023
-
[8]
Open problems in queueing theory inspired by datacenter computing,
M. Harchol-Balter, “Open problems in queueing theory inspired by datacenter computing,” Queueing Syst. Theory Appl. , vol. 97, p. 3?37, feb 2021
work page 2021
-
[9]
Optimality of the round-robin routing policy,
Z. Liu and D. Towsley, “Optimality of the round-robin routing policy,” Journal of Applied Probability , vol. 31, no. 2, pp. 466–475, 1994
work page 1994
-
[10]
Optimal load balancing on distributed homo- geneous unreliable processors,
Z. Liu and R. Righter, “Optimal load balancing on distributed homo- geneous unreliable processors,” Operations Research , vol. 46, no. 4, pp. 563–573, 1998
work page 1998
-
[11]
On the optimal assignment of customers to parallel servers,
R. R. Weber, “On the optimal assignment of customers to parallel servers,” Journal of Applied Probability , vol. 15, no. 2, pp. 406–413, 1978
work page 1978
-
[12]
Partial flexibility in routeing and scheduling,
O. T. AKGUN, R. RIGHTER, and R. WOLFF, “Partial flexibility in routeing and scheduling,” Advances in Applied Probability , vol. 45, no. 3, pp. 673–691, 2013
work page 2013
-
[13]
Optimal state-free, size- aware dispatching for heterogeneous m/g/-type systems,
H. Feng, V . Misra, and D. Rubenstein, “Optimal state-free, size- aware dispatching for heterogeneous m/g/-type systems,” Performance Evaluation, vol. 62, no. 1, pp. 475–492, 2005. Performance 2005
work page 2005
-
[14]
Look-ahead energy efficient vm allocation approach for data centers,
˙I. Ça ˘glar and D. T. Altılar, “Look-ahead energy efficient vm allocation approach for data centers,” Journal of Cloud Computing , vol. 11, no. 1, p. 11, 2022
work page 2022
-
[15]
A survey and taxonomy on energy management schemes in cloud data centers,
S. Singh, I. Chana, J. Singh, and A. Agrawal, “A survey and taxonomy on energy management schemes in cloud data centers,” The Journal of Supercomputing, vol. 78, no. 3, pp. 3533–3595, 2022
work page 2022
-
[16]
Job dispatching policies for queueing systems with unknown service rates,
T. Choudhury, G. Joshi, W. Wang, and S. Shakkottai, “Job dispatching policies for queueing systems with unknown service rates,” in Pro- ceedings of the Twenty-Second International Symposium on Theory, Algorithmic F oundations, and Protocol Design for Mobile Networks and Mobile Computing , MobiHoc ’21, (New York, NY , USA), p. 181?190, Association for Comp...
work page 2021
-
[17]
D. Efrosinin, V . Vishnevsky, and N. Stepanova, “Optimal scheduling in general multi-queue system by combining simulation and neural network techniques,” Sensors, vol. 23, no. 12, 2023
work page 2023
-
[18]
Optimal scheduling in the multiserver-job model under heavy traffic,
I. Grosof, Z. Scully, M. Harchol-Balter, and A. Scheller-Wolf, “Optimal scheduling in the multiserver-job model under heavy traffic,”Proceedings of the ACM on Measurement and Analysis of Computing Systems , vol. 6, no. 3, pp. 1–32, 2022
work page 2022
-
[19]
J. Wilkes, “cluster-data/Clusterdata2019.” https://github.com/google/ cluster-data/tree/master, 2019
work page 2019
-
[20]
Data-driven workload generation based on google data center measurements,
M. Yildiz and A. Baiocchi, “Data-driven workload generation based on google data center measurements,” in 2024 IEEE 25th International Conference on High Performance Switching and Routing (HPSR) , pp. 143–148, 2024
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.