
arxiv: 2212.05155 · v2 · pith:LFA4ZSEHnew · submitted 2022-12-10 · 💻 cs.DC · cs.LG

Cost-aware Duration Prediction for Software Upgrades in Datacenters

Pith reviewed 2026-05-24 09:56 UTC · model grok-4.3

classification 💻 cs.DC cs.LG
keywords: software upgrade · duration prediction · datacenter maintenance · cost-aware modeling · scheduling optimization · straggler mitigation · service level objectives

The pith

Acela predicts software upgrade durations in datacenters while accounting for asymmetric misprediction costs to raise scheduling efficiency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Software upgrades in large datacenters require accurate predictions of how long each upgrade will take so that they can be scheduled into limited maintenance windows without disrupting services. The paper frames this as an optimization problem with constraints from service-level objectives and introduces Acela to predict durations while weighing the different costs of overestimating versus underestimating the time. Acela also chooses among models and corrects for cases where a few slow servers skew the estimates upward. When deployed on production systems, this approach allows significantly more upgrades to be scheduled and completed within the same windows. A sympathetic reader would care because more efficient upgrades mean servers stay updated with fewer disruptions and higher overall reliability.
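The asymmetry described above can be made concrete with a toy replay. The sketch below is not the paper's scheduler: it assumes a single maintenance window, jobs that run one after another, greedy admission by predicted duration, and cancellation of everything that would overrun the window. The function name `pack_window` and all numbers are hypothetical.

```python
from typing import List, Tuple


def pack_window(jobs: List[Tuple[float, float]], window: float) -> Tuple[int, int, float]:
    """Admit jobs by predicted duration, then replay their actual durations.

    jobs:   (predicted, actual) duration pairs in scheduling order.
    window: length of the maintenance window, same unit as the durations.
    Returns (scheduled, completed, utilization).
    Assumption: jobs run sequentially and an overrun cancels the rest.
    """
    admitted = []
    budget = window
    for predicted, actual in jobs:
        if predicted <= budget:  # the scheduler only sees the prediction
            admitted.append(actual)
            budget -= predicted

    elapsed = 0.0
    completed = 0
    for actual in admitted:
        if elapsed + actual > window:  # would overrun: cancel this and the rest
            break
        elapsed += actual
        completed += 1
    return len(admitted), completed, elapsed / window


# Every job really takes 30 time units; the window is 100 units long.
print(pack_window([(60.0, 30.0)] * 3, 100.0))  # heavy overprediction
print(pack_window([(20.0, 30.0)] * 5, 100.0))  # underprediction
print(pack_window([(33.0, 30.0)] * 5, 100.0))  # slight overprediction
```

Under these assumptions, heavy overprediction leaves most of the window idle, underprediction admits jobs that are later cancelled, and a slight overprediction fills the window without cancellations.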

Core claim

The central claim is that a duration prediction system which explicitly models asymmetric costs of prediction errors, selects appropriate models, and mitigates straggler-induced overestimations can raise upgrade window utilization by a factor of 1.25, increase the number of scheduled upgrades by 33 percent and completed upgrades by 41 percent, and cut cancellation rates by a factor of 2.4 when evaluated on Meta's production datacenter systems.

What carries the argument

Acela, the cost-aware duration prediction framework that strategically selects predictive models based on misprediction costs and adjusts for stragglers.
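One standard way to encode asymmetric misprediction costs is the quantile (pinball) loss used in quantile regression, which the paper's Figure 2 contrasts with standard regression. The sketch below shows that loss only; whether Acela uses exactly this form is an assumption, and `pinball_loss` is a name chosen here for illustration.

```python
import numpy as np


def pinball_loss(y_true, y_pred, q: float) -> float:
    """Mean quantile (pinball) loss at quantile level q in (0, 1).

    Underprediction (y_true > y_pred) is weighted by q,
    overprediction (y_true < y_pred) by 1 - q.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff = y_true - y_pred
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))


# At q = 0.9, missing low by 10 costs nine times more than missing high by 10.
print(pinball_loss([100.0], [90.0], q=0.9))   # underprediction
print(pinball_loss([100.0], [110.0], q=0.9))  # overprediction
```

Choosing q above 0.5 pushes a model trained on this loss toward overprediction, which matches the goal of keeping upgrades inside their windows at the price of some unused time.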

If this is right

  • More upgrades can be completed without expanding maintenance windows.
  • Upgrade cancellations due to time overruns decrease substantially.
  • Service level objectives are met more reliably during upgrade periods.
  • The overall throughput of the upgrade scheduler increases.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same cost-aware selection logic might improve duration predictions for other datacenter tasks such as data migrations or hardware servicing.
  • If the framework generalizes, operators could apply it to new upgrade types with minimal additional tuning.
  • Datacenters with more heterogeneous hardware might require extensions to handle a wider range of straggler behaviors.

Load-bearing premise

That the upgrade characteristics and workloads observed in Meta's datacenters are representative enough that models trained there will achieve similar gains in other environments.

What would settle it

Deploying Acela on upgrade logs from a second large-scale datacenter operator and measuring whether the reported improvements in utilization, throughput, and cancellation rates still appear.

Figures

Figures reproduced from arXiv: 2212.05155 by Aijia Gao, Essam Ewaisha, Henry Hoffmann, Igor Marnat, Michal Sedlak, Thibaud Ryden, Yi Ding.

Figure 1: An illustration of a maintenance cycle that includes …
Figure 2: An example of differences between quantile regression (QR) with different quantiles and standard regression (SR) …
Figure 3: CDFs of normalized duration. Median and p99 duration are marked by green circles and red stars.
Figure 4: The quantile losses with di…
Figure 5: The overall workflow of Acela. Acela works in an online fashion: it continually collects data as new maintenance jobs are executed and retrains models as new data come in. The input to Acela is the user SLO used for model hyperparameter tuning; in our experiments, the SLO is that at least 95% of jobs on the validation set are overpredicted, with the highest prediction accuracy. Acela includes three components: …
Figure 6: Prediction accuracy in MAPE for each method per firmware.
Figure 7: Overprediction rate (OPR) for each method per firmware.
Figure 8: Sensitivity analyses of MAPE at different quantiles for Acela. Panels: CPLD, FLASH, BIC, BIOS, NIC, OPENBMC. Axes as extracted: q (%) against OPR (%).
Figure 9: Sensitivity analyses of overprediction rate (OPR) at di…
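The captions above refer to MAPE, overprediction rate (OPR), and an SLO of at least 95% overpredicted validation jobs. The sketch below shows one way those pieces could fit together. It rests on several assumptions: OPR is taken as the share of jobs whose prediction is at least the actual duration, the predictor is a plain empirical quantile of training durations rather than a learned model, and the names `mape`, `overprediction_rate`, and `select_quantile` are hypothetical.

```python
import numpy as np


def mape(y_true, y_pred) -> float:
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred) / y_true) * 100.0)


def overprediction_rate(y_true, y_pred) -> float:
    """Percent of jobs with prediction >= actual (assumed OPR definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.count_nonzero(y_pred >= y_true) / len(y_true)


def select_quantile(train, valid, candidates, slo_opr: float = 95.0):
    """Pick the quantile level with the lowest validation MAPE among
    those whose validation OPR meets the SLO. Returns (q, mape, opr),
    or None if no candidate meets the SLO.
    Stand-in predictor: the empirical q-quantile of training durations.
    """
    best = None
    for q in candidates:
        pred = np.full(len(valid), np.quantile(train, q))
        opr = overprediction_rate(valid, pred)
        if opr < slo_opr:
            continue  # violates the SLO, skip regardless of accuracy
        err = mape(valid, pred)
        if best is None or err < best[1]:
            best = (q, err, opr)
    return best


durations = np.arange(1, 101)  # synthetic durations 1..100, not real data
print(select_quantile(durations, durations, [0.5, 0.9, 0.95, 0.99]))
```

On this synthetic input the lowest candidate that still satisfies the SLO is chosen, since a higher quantile adds slack and so raises MAPE without being required by the constraint.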
Original abstract

Software upgrades are critical to maintaining server reliability in datacenters. While job duration prediction and scheduling have been extensively studied, the unique challenges posed by software upgrades remain largely under-explored. This paper presents the first in-depth investigation into software upgrade scheduling at datacenter scale. We begin by characterizing various types of upgrades and then frame the scheduling task as a constrained optimization problem. To address this problem, we introduce Acela, a cost-aware duration prediction framework designed to improve upgrade scheduling efficiency and throughput while meeting service-level objectives (SLOs). Acela accounts for asymmetric misprediction costs, strategically selects the best predictive models, and mitigates straggler-induced overestimations. Evaluations on Meta's production datacenter systems demonstrate that Acela significantly increases efficiency of the existing upgrade scheduler by improving upgrade window utilization by 1.25X, increasing the number of scheduled and completed upgrades by 33% and 41%, and reducing cancellation rates by 2.4X. The code and data sets will be released after paper acceptance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript characterizes software upgrade types in datacenters and frames upgrade scheduling as a constrained optimization problem. It introduces Acela, a cost-aware duration prediction framework that strategically selects models while accounting for asymmetric misprediction costs and mitigating straggler-induced overestimations to meet SLOs. Production evaluations at Meta report that Acela improves existing scheduler efficiency by 1.25X in upgrade window utilization, increases scheduled and completed upgrades by 33% and 41%, and reduces cancellation rates by 2.4X. Code and datasets are promised for release.

Significance. If the empirical gains hold under broader conditions, the work addresses an under-explored scheduling domain with direct operational relevance for large-scale datacenters. The explicit handling of cost asymmetry and stragglers, combined with the commitment to release artifacts, would strengthen reproducibility and enable follow-on studies.

major comments (1)
  1. [Evaluation] Evaluation section: The reported gains (1.25X window utilization, +33%/41% scheduled/completed upgrades, 2.4X lower cancellations) rest exclusively on a single Meta production trace. No cross-site experiments, synthetic workloads, or sensitivity analysis varying upgrade-duration distributions, straggler frequency, or misprediction-cost ratios are presented, so it is unclear whether the constrained optimizer would deliver comparable throughput under different workload traits.

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for the feedback. The evaluation concern is valid and we address it directly below.

point-by-point responses
  1. Referee: Evaluation section: The reported gains (1.25X window utilization, +33%/41% scheduled/completed upgrades, 2.4X lower cancellations) rest exclusively on a single Meta production trace. No cross-site experiments, synthetic workloads, or sensitivity analysis varying upgrade-duration distributions, straggler frequency, or misprediction-cost ratios are presented, so it is unclear whether the constrained optimizer would deliver comparable throughput under different workload traits.

    Authors: We agree the evaluation uses a single production trace from Meta. This trace captures the actual upgrade types, straggler patterns, and cost asymmetries observed in a large-scale operational setting, which is the intended deployment environment. Cross-site experiments are not possible because comparable traces from other operators are unavailable due to proprietary constraints. We will, however, add sensitivity analysis in the revision: using the promised dataset release, we will perturb upgrade-duration distributions, straggler rates, and misprediction-cost ratios and re-run the constrained optimizer to quantify throughput variation. This addresses the robustness question while remaining within the scope of the released artifacts.

    Revision: partial

standing simulated objections not resolved
  • Cross-site experiments cannot be conducted because production traces from other datacenter operators are not accessible.

Circularity Check

0 steps flagged

No circularity; empirical evaluation on production traces

full rationale

The paper frames upgrade scheduling as a constrained optimization problem and introduces Acela as a cost-aware prediction framework, but all load-bearing claims are empirical measurements (1.25X utilization, +33/41% upgrades, 2.4X lower cancellations) obtained by running the system on Meta production traces. No equations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text; the reported gains are direct outcomes of deployment rather than reductions of a derivation to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract only; no free parameters, axioms, or invented entities are described in the provided text.


