pith. sign in

arxiv: 2606.30533 · v1 · pith:432WI3RFnew · submitted 2026-06-29 · 💻 cs.DC

Spandana: Reconciling Strict SLOs with Low Cost under Fine-Grained Load Fluctuations

Pith reviewed 2026-06-30 03:19 UTC · model grok-4.3

classification 💻 cs.DC
keywords SLOFaaScloud autoscalingload fluctuationshybrid deploymentrequest steeringVM provisioningcost optimization
0
0 comments X

The pith

Spandana decouples SLO enforcement from cost optimization by placing a lightweight controller on each VM to steer individual requests between the VM and FaaS.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Spandana as a way to handle sub-second load changes in cloud services without over-provisioning resources to protect response-time targets. It separates the task of meeting strict SLOs, done by a per-VM controller that decides request by request whether the VM can handle it, from the separate task of choosing how many VMs to run for lowest total cost. Requests that would miss the SLO are sent to a standard FaaS platform while the rest stay on the VM, letting the VM pool operate at high utilization. Evaluation against three existing approaches shows the system keeps SLOs, reaches 76-86 percent CPU use, and lowers cost by 5-44 percent. The design works with unmodified FaaS offerings such as AWS Lambda.

Core claim

Spandana addresses the tradeoff by decoupling SLO enforcement from cost optimization. A lightweight controller colocated with each application VM enforces SLOs by steering each arriving request between the VM and FaaS. Requests that can meet the SLO stay on the VM; the remaining requests are forwarded to a stock FaaS layer such as AWS Lambda. For cost optimization, Spandana's resource allocator determines the most-efficient VM provisioning by accounting for VM cost, FaaS cost, and traffic volatility, allowing the VM pool to run at high utilization.

What carries the argument

The lightweight controller colocated with each VM that classifies arriving requests and steers them to the VM if the SLO can be met or to FaaS otherwise.

If this is right

  • VM pools can be sized for high utilization without risking SLO breaches during load spikes.
  • Existing FaaS platforms can be used unchanged as the overflow tier.
  • Cost savings of 5-44 percent are achieved while preserving strict SLO compliance.
  • Resource allocation can explicitly factor in both VM and FaaS pricing plus traffic volatility.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same steering idea could be applied to other hybrid execution environments beyond VM-FaaS pairs.
  • Operators might reduce reliance on complex reactive autoscalers if per-application controllers handle local decisions.
  • Workloads with bursty but short peaks become cheaper to run without custom capacity planning.

Load-bearing premise

The colocated controller can classify each request accurately and with negligible added latency so that steering decisions neither violate the SLO themselves nor systematically misroute traffic under real sub-second fluctuations.

What would settle it

Run the system on a production trace with measured sub-second spikes and record whether any request routed to the VM misses its SLO or whether total cost exceeds that of the best baseline.

Figures

Figures reproduced from arXiv: 2606.30533 by Boris Grot, Dilina Dehigama, Marios Kogias, Marton Nemeth, Shengda Zhu, Shyam Jesalpura, Zeyu Xu.

Figure 1
Figure 1. Figure 1: Production workload from Twitter showing fine [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The provisioning trade-off for bursty workloads. [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Cost minimization strategies. (a) Stable load allows [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: Spandana Resource Optimizer (SRO) in action [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Service topology of BookInfo application [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗
Figure 8
Figure 8. Figure 8: CPU utilization (higher is better). App Spandana Spandana-C HPA-S HPA-O AutoBurst Libra Ratings 0.05 0.01 0.02 0.80 0.32 0.60 Details 0.34 0.14 0.14 0.54 1.14 1.59 Compr 0.01 0.01 0.02 0.57 0.15 0.07 Img Proc 0.01 0.00 0.02 0.74 0.23 0.20 [PITH_FULL_IMAGE:figures/full_fig_p010_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Breakdown of FaaS request volume (a) and cost [PITH_FULL_IMAGE:figures/full_fig_p011_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: 1-hour Twitter trace with a sudden load surge. [PITH_FULL_IMAGE:figures/full_fig_p011_10.png] view at source ↗
read the original abstract

Cloud-based online services face significant sub-second load fluctuations while needing to meet strict Service Level Objectives (SLOs). Cluster operators often over-provision resources to protect SLOs, sacrificing utilization and cost efficiency. Existing reactive and proactive autoscalers, serverless (FaaS) deployments, and VM/FaaS hybrid systems fail to reconcile strict SLO compliance with low cost and high utilization under fine-grained load fluctuation. We introduce Spandana, an architecture that addresses this trade off by decoupling SLO enforcement from cost optimization. A lightweight controller colocated with each application VM enforces SLOs by steering each arriving request between the VM and FaaS. Requests that can meet the SLO stay on the VM; the remaining requests are forwarded to a stock FaaS layer such as AWS Lambda. For cost optimization, Spandana's resource allocator determines the most-efficient VM provisioning by accounting for VM cost, FaaS cost, and traffic volatility, allowing the VM pool to run at high utilization. Our evaluation shows that Spandana maintains strict SLO adherence, achieves 76-86% CPU utilization, and reduces cost by 5-44% over three SOTA baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Spandana, a hybrid VM/FaaS architecture that decouples SLO enforcement from cost optimization. A lightweight controller colocated with each application VM steers individual requests to the VM (if the SLO can be met) or forwards them to FaaS; a separate resource allocator then provisions the VM pool to maximize utilization while accounting for VM/FaaS costs and traffic volatility. Evaluation claims strict SLO adherence, 76-86% CPU utilization, and 5-44% cost reduction versus three SOTA baselines under fine-grained load fluctuations.

Significance. If the controller's per-request decisions can be shown to be both fast and accurate, the architecture would allow cloud operators to run VMs at high utilization without sacrificing strict SLOs, offering a practical middle ground between over-provisioned VMs and pure serverless deployments.

major comments (3)
  1. [§3.2] §3.2 (Controller): the per-request classification rule is presented only at a high level with no pseudocode, threshold equations, or latency analysis; without an explicit decision procedure it is impossible to verify that classification itself does not add latency or systematically misroute requests under sub-second arrival and service-time jitter—the central assumption supporting the SLO guarantee.
  2. [§4] §4 (Evaluation setup): workload generation, fluctuation timescales, and request-size distributions are described only qualitatively; the reported 76-86% utilization and cost numbers therefore cannot be reproduced or stress-tested against the exact volatility regime the paper targets.
  3. [§5] §5 (Baselines and results): the three SOTA baselines are compared without reporting their internal parameter settings or the precise cost model used for the 5-44% savings; this leaves open whether the gains depend on favorable baseline configurations rather than on Spandana's decoupling.
minor comments (2)
  1. Notation for the allocator's cost function is introduced without a consolidated table of symbols.
  2. Figure captions should explicitly state the fluctuation frequency (e.g., requests per second) used in each experiment.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to improve clarity and reproducibility.

read point-by-point responses
  1. Referee: [§3.2] §3.2 (Controller): the per-request classification rule is presented only at a high level with no pseudocode, threshold equations, or latency analysis; without an explicit decision procedure it is impossible to verify that classification itself does not add latency or systematically misroute requests under sub-second arrival and service-time jitter—the central assumption supporting the SLO guarantee.

    Authors: We agree that the controller description in §3.2 is high-level. The classification uses a threshold rule based on the SLO target, estimated service time, and current VM load, but we will add explicit pseudocode, the threshold equations, and a latency breakdown in the revision. Measurements will show classification overhead is under 100μs and does not introduce systematic misrouting under the evaluated jitter levels. revision: yes

  2. Referee: [§4] §4 (Evaluation setup): workload generation, fluctuation timescales, and request-size distributions are described only qualitatively; the reported 76-86% utilization and cost numbers therefore cannot be reproduced or stress-tested against the exact volatility regime the paper targets.

    Authors: We will revise §4 to include quantitative details on the workload generator, exact fluctuation timescales (e.g., burst intervals and amplitudes), and request-size distributions used. This will allow full reproduction of the 76-86% utilization and cost results under the targeted volatility. revision: yes

  3. Referee: [§5] §5 (Baselines and results): the three SOTA baselines are compared without reporting their internal parameter settings or the precise cost model used for the 5-44% savings; this leaves open whether the gains depend on favorable baseline configurations rather than on Spandana's decoupling.

    Authors: We will report the exact parameter settings for each baseline (e.g., scaling thresholds and prediction horizons) as configured per their original papers, and provide the full cost model (VM hourly rates and FaaS per-invocation/duration pricing). This will clarify that the reported savings arise from Spandana's decoupling rather than baseline tuning. revision: yes

Circularity Check

0 steps flagged

No circularity: architecture and evaluation lack derivation chain or fitted predictions

full rationale

The paper presents Spandana as an architecture using a colocated lightweight controller to steer requests between VM and FaaS for SLO enforcement, with a separate resource allocator for cost optimization. Evaluation claims (76-86% utilization, 5-44% cost reduction) are presented as empirical outcomes. No equations, fitted parameters, predictions derived from inputs, self-citations as load-bearing premises, or ansatzes appear in the abstract or description. The derivation chain is absent; claims rest on system design and reported measurements rather than any reduction of results to their own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are described in the provided text.

pith-pipeline@v0.9.1-grok · 5769 in / 1163 out tokens · 40040 ms · 2026-06-30T03:19:03.450765+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

80 extracted references · 35 canonical work pages · 1 internal anchor

  1. [1]

    [n. d.]. Envoy Proxy. https://www.envoyproxy.io/

  2. [2]

    [n. d.]. Fluent Bit. https://fluentbit.io/

  3. [3]

    AWS Lambda – Container Image Support

    2021. AWS Lambda – Container Image Support. https://aws.amazon.com/blogs/ aws/new-for-aws-lambda-container-image-support/. Accessed 10. Sep. 2025

  4. [4]

    AWS Lambda Web Adapter

    2022. AWS Lambda Web Adapter. https://github.com/awslabs/aws-lambda-web- adapter. Accessed 10. Sep. 2025

  5. [5]

    Serverless Adapter

    2022. Serverless Adapter. https://github.com/H4ad/serverless-adapter. Accessed

  6. [6]

    Archive Team: The Twitter Stream Grab

    2024. Archive Team: The Twitter Stream Grab. https://archive.org/details/ twitterstream. Accessed 9. Jan. 2024

  7. [7]

    BookInfo Application

    2024. "BookInfo Application". https://istio.io/latest/docs/examples/bookinfo/. Accessed 20. May. 2024

  8. [8]

    Horizontal Pod Autoscaler - Kubernetes

    2024. Horizontal Pod Autoscaler - Kubernetes. [Online; accessed 14. Jan. 2024]. https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/

  9. [9]

    2025 Kubernetes Cost Benchmark Report

    2025. 2025 Kubernetes Cost Benchmark Report. https://cast.ai/kubernetes-cost- benchmark/. Accessed 19. Aug. 2025

  10. [10]

    Discovering Services

    2025. Discovering Services. https://kubernetes.io/docs/concepts/services- networking/service/#discovering-services. Accessed: 18.Sept.2025

  11. [11]

    Horizontal Pod Autoscaling

    2025. Horizontal Pod Autoscaling. https://kubernetes.io/docs/tasks/run- application/horizontal-pod-autoscale/. Accessed: 18.Sept.2025

  12. [12]

    Services, Load Balancing, and Networking

    2025. Services, Load Balancing, and Networking. https://kubernetes.io/docs/ concepts/services-networking/

  13. [13]

    Amazon Web Services. 2025. Burstable Performance Instances and CPU Cred- its. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-credits- baseline-concepts.html. Accessed: 18.Sept.2025

  14. [14]

    Ataollah Fatahi Baarzi, Timothy Zhu, and Bhuvan Urgaonkar. 2019. BurScale: Using Burstable Instances for Cost-Effective Autoscaling in the Public Cloud. In Proceedings of the ACM Symposium on Cloud Computing(Santa Cruz, CA, USA) (SoCC ’19). Association for Computing Machinery, New York, NY, USA, 126–138. doi:10.1145/3357223.3362706

  15. [15]

    Haoqiong Bian, Tiannan Sha, and Anastasia Ailamaki. 2023. Using Cloud Func- tions as Accelerator for Elastic Data Analytics.Proc. ACM Manag. Data1, 2, Article 161 (jun 2023), 27 pages. doi:10.1145/3589306

  16. [16]

    Franklin, Michael I

    Peter Bodik, Armando Fox, Michael J. Franklin, Michael I. Jordan, and David A. Patterson. 2010. Characterizing, modeling, and generating workload spikes for stateful services. InProceedings of the 1st ACM Symposium on Cloud Computing (Indianapolis, Indiana, USA)(SoCC ’10). Association for Computing Machinery, New York, NY, USA, 241–252. doi:10.1145/180712...

  17. [17]

    Eric Boutin, Jaliya Ekanayake, Wei Lin, Bing Shi, Jingren Zhou, Zhengping Qian, Ming Wu, and Lidong Zhou. 2014. Apollo: scalable and coordinated scheduling for cloud-scale computing. InProceedings of the 11th USENIX Conference on Oper- ating Systems Design and Implementation(Broomfield, CO)(OSDI’14). USENIX Association, USA, 285–300

  18. [18]

    Jiadong Chen, Xiao He, Hengyu Ye, Fuxin Jiang, Tieying Zhang, Jianjun Chen, and Xiaofeng Gao. 2025. Online ensemble transformer for accurate cloud workload forecasting in predictive auto-scaling.arXiv preprint arXiv:2508.12773(Aug. 2025). arXiv:2508.12773 [cs.LG]

  19. [19]

    Jiagan Cheng, Yilong Zhao, Zijun Li, Quan Chen, Weihao Cui, and Minyi Guo

  20. [20]

    In2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS)

    Microless: Cost-efficient Hybrid Deployment of Microservices on IaaS VMs and Serverless. In2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS). IEEE, 2303–2310

  21. [21]

    2022.Service Meshes Are on the Rise — But Greater Understanding and Experience Are Required

    Cloud Native Computing Foundation. 2022.Service Meshes Are on the Rise — But Greater Understanding and Experience Are Required. Technical Report. Cloud Native Computing Foundation. Accessed: 2025-09-18. https://www.cncf.io/wp- content/uploads/2022/05/CNCF_Service_Mesh_MicroSurvey_Final.pdf

  22. [22]

    Marcin Copik, Grzegorz Kwasniewski, Maciej Besta, Michal Podstawski, and Torsten Hoefler. 2021. SeBS: a serverless benchmark suite for function-as-a- service computing. InProceedings of the 22nd International Middleware Conference (Québec city, Canada)(Middleware ’21). Association for Computing Machinery, New York, NY, USA, 64–78. doi:10.1145/3464298.3476133

  23. [23]

    Eli Cortez, Anand Bonde, Alexandre Muzio, Mark Russinovich, Marcus Fontoura, and Ricardo Bianchini. 2017. Resource Central: Understanding and Predicting Workloads for Improved Resource Management in Large Cloud Platforms. In Proceedings of the 26th Symposium on Operating Systems Principles(Shanghai, China)(SOSP ’17). Association for Computing Machinery, N...

  24. [24]

    Jaime Dantas, Hamzeh Khazaei, and Marin Litoiu. 2021. BIAS Autoscaler: Lever- aging Burstable Instances for Cost-Effective Autoscaling on Cloud Systems. In Proceedings of the Seventh International Workshop on Serverless Computing (WoSC7) 2021(Virtual Event, Canada)(WoSC ’21). Association for Computing Machinery, New York, NY, USA, 9–16. doi:10.1145/349365...

  25. [25]

    Datadog. 2025. Container Report. https://www.datadoghq.com/container-report/. Accessed: 18.Sept.2025

  26. [26]

    Dilina Dehigama, Shyam Jesalpura, Antonios Katsarakis, Marios Kogias, Rakesh Kumar, and Boris Grot. 2024. Composing microservices and serverless for load resilience. 1–8. The 2nd Workshop on SErverless Systems, Applications and MEthodologies, SESAME 2024 ; Conference date: 22-04-2024 Through 22-04-2024. https://sesame2024.github.io/

  27. [27]

    Ferguson, Peter Bodik, Srikanth Kandula, Eric Boutin, and Rodrigo Fonseca

    Andrew D. Ferguson, Peter Bodik, Srikanth Kandula, Eric Boutin, and Rodrigo Fonseca. 2012. Jockey: guaranteed job latency in data parallel clusters. InProceed- ings of the 7th ACM European Conference on Computer Systems(Bern, Switzerland) (EuroSys ’12). Association for Computing Machinery, New York, NY, USA, 99–112. doi:10.1145/2168836.2168847

  28. [28]

    Gallager

    Robert G. Gallager. 2013.Stochastic Processes: Theory for Applications. Cambridge University Press. https://books.google.co.uk/books?id=ERLrAQAAQBAJ

  29. [29]

    Anshul Gandhi, Mor Harchol-Balter, Ram Raghunathan, and Michael A. Kozuch

  30. [30]

    AutoScale: Dynamic, Robust Capacity Management for Multi-Tier Data Centers.ACM Trans. Comput. Syst.30, 4, Article 14 (Nov. 2012), 26 pages. doi:10. 1145/2382553.2382556

  31. [31]

    Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker, and Ion Stoica. 2011. Dominant resource fairness: fair allocation of multiple resource types. InProceedings of the 8th USENIX Conference on Networked Systems Design and Implementation(Boston, MA)(NSDI’11). USENIX Association, USA, 323–336

  32. [32]

    Ionel Gog, Malte Schwarzkopf, Adam Gleave, Robert N. M. Watson, and Steven Hand. 2016. Firmament: fast, centralized cluster scheduling at scale. InProceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (Savannah, GA, USA)(OSDI’16). USENIX Association, USA, 99–115

  33. [33]

    Google Cloud. 2025. Scalable Apps and Autoscaling in Google Kubernetes Engine. https://cloud.google.com/kubernetes-engine/docs/learn/scalable-apps- autoscale. Accessed: 18.Sept.2025

  34. [34]

    2025.Grafana Loki: Like Prometheus, but for logs

    Grafana Labs. 2025.Grafana Loki: Like Prometheus, but for logs. Grafana Labs

  35. [35]

    Grafana Labs. 2025. k6 Documentation. https://grafana.com/docs/k6/latest/

  36. [36]

    Jashwant Raj Gunasekaran, Prashanth Thinakaran, Mahmut Taylan Kandemir, Bhuvan Urgaonkar, George Kesidis, and Chita Das. 2019. Spock: Exploiting Serverless Functions for SLO and Cost Aware Resource Procurement in Public Cloud. In2019 IEEE 12th International Conference on Cloud Computing (CLOUD). 199–208. doi:10.1109/CLOUD.2019.00043

  37. [37]

    Rubaba Hasan, Timothy Zhu, and Bhuvan Urgaonkar. 2024. AutoBurst: Au- toscaling Burstable Instances for Cost-effective Latency SLOs. InProceedings of the 2024 ACM Symposium on Cloud Computing(Redmond, WA, USA)(SoCC ’24). Association for Computing Machinery, New York, NY, USA, 243–258. doi:10.1145/3698038.3698530

  38. [38]

    Islam, and Kishwar Ahmed

    Md Rajib Hossen, Mohammad A. Islam, and Kishwar Ahmed. 2022. Practical Efficient Microservice Autoscaling with QoS Assurance. InProceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing (Minneapolis, MN, USA)(HPDC ’22). Association for Computing Machinery, New York, NY, USA, 240–252. doi:10.1145/3502181.3531460

  39. [39]

    Michael Isard, Vijayan Prabhakaran, Jon Currey, Udi Wieder, Kunal Talwar, and Andrew Goldberg. 2009. Quincy: fair scheduling for distributed computing clusters. InProceedings of the ACM SIGOPS 22nd Symposium on Operating Sys- tems Principles(Big Sky, Montana, USA)(SOSP ’09). Association for Computing Machinery, New York, NY, USA, 261–276. doi:10.1145/1629...

  40. [40]

    Baarzi, George Kesidis, Bhuvan Urgaonkar, Nader Alfares, and Mahmut Kandemir

    Aman Jain, Ata F. Baarzi, George Kesidis, Bhuvan Urgaonkar, Nader Alfares, and Mahmut Kandemir. 2020. SplitServe: Efficiently Splitting Apache Spark Jobs Across FaaS and IaaS. InProceedings of the 21st International Middleware Conference(Delft, Netherlands)(Middleware ’20). Association for Computing Machinery, New York, NY, USA, 236–250. doi:10.1145/34232...

  41. [41]

    Sangeetha Abdu Jyothi, Carlo Curino, Ishai Menache, Shravan Matthur Narayana- murthy, Alexey Tumanov, Jonathan Yaniv, Ruslan Mavlyutov, Íñigo Goiri, Subru Krishnan, Janardhan Kulkarni, and Sriram Rao. 2016. Morpheus: towards auto- mated SLOs for enterprise clusters. InProceedings of the 12th USENIX Conference on Operating Systems Design and Implementation...

  42. [42]

    Ram Srivatsa Kannan, Lavanya Subramanian, Ashwin Raju, Jeongseob Ahn, Jason Mars, and Lingjia Tang. 2019. GrandSLAm: Guaranteeing SLAs for Jobs in Microservices Execution Frameworks. InProceedings of the Fourteenth EuroSys Conference 2019(Dresden, Germany)(EuroSys ’19). Association for Computing Machinery, New York, NY, USA, Article 34, 16 pages. doi:10.1...

  43. [43]

    Shutian Luo, Huanle Xu, Kejiang Ye, Guoyao Xu, Liping Zhang, Guodong Yang, and Chengzhong Xu. 2022. The Power of Prediction: Microservice Auto Scaling via Workload Learning. InProceedings of the 13th Symposium on Cloud Computing (San Francisco, California)(SoCC ’22). Association for Computing Machinery, New York, NY, USA, 355–369. doi:10.1145/3542929.3563477

  44. [44]

    Ming Mao and Marty Humphrey. 2012. A Performance Study on the VM Startup Time in the Cloud. In2012 IEEE Fifth International Conference on Cloud Computing. 423–430. doi:10.1109/CLOUD.2012.103

  45. [45]

    Ziming Mao, Tian Xia, Zhanghao Wu, Wei-Lin Chiang, Tyler Griggs, Romil Bhardwaj, Zongheng Yang, Scott Shenker, and Ion Stoica. 2025. SkyServe: Serving AI Models across Regions and Clouds with Spot Instances. InProceedings of the Twentieth European Conference on Computer Systems(Rotterdam, Netherlands) (EuroSys ’25). Association for Computing Machinery, Ne...

  46. [46]

    Maxday. 2025. Lambda Performance Benchmark. https://maxday.github.io/ lambda-perf/. Accessed: 18.Sept.2025

  47. [47]

    Chunyang Meng, Haogang Tong, Tianyang Wu, Maolin Pan, Yang Yu, and Yi Jiang. 2024. BASE: Burst-adaptive autoscaling via stacked ensembles for SLO assurance and cost efficiency.arXiv preprint arXiv:2402.12962(Feb. 2024). arXiv:2402.12962 [cs.SE]

  48. [48]

    Xupeng Miao, Chunan Shi, Jiangfei Duan, Xiaoli Xi, Dahua Lin, Bin Cui, and Zhihao Jia. 2024. SpotServe: Serving Generative Large Language Models on Preemptible Instances. InProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2(La Jolla, CA, USA)(ASPLOS ’24). Association for ...

  49. [49]

    Ingo Müller, Renato Marroquín, and Gustavo Alonso. 2020. Lambada: Interactive Data Analytics on Cold Data Using Serverless Cloud Infrastructure. InProceed- ings of the 2020 ACM SIGMOD International Conference on Management of Data (Portland, OR, USA)(SIGMOD ’20). Association for Computing Machinery, New York, NY, USA, 115–130. doi:10.1145/3318464.3389758

  50. [50]

    Novak, Sneha Kumar Kasera, and Ryan Stutsman

    Joe H. Novak, Sneha Kumar Kasera, and Ryan Stutsman. 2019. Cloud Functions for Fast and Robust Resource Auto-Scaling. In2019 11th International Conference on Communication Systems & Networks (COMSNETS). 133–140. doi:10.1109/ COMSNETS.2019.8711058

  51. [51]

    Shin, Xiaoyun Zhu, Mustafa Uysal, Zhikui Wang, Sharad Singhal, and Arif Merchant

    Pradeep Padala, Kai-Yuan Hou, Kang G. Shin, Xiaoyun Zhu, Mustafa Uysal, Zhikui Wang, Sharad Singhal, and Arif Merchant. 2009. Automated control of multiple virtualized resources. InProceedings of the 4th ACM European Conference on Com- puter Systems(Nuremberg, Germany)(EuroSys ’09). Association for Computing Machinery, New York, NY, USA, 13–26. doi:10.114...

  52. [52]

    Kozuch, and Gre- gory R

    Jun Woo Park, Alexey Tumanov, Angela Jiang, Michael A. Kozuch, and Gre- gory R. Ganger. 2018. 3Sigma: distribution-based cluster scheduling for runtime uncertainty. InProceedings of the Thirteenth EuroSys Conference(Porto, Portugal) (EuroSys ’18). Association for Computing Machinery, New York, NY, USA, Article 2, 17 pages. doi:10.1145/3190508.3190515

  53. [53]

    Matthew Perron, Raul Castro Fernandez, David DeWitt, Michael Cafarella, and Samuel Madden. 2023. Cackle: Analytical Workload Cost and Performance Stability With Elastic Pools.Proc. ACM Manag. Data1, 4, Article 233 (dec 2023), 25 pages. doi:10.1145/3626720

  54. [54]

    Satya Nagamani Pothu and Swathi Kailasam. 2025. Hybrid workload prediction for improved autoscaling in IaaS clouds: An ARIMA-OLSTM approach.Ing. Syst. D Inf.30, 04 (April 2025), 961–970

  55. [55]

    Banerjee, Saurabh Jha, Zbigniew T

    Haoran Qiu, Subho S. Banerjee, Saurabh Jha, Zbigniew T. Kalbarczyk, and Ravis- hankar K. Iyer. 2020. FIRM: An Intelligent Fine-grained Resource Management Framework for SLO-Oriented Microservices. In14th USENIX Symposium on Oper- ating Systems Design and Implementation (OSDI 20). USENIX Association, 805–825. https://www.usenix.org/conference/osdi20/presen...

  56. [56]

    Kalbarczyk, Tamer Başar, and Ravishankar K

    Haoran Qiu, Weichao Mao, Chen Wang, Hubertus Franke, Alaa Youssef, Zbig- niew T. Kalbarczyk, Tamer Başar, and Ravishankar K. Iyer. 2023. AWARE: Auto- mate Workload Autoscaling with Reinforcement Learning in Production Cloud Systems. In2023 USENIX Annual Technical Conference (USENIX ATC 23). USENIX Association, Boston, MA, 387–402. https://www.usenix.org/c...

  57. [57]

    Ali Raza, Zongshun Zhang, Nabeel Akhtar, Vatche Isahagian, and Ibrahim Matta

  58. [58]

    In2021 IEEE International Conference on Cloud Engineering (IC2E)

    LIBRA: An Economical Hybrid Approach for Cloud Applications with Strict SLAs. In2021 IEEE International Conference on Cloud Engineering (IC2E). 136–146. doi:10.1109/IC2E52221.2021.00028

  59. [59]

    Benjamin Reidys, Pantea Zardoshti, Íñigo Goiri, Celine Irvene, Daniel S. Berger, Haoran Ma, Kapil Arya, Eli Cortez, Taylor Stark, Eugene Bak, Mehmet Iyigun, Stanko Novakovic, Lisa Hsu, Karel Trueba, Abhisek Pan, Chetan Bansal, Saravan Rajmohan, Jian Huang, and Ricardo Bianchini. 2025. Coach: Exploiting Temporal Patterns for All-Resource Oversubscription i...

  60. [60]

    Nilabja Roy, Abhishek Dubey, and Aniruddha Gokhale. 2011. Efficient Autoscaling in the Cloud Using Predictive Models for Workload Forecasting. InProceedings of the IEEE International Conference on Cloud Computing. IEEE, 159–166

  61. [61]

    Krzysztof Rzadca, Pawel Findeisen, Jacek Swiderski, Przemyslaw Zych, Przemys- law Broniek, Jarek Kusmierek, Pawel Nowak, Beata Strack, Piotr Witusowski, Steven Hand, and John Wilkes. 2020. Autopilot: workload autoscaling at Google. InProceedings of the Fifteenth European Conference on Computer Systems(Her- aklion, Greece)(EuroSys ’20). Association for Com...

  62. [62]

    Ghazal Sadeghian, Mohamed Elsakhawy, Mohanna Shahrad, Joe Hattori, and Mohammad Shahrad. 2023. UnFaaSener: Latency and Cost Aware Offloading of Functions from Serverless Platforms. InProceedings of the 2023 USENIX Annual Technical Conference. USENIX Association, Boston, MA, USA. https://www. usenix.org/conference/atc23/presentation/sadeghian

  63. [63]

    Mohammad Shahrad, Rodrigo Fonseca, Inigo Goiri, Gohar Chaudhry, Paul Batum, Jason Cooke, Eduardo Laureano, Colby Tresness, Mark Russinovich, and Ricardo Bianchini. 2020. Serverless in the Wild: Characterizing and Optimizing the Serverless Workload at a Large Cloud Provider. In2020 USENIX Annual Technical Conference (USENIX ATC 20). USENIX Association, 205...

  64. [64]

    Danfeng Shan, Fengyuan Ren, Peng Cheng, and Ran Shu. 2016. Micro- burst in Data Centers: Observations, Implications, and Applications. arXiv:1604.07621 [cs.NI] https://arxiv.org/abs/1604.07621

  65. [65]

    Prateek Sharma, David Irwin, and Prashant Shenoy. 2017. Portfolio-driven Resource Management for Transient Cloud Servers.Proc. ACM Meas. Anal. Comput. Syst.1, 1, Article 5 (June 2017), 23 pages. doi:10.1145/3084442

  66. [66]

    Won Wook Song, Taegeon Um, Sameh Elnikety, Myeongjae Jeon, and Byung- Gon Chun. 2023. Sponge: Fast Reactive Scaling for Stream Processing with Serverless Frameworks. In2023 USENIX Annual Technical Conference (USENIX ATC 23). USENIX Association, Boston, MA, 301–314. https://www.usenix.org/ conference/atc23/presentation/song

  67. [67]

    Dmitrii Ustiugov, Theodor Amariucai, and Boris Grot. 2021. Analyzing Tail Latency in Serverless Clouds with STeLLAR. InProceedings of the 2021 IEEE International Symposium on Workload Characterization (IISWC). IEEE

  68. [68]

    Zibo Wang, Pinghe Li, Chieh-Jan Mike Liang, Feng Wu, and Francis Y. Yan

  69. [69]

    InProceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation(Santa Clara, CA, USA)(NSDI’24)

    Autothrottle: a practical bi-level approach to resource management for SLO-targeted microservices. InProceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation(Santa Clara, CA, USA)(NSDI’24). USENIX Association, USA, Article 9, 17 pages

  70. [70]

    WikiBench Project. 2025. WikiBench: A Distributed Wikipedia Access Bench- mark. http://www.wikibench.eu/?page_id=60. Accessed: 18.Sept.2025

  71. [71]

    Zhanghao Wu, Wei-Lin Chiang, Ziming Mao, Zongheng Yang, Eric Friedman, Scott Shenker, and Ion Stoica. 2024. Can’t be late: optimizing spot instance savings under deadlines. InProceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation(Santa Clara, CA, USA)(NSDI’24). USENIX Association, USA, Article 11, 19 pages

  72. [72]

    Rumble, and Aaron Archer

    Bartek Wydrowski, Robert Kleinberg, Stephen M. Rumble, and Aaron Archer

  73. [73]

    In21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24)

    Load is not what you should balance: Introducing Prequal. In21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24). USENIX Association, Santa Clara, CA, 1285–1299. https://www.usenix.org/conference/ nsdi24/presentation/wydrowski

  74. [74]

    Fangkai Yang, Lu Wang, Zhenyu Xu, Jue Zhang, Liqun Li, Bo Qiao, Camille Cou- turier, Chetan Bansal, Soumya Ram, Si Qin, Zhen Ma, Íñigo Goiri, Eli Cortez, Terry Yang, Victor Rühle, Saravan Rajmohan, Qingwei Lin, and Dongmei Zhang. 2023. Snape: Reliable and Low-Cost Computing with Mixture of Spot and On-Demand VMs. InProceedings of the 28th ACM Internationa...

  75. [75]

    Chengliang Zhang, Minchen Yu, Wei Wang, and Feng Yan. 2019. MArk: ex- ploiting cloud services for cost-effective, SLO-aware machine learning inference serving. InProceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference(Renton, WA, USA)(USENIX ATC ’19). USENIX Association, USA, 1049–1062

  76. [76]

    Edward Suh, and Christina Delimitrou

    Yanqi Zhang, Weizhe Hua, Zhuangzhuang Zhou, G. Edward Suh, and Christina Delimitrou. 2021. Sinan: ML-based and QoS-aware resource management for cloud microservices. InProceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems(Virtual, USA)(ASPLOS ’21). Association for Computing Machinery,...

  77. [77]

    Yanqi Zhang, Zhuangzhuang Zhou, Sameh Elnikety, and Christina Delimitrou

  78. [78]

    arXiv:2401.02920 [cs.DC] https://arxiv.org/abs/2401.02920

    Analytically-Driven Resource Management for Cloud-Native Microservices. arXiv:2401.02920 [cs.DC] https://arxiv.org/abs/2401.02920

  79. [79]

    Ziming Zhao, Mingyu Wu, Jiawei Tang, Binyu Zang, Zhaoguo Wang, and Haibo Chen. 2023. BeeHive: Sub-second Elasticity for Web Services with Semi-FaaS Ex- ecution. InProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2(Vancouver, BC, Canada)(ASPLOS 2023). Association for Compu...

  80. [80]

    Xiangfeng Zhu, Guozhen She, Bowen Xue, Yu Zhang, Yongsu Zhang, Xuan Kelvin Zou, XiongChun Duan, Peng He, Arvind Krishnamurthy, Matthew Lentz, Danyang Zhuo, and Ratul Mahajan. 2023. Dissecting Overheads of Service Mesh Sidecars. InProceedings of the 2023 ACM Symposium on Cloud Computing (Santa Cruz, CA, USA)(SoCC ’23). Association for Computing Machinery, ...