Flare: Leveraging Serverless Elasticity to Absorb Microservice Load Spikes

Antonios Katsarakis; Boris Grot; David Schall; Dilina Dehigama; Marios Kogias; Rakesh Kumar; Shyam Jesalpura

arxiv: 2605.23707 · v1 · pith:OWKP37DLnew · submitted 2026-05-22 · 💻 cs.DC

Dilina Dehigama, Shyam Jesalpura, David Schall, Antonios Katsarakis, Marios Kogias, Rakesh Kumar, Boris Grot

Pith reviewed 2026-05-25 02:52 UTC · model grok-4.3

classification 💻 cs.DC

keywords: microservices, serverless computing, load spikes, hybrid deployment, elastic scaling, cost optimization, VM provisioning

The pith

Flare combines VMs for steady microservice loads with serverless to handle only excess spike traffic from overloaded services.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a hybrid architecture that keeps virtual machines running for normal operation because they are cheaper at steady state. When traffic spikes, the system identifies exactly which microservices are overloaded and moves only the surplus requests for those services onto serverless functions. This selective hand-off avoids paying for extra VMs that would sit idle most of the time. The design requires only small changes to the existing control plane and leaves the application code untouched.

Core claim

Flare is a hybrid microservice architecture that utilizes VMs to cost-effectively handle steady workloads and leverages serverless elasticity to absorb traffic spikes by detecting which specific service or services are overloaded and shifting only the excess load of those services to serverless, thereby minimizing cost overhead while requiring minimal changes to the control plane and no modifications to the application.

What carries the argument

The selective load-shifting mechanism that detects overloaded microservices and redirects only their excess traffic to serverless instances.
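
A minimal sketch of the selective hand-off, assuming each service exposes an observed request rate and a known VM-side capacity. The names `ServiceLoad`, `vm_capacity_rps`, and `plan_offload` are hypothetical and do not come from the paper, whose actual detection logic is not described in this text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLoad:
    name: str
    observed_rps: float     # request rate currently arriving at the service
    vm_capacity_rps: float  # rate the VM-hosted replicas can sustain (assumed known)


def plan_offload(services):
    """Return {service name: fraction of its traffic to send to serverless}.

    Only services whose observed rate exceeds VM capacity appear in the
    result, and only the surplus above capacity is shifted.
    """
    plan = {}
    for svc in services:
        excess = svc.observed_rps - svc.vm_capacity_rps
        if excess > 0:
            plan[svc.name] = excess / svc.observed_rps
    return plan


if __name__ == "__main__":
    chain = [
        ServiceLoad("frontend", observed_rps=800, vm_capacity_rps=1000),
        ServiceLoad("search", observed_rps=1500, vm_capacity_rps=1000),
        ServiceLoad("profile", observed_rps=400, vm_capacity_rps=500),
    ]
    print(plan_offload(chain))
```

In this made-up chain only `search` is over capacity, so it is the only service in the plan, with about one third of its traffic marked for serverless. The other services stay entirely on VMs.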

If this is right

Providers avoid the expense of keeping extra VMs idle between spikes.
Only the overloaded services incur serverless charges rather than the entire chain.
Existing auto-scaling setups can adopt the approach with limited control-plane changes.
Application responsiveness stays high during spikes without code changes.
Cost savings scale with the duration and intensity of the spike rather than with peak capacity.
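
The cost argument in the list above can be made concrete with a toy model. This is a sketch under stated assumptions, not the paper's cost model: all prices and rates are invented, serverless is billed at a flat hypothetical price per million requests (real platforms usually also bill by duration and memory), and the function and parameter names are illustrative.

```python
import math


def vm_only_cost(peak_rps, rps_per_vm, vm_price_per_hour, hours):
    """Over-provisioned baseline: enough VMs for the peak, paid for all hours."""
    vms = math.ceil(peak_rps / rps_per_vm)
    return vms * vm_price_per_hour * hours


def hybrid_cost(base_rps, peak_rps, rps_per_vm, vm_price_per_hour, hours,
                spike_hours, price_per_million_requests):
    """VMs sized for the steady load; requests above VM capacity during
    spikes are billed per request on serverless."""
    vms = math.ceil(base_rps / rps_per_vm)
    vm_cost = vms * vm_price_per_hour * hours
    excess_rps = max(peak_rps - vms * rps_per_vm, 0)
    excess_requests = excess_rps * 3600 * spike_hours
    return vm_cost + excess_requests / 1_000_000 * price_per_million_requests


if __name__ == "__main__":
    baseline = vm_only_cost(3000, 500, 0.10, 24)
    for spike_hours in (0.5, 1.0):
        hybrid = hybrid_cost(1000, 3000, 500, 0.10, 24, spike_hours, 2.0)
        print(f"spike={spike_hours}h  vm_only={baseline:.2f}  hybrid={hybrid:.2f}")
```

In this model the serverless term grows linearly with spike duration and with the size of the excess, while the VM-only bill depends only on the peak. With these invented numbers the hybrid is cheaper for the half-hour spike and more expensive for the one-hour spike, which illustrates why spike shape matters for the claimed savings.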

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same selective hand-off idea could be applied to other bursty workloads beyond microservices.
If the detection logic proves reliable, it might reduce the need for conservative over-provisioning policies in general.
Real-world traces with varying spike shapes would test whether the cost advantage holds when spikes are short or frequent.
Integration points with different serverless runtimes could surface hidden compatibility costs not visible in the current design.

Load-bearing premise

That excess load from specific microservices can be handed off to serverless without breaking request chains or adding noticeable latency.

What would settle it

A controlled experiment that measures total cost and tail latency for the same spike pattern under Flare versus a VM-only deployment that over-provisions enough capacity in advance.
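
One way such a comparison could be summarized is sketched below, assuming per-request latencies and a total cost have already been collected for each deployment. The 400 ms default matches the SLO mentioned in the Figure 6 caption; the helper names `percentile` and `summarize` and the input layout are assumptions made for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list, with p between 1 and 100."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def summarize(runs, slo_ms=400):
    """runs maps a deployment name to {"latencies_ms": [...], "cost": float}."""
    report = {}
    for name, run in runs.items():
        p95 = percentile(run["latencies_ms"], 95)
        report[name] = {
            "p95_ms": p95,
            "meets_slo": p95 <= slo_ms,
            "cost": run["cost"],
        }
    return report
```

Feeding both deployments the same spike trace and comparing the resulting `p95_ms`, `meets_slo`, and `cost` fields side by side would give the head-to-head result the paragraph above asks for.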

Figures

Figures reproduced from arXiv: 2605.23707 by Antonios Katsarakis, Boris Grot, David Schall, Dilina Dehigama, Marios Kogias, Rakesh Kumar, Shyam Jesalpura.

**Figure 2.** Impact of a sudden spike in load on a VM-based microservice.

**Figure 3.** Load prediction for two days with unexpected load spikes.

**Figure 4.** Impact of a sudden load spike on tail latency (P95) on an entirely …

**Figure 5.** Figure 5 focuses on the case of a single scalable microservice and provides a high-level depiction of Flare in action. It assumes Flare is deployed on top of K8s, hence it reuses its existing monitoring infrastructure. Microservices run in pods, while an external load balancer, e.g. Envoy or AWS’s Application Load Balancer [46, 47], steers incoming traffic. The Flare Controller runs as a microservice within the clu…

**Figure 6.** Latency comparison. Red dashed line marks the 400ms SLO.

**Figure 7.** Cost comparison. Body text captured with the caption: … Hotel Reservation, respectively). Similarly, Trace B’s increases stand at 2.5%, 2.6%, and 6.1% (3.8% on average) for the same applications. This indicates that Flare can effectively absorb load spikes with very modest cost impact (less than 4.1% on average). C. Evaluation on AWS Lambda. We evaluate Flare’s effectiveness in handling load spikes described in Section VI using a popular producti…

**Figure 9.** Impact of a node failure on tail latency.

Original abstract

Online services strive to maintain application responsiveness even when the traffic is unpredictable and fluctuating. Today's online services are commonly deployed as chains of microservices, each microservice packaged as one or more containers inside virtual machines (VMs). While performant and affordable when the load is steady, VM-based deployments are known to be slow to scale when the load spikes, resulting in degraded performance for end-users of the service. To avoid such performance degradations, service providers can over-provision their deployments; however, such a strategy is costly and inefficient, leaving resources under-utilized for extended periods. To address the challenge of unpredictable load spikes, we propose Flare, a hybrid microservice architecture that combines VMs with serverless computing. Flare utilizes VMs to cost-effectively handle steady workloads and leverages serverless elasticity to absorb traffic spikes. When a spike occurs, Flare detects which specific service(s) are overloaded and shifts the excess load of only those services to serverless, thus minimizing the cost overhead. Flare seamlessly integrates into existing auto-scaling and serverless infrastructure, requiring minimal changes to the control plane and no modifications to the application.
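
Once a serverless share has been chosen for an overloaded service, the load balancer has to apply it per request. The sketch below shows one deterministic way to do that with a running credit counter. It is an illustration only: `WeightedSplitter` is a hypothetical name, and the abstract does not say how Flare's routing is implemented.

```python
class WeightedSplitter:
    """Sends roughly `serverless_fraction` of requests to serverless.

    A credit counter accumulates the fraction on every request; each time
    it reaches 1.0, one request is diverted and the credit is spent.
    """

    def __init__(self, serverless_fraction):
        if not 0.0 <= serverless_fraction <= 1.0:
            raise ValueError("fraction must be between 0 and 1")
        self.fraction = serverless_fraction
        self._credit = 0.0

    def route(self):
        """Return the backend for the next request: "vm" or "serverless"."""
        self._credit += self.fraction
        if self._credit >= 1.0:
            self._credit -= 1.0
            return "serverless"
        return "vm"


if __name__ == "__main__":
    splitter = WeightedSplitter(0.25)
    print([splitter.route() for _ in range(8)])
```

Compared with random sampling, the credit counter spreads diverted requests evenly through the stream, so short bursts do not all land on one backend. Fractions that are not exactly representable in floating point may drift by a request over long runs.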

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague.

Flare sketches selective serverless offloading for microservice spikes but the abstract supplies no mechanisms, data, or evaluation to support the integration claims.

The main takeaway is that Flare combines VMs for steady microservice loads with serverless to absorb spikes by detecting and offloading only the overloaded services. This targets the real issue of slow VM scaling under unpredictable traffic without the cost of constant over-provisioning or full serverless use. The selective aspect is presented as a way to limit overhead compared to broader approaches.

The paper frames the cost-responsiveness trade-off in containerized chains clearly enough. That part is straightforward and points to a practical pain point in cloud deployments. The selective offloading concept is a reasonable distinction from blanket scaling strategies.

The soft spots are the lack of any supporting detail. The abstract asserts seamless integration with existing auto-scaling, minimal control-plane changes, and zero application modifications, yet provides no description of detection heuristics, request routing, or how redirection works for arbitrary containerized services. This leaves the central claim about transparent excess-load shifting unsupported. The stress-test note correctly flags that the redirection mechanism for microservices is not explained, which makes the no-changes claim hard to assess. No experiments, latency numbers, cost comparisons, or citations to prior hybrid systems appear in the given text. The work reads as an early architecture proposal rather than a completed system with evidence.

Readers working on cloud elasticity and auto-scaling might find the high-level idea useful as a prompt for further design. Anyone expecting validated results or reproducible mechanisms will not get much from it. The paper deserves serious peer review because the underlying problem is relevant to production systems and the selective-offload direction is plausible, even though the current version needs concrete implementation details and measurements before it can be evaluated properly. I would send it to referees with a request for those additions.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes Flare, a hybrid microservice architecture combining VMs for steady workloads with serverless functions to absorb traffic spikes. It claims that Flare detects specific overloaded services and shifts only their excess load to serverless, while integrating seamlessly into existing auto-scaling and serverless infrastructure with minimal control-plane changes and no application modifications.

Significance. If the proposed mechanisms and integration claims hold, Flare could provide a practical, cost-efficient solution for handling unpredictable loads in containerized microservice deployments without the inefficiencies of over-provisioning. The hybrid approach addresses a real operational challenge in cloud systems. However, the absence of any implementation details, mechanisms, or evaluation data leaves the significance speculative.

major comments (2)

[Abstract] The central claim that Flare 'detects which specific service(s) are overloaded and shifts the excess load of only those services to serverless' while requiring 'no modifications to the application' is asserted without any description of a request routing layer, detection heuristic, or service interface assumptions that would enable transparent redirection for arbitrary containerized microservices.
[Abstract] No experimental results, implementation details, cost measurements, or performance data are provided to support the claims of minimized cost overhead or maintained responsiveness during spikes, leaving the core performance and integration assertions unsupported.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed comments. We agree that the current manuscript version presents the Flare architecture at a high level and that the abstract's claims require supporting descriptions and evidence. We will revise the manuscript to address these points by expanding the mechanisms and adding evaluation data.

Referee: [Abstract] The central claim that Flare 'detects which specific service(s) are overloaded and shifts the excess load of only those services to serverless' while requiring 'no modifications to the application' is asserted without any description of a request routing layer, detection heuristic, or service interface assumptions that would enable transparent redirection for arbitrary containerized microservices.

Authors: We agree that the abstract makes these assertions without accompanying detail in the current text. The revised manuscript will add a dedicated section describing the request routing layer (including how it intercepts and redirects traffic selectively), the overload detection heuristic (based on per-service metrics from existing auto-scaling infrastructure), and the interface assumptions that permit transparent redirection for standard containerized microservices without application changes. Revision: yes.

Referee: [Abstract] No experimental results, implementation details, cost measurements, or performance data are provided to support the claims of minimized cost overhead or maintained responsiveness during spikes, leaving the core performance and integration assertions unsupported.

Authors: We agree that the current manuscript contains no implementation details, cost measurements, or performance data. The revised version will include a prototype implementation description, integration with existing auto-scaling and serverless platforms, and evaluation results (including cost and latency measurements under synthetic and real-world spike workloads) to substantiate the claims. Revision: yes.
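
The simulated rebuttal mentions a detection heuristic built on per-service metrics from existing auto-scaling infrastructure. One plausible shape for such a heuristic, sketched here as an assumption rather than as Flare's method, reuses the replica formula documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current replicas × current metric / target metric), and treats the gap between desired and ready replicas as the share of load to divert. The function `offload_fraction` is hypothetical.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric):
    """Replica count from the documented Kubernetes HPA formula."""
    return math.ceil(current_replicas * current_metric / target_metric)


def offload_fraction(ready_replicas, current_metric, target_metric):
    """Share of traffic the ready replicas cannot absorb (hypothetical heuristic).

    Returns 0.0 when the ready replicas already cover the desired count.
    """
    if ready_replicas <= 0:
        raise ValueError("need at least one ready replica")
    desired = desired_replicas(ready_replicas, current_metric, target_metric)
    if desired <= ready_replicas:
        return 0.0
    return 1 - ready_replicas / desired


if __name__ == "__main__":
    # 4 ready pods at 90% CPU against a 60% target
    print(desired_replicas(4, 90, 60), offload_fraction(4, 90, 60))
```

The appeal of a signal like this is that it needs nothing beyond what the autoscaler already computes, which would be consistent with the "minimal control-plane changes" claim. Whether the paper uses it is not stated in this text.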

Circularity Check

0 steps flagged

No circularity: architecture proposal with no derivations or fitted claims

The paper is a high-level system architecture proposal describing a hybrid VM+serverless design for handling load spikes. It contains no equations, no quantitative models, no fitted parameters, and no derivation chain that could reduce a prediction or result to its inputs by construction. Central claims (e.g., detection of overloaded services and transparent redirection) are presented as design properties rather than outputs of any self-referential computation or self-citation load-bearing argument. No enumerated circularity patterns apply; the work is self-contained as an engineering proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only abstract available so ledger is minimal; proposal rests on domain assumptions about serverless elasticity and integration ease rather than new parameters or entities.

axioms (2)

domain assumption: Serverless functions can absorb excess microservice load with negligible overhead and high elasticity.
Central to the cost and responsiveness claims in the abstract.
domain assumption: Existing auto-scaling and serverless platforms allow seamless integration with minimal control-plane changes.
Stated directly as a requirement for practicality.

pith-pipeline@v0.9.0 · 5758 in / 1224 out tokens · 18358 ms · 2026-05-25T02:52:38.101474+00:00 · methodology

Reference graph

Works this paper leans on

102 extracted references · 102 canonical work pages

[1] S. Luo, H. Xu, C. Lu, K. Ye, G. Xu, L. Zhang, Y. Ding, J. He, and C. Xu, “Characterizing microservice dependency and performance: Alibaba trace analysis,” in Proceedings of the ACM Symposium on Cloud Computing, ser. SoCC ’21. New York, NY, USA: Association for Computing Machinery, 2021, p. 412–426. [Online]. Available: https://doi.org/10.1145/3472883.3487003
[2] S. Luo, H. Xu, K. Ye, G. Xu, L. Zhang, G. Yang, and C. Xu, “The power of prediction: Microservice auto scaling via workload learning,” in Proceedings of the 13th Symposium on Cloud Computing, ser. SoCC ’22. New York, NY, USA: Association for Computing Machinery, 2022, p. 355–369. [Online]. Available: https://doi.org/10.1145/3542929.3563477
[3] “Archive Team: The Twitter Stream Grab,” Jan. 2024, [Online; accessed 9. Jan. 2024]. [Online]. Available: https://archive.org/details/twitterstream
[4] H. Qiu, S. S. Banerjee, S. Jha, Z. T. Kalbarczyk, and R. K. Iyer, “FIRM: An intelligent fine-grained resource management framework for SLO-Oriented microservices,” in 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, Nov. 2020, pp. 805–825. [Online]. Available: https://www.usenix.org/conference/osdi20/pres...
[5] “Fascinating facts about facades at CBS Sports,” Dec. 2022, [Online; accessed 25. Jan. 2024]. [Online]. Available: https://www.gomomento.com/blog/fascinating-facts-about-facades-at-cbs-sports
[6] “How Netflix Ensures Highly-Reliable Online Stateful Systems,” Dec. 2022, [Online; accessed 25. Apr. 2024]. [Online]. Available: https://www.infoq.com/articles/netflix-highly-reliable-stateful-systems/
[7] “Unity - Asset Store,” Jan. 2024, [Online; accessed 30. Jan. 2024]. [Online]. Available: https://assetstore.unity.com/
[8] “Store Server Overloaded Resulting in Missed Flash Sales Purchases,” Jan. 2024, [Online; accessed 9. Jan. 2024]. [Online]. Available: https://forum.unity.com/threads/store-server-overloaded-resulting-in-missed-flash-sales-purchases.1265966
[9] Z. Wang, S. Zhu, J. Li, W. Jiang, K. K. Ramakrishnan, Y. Zheng, M. Yan, X. Zhang, and A. X. Liu, “Deepscaling: microservices autoscaling for stable cpu utilization in large scale cloud systems,” in Proceedings of the 13th Symposium on Cloud Computing, ser. SoCC ’22. New York, NY, USA: Association for Computing Machinery, 2022, p. 16–30. [Online]. Availab... (doi:10.1145/3542929.3563469)
[10] Z. Jin, Y. Zhu, J. Zhu, D. Yu, C. Li, R. Chen, I. E. Akkus, and Y. Xu, “Lessons learned from migrating complex stateful applications onto serverless platforms,” in Proceedings of the 12th ACM SIGOPS Asia-Pacific Workshop on Systems, ser. APSys ’21. New York, NY, USA: Association for Computing Machinery, 2021, p. 89–96. [Online]. Available: https://doi.... (doi:10.1145/3476886.3477510)
[11] A. Jain, A. F. Baarzi, G. Kesidis, B. Urgaonkar, N. Alfares, and M. Kandemir, “Splitserve: Efficiently splitting apache spark jobs across faas and iaas,” in Proceedings of the 21st International Middleware Conference, ser. Middleware ’20. New York, NY, USA: Association for Computing Machinery, 2020, p. 236–250. [Online]. Available: https://doi.org/10.1145...
[12] M. Perron, R. Castro Fernandez, D. DeWitt, M. Cafarella, and S. Madden, “Cackle: Analytical workload cost and performance stability with elastic pools,” Proc. ACM Manag. Data, vol. 1, no. 4, dec 2023. [Online]. Available: https://doi.org/10.1145/3626720
[13] C. Zhang, M. Yu, W. Wang, and F. Yan, “Mark: exploiting cloud services for cost-effective, slo-aware machine learning inference serving,” in Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference, ser. USENIX ATC ’19. USA: USENIX Association, 2019, p. 1049–1062
[14] J. Liu, Q. Wang, S. Zhang, L. Hu, and D. Da Silva, “Sora: A latency sensitive approach for microservice soft resource adaptation,” in Proceedings of the 24th International Middleware Conference, ser. Middleware ’23. New York, NY, USA: Association for Computing Machinery, 2023, p. 43–56. [Online]. Available: https://doi.org/10.1145/3590140.3592851
[15] R. Buyya, S. K. Garg, and R. N. Calheiros, “Sla-oriented resource provisioning for cloud computing: Challenges, architecture, and solutions,” in 2011 International Conference on Cloud and Service Computing, 2011, pp. 1–10
[16] T. Currie, “Airbnb’s 10 Takeaways from Moving to Microservices - thenewstack.io,” https://thenewstack.io/airbnbs-10-takeaways-moving-microservices/, [Accessed 08-01-2024]
[17] W. Team, “Microservices at netflix: Lessons for architectural design,” Jan 2023. [Online]. Available: https://www.nginx.com/blog/microservices-at-netflix-architectural-best-practices/
[18] “Q&A with Jim Brikman: Splitting Up a Codebase into Microservices and Artifacts.” [Online]. Available: https://www.linkedin.com/blog/engineering/archive/q-a-with-jim-brikman-splitting-up-a-codebase-into-microservices
[19] “The Opportunities Microservices Provide at Uber Engineering,” Apr. 2016, [Online; accessed 8. Jan. 2024]. [Online]. Available: https://www.uber.com/en-GB/blog/building-tincup-microservice-implementation
[20] J. Q. Hylbert and S. Cosenza, “Rebuilding twitter’s public api,” 12 August 2020, [Online; accessed 8. Jan. 2024]. [Online]. Available: https://blog.twitter.com/engineering/en_us/topics/infrastructure/2020/rebuild_twitter_public_api_2020
[21] Z. Jia and E. Witchel, “Nightcore: efficient and scalable serverless computing for latency-sensitive, interactive microservices,” in Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS ’21. New York, NY, USA: Association for Computing Machinery, 2021, p. 152–166. [Onlin... (doi:10.1145/3445814.3446701)
[22] Y. Gan, Y. Zhang, D. Cheng, A. Shetty, P. Rathi, N. Katarki, A. Bruno, J. Hu, B. Ritchken, B. Jackson, K. Hu, M. Pancholi, Y. He, B. Clancy, C. Colen, F. Wen, C. Leung, S. Wang, L. Zaruvinsky, M. Espinosa, R. Lin, Z. Liu, J. Padilla, and C. Delimitrou, “An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems,” ... (doi:10.1145/3297858.3304013, 2019)
[23] “Production-Grade Container Orchestration,” Jan. 2024, [Online; accessed 9. Jan. 2024]. [Online]. Available: https://kubernetes.io
[24]

”Swarm mode overview

“”Swarm mode overview”,” Dec. 2023, [Online; accessed

work page 2023
[25]

Jan. 2024]. [Online]. Available: https://docs.docker. com/engine/swarm

work page 2024
[26]

Nomad|HashiCorp Developer,

“Nomad|HashiCorp Developer,” Jan. 2024, [Online; accessed 11. Jan. 2024]. [Online]. Available: https: //developer.hashicorp.com/nomad

work page 2024
[27]

Horizontal Pod Autoscaler - Kubernetes,

“Horizontal Pod Autoscaler - Kubernetes,” Jan. 2024, [Online; accessed 14. Jan. 2024]. [Online]. Available: https://kubernetes.io/docs/tasks/ run-application/horizontal-pod-autoscale/

work page 2024
[28]

Cluster Autoscaler - Kubernetes,

“Cluster Autoscaler - Kubernetes,” Jan. 2024, [Online; accessed 14. Jan. 2024]. [Online]. Available: https://github.com/kubernetes/autoscaler/tree/ master/cluster-autoscaler

work page 2024
[29]

AWS EKS Horizontal Pod Autoscaler Sync Interval,

“AWS EKS Horizontal Pod Autoscaler Sync Interval,” Jan. 2024, [Online; accessed 9. Jan. 2024]. [Online]. Available: https://github.com/aws/containers-roadmap/ issues/1809

work page 2024
[30]

Google Kubernetes Engine Horizontal Pod Au- toscaler Sync Interval,

“Google Kubernetes Engine Horizontal Pod Au- toscaler Sync Interval,” Jan. 2024, [Online; accessed 10. Jan. 2024]. [Online]. Avail- able: https://cloud.google.com/kubernetes-engine/docs/ concepts/horizontalpodautoscaler

work page 2024
[31]

An empirical analysis of vm startup times in public iaas clouds: An extended report,

J. Hao, T. Jiang, W. Wang, and I. K. Kim, “An empirical analysis of vm startup times in public iaas clouds: An extended report,” 2021

work page 2021
[32]

”Traffic Shedding against Stampeding Herd Effect from the Mobile App

“”Traffic Shedding against Stampeding Herd Effect from the Mobile App”,” Dec. 2022, [Online; accessed 25. Apr. 2024]. [Online]. Available: https://www.infoq.com/ news/2023/10/monzo-app-traffic-shedding/

work page 2022
[33]

May 2024]

“Istio,” May 2024, [Online; accessed 22. May 2024]. [Online]. Available: https://istio.io

work page 2024
[34]

”BookInfo Application

“”BookInfo Application”,” May 2024, [Online; accessed

work page 2024
[35]

May. 2024]. [Online]. Available: https://istio.io/ latest/docs/examples/bookinfo/

work page 2024
[36]

Burscale: Using burstable instances for cost-effective autoscaling in the public cloud,

A. F. Baarzi, T. Zhu, and B. Urgaonkar, “Burscale: Using burstable instances for cost-effective autoscaling in the public cloud,” inProceedings of the ACM Symposium on Cloud Computing, ser. SoCC ’19. New York, NY , USA: Association for Computing Machinery, 2019, p. 126–138. [Online]. Available: https://doi.org/10.1145/3357223.3362706

work page doi:10.1145/3357223.3362706 2019
[37]

Long-term slos for reclaimed cloud computing resources,

M. Carvalho, W. Cirne, F. Brasileiro, and J. Wilkes, “Long-term slos for reclaimed cloud computing resources,” inProceedings of the ACM Symposium on Cloud Computing, ser. SOCC ’14. New York, NY , USA: Association for Computing Machinery, 2014, p. 1–13. [Online]. Available: https://doi.org/10.1145/2670979.2670999

work page doi:10.1145/2670979.2670999 2014
[38]

Query-based workload forecasting for self-driving database management systems,

L. Ma, D. Van Aken, A. Hefny, G. Mezerhane, A. Pavlo, and G. J. Gordon, “Query-based workload forecasting for self-driving database management systems,” in Proceedings of the 2018 International Conference on Management of Data, ser. SIGMOD ’18. New York, NY , USA: Association for Computing Machinery, 2018, p. 631–645. [Online]. Available: https://doi.org/...

work page arXiv 2018
[39]

OPTIMUS- CLOUD: Heterogeneous configuration optimization for distributed databases in the cloud,

A. Mahgoub, A. M. Medoff, R. Kumar, S. Mitra, A. Klimovic, S. Chaterji, and S. Bagchi, “OPTIMUS- CLOUD: Heterogeneous configuration optimization for distributed databases in the cloud,” in2020 USENIX Annual Technical Conference (USENIX ATC 20). USENIX Association, Jul. 2020, pp. 189–203. [Online]. Available: https://www.usenix.org/conference/ atc20/presen...

work page 2020
[40]

Autopilot: workload autoscaling at google,

K. Rzadca, P. Findeisen, J. Swiderski, P. Zych, P. Broniek, J. Kusmierek, P. Nowak, B. Strack, P. Witusowski, S. Hand, and J. Wilkes, “Autopilot: workload autoscaling at google,” inProceedings of the Fifteenth European Conference on Computer Systems, ser. EuroSys ’20. New York, NY , USA: Association for Computing Machinery, 2020. [Online]. Available: http...

work page doi:10.1145/3342195.3387524 2020
[41]

Cloudscale: elastic resource scaling for multi-tenant cloud systems,

Z. Shen, S. Subbiah, X. Gu, and J. Wilkes, “Cloudscale: elastic resource scaling for multi-tenant cloud systems,” inProceedings of the 2nd ACM Symposium on Cloud Computing, ser. SOCC ’11. New York, NY , USA: Association for Computing Machinery, 2011. [Online]. Available: https://doi.org/10.1145/2038916.2038921

work page doi:10.1145/2038916.2038921 2011
[42]

Sequence to sequence learning with neural networks,

I. Sutskever, O. Vinyals, and Q. V . Le, “Sequence to sequence learning with neural networks,”Advances in neural information processing systems, vol. 27, 2014

work page 2014
[43]

Krishnamurthi, A

R. Krishnamurthi, A. Kumar, S. S. Gill, and R. Buyya, Serverless Computing: Principles and Paradigms. Springer, 05 2023

work page 2023
[44]

Lambda function scaling - AWS Lambda,

“Lambda function scaling - AWS Lambda,” Jan. 2024, [Online; accessed 11. Jan. 2024]. [Online]. Available: https://docs.aws.amazon.com/lambda/latest/ dg/lambda-concurrency.html

work page 2024
[45]

Sebs: A serverless benchmark suite for function-as-a-service computing,

M. Copik, G. Kwasniewski, M. Besta, M. Podstawski, and T. Hoefler, “Sebs: A serverless benchmark suite for function-as-a-service computing,” 2021

work page 2021
[46]

Cloud programming simplified: A berkeley view on serverless computing,

E. Jonas, J. Schleier-Smith, V . Sreekanti, C.-C. Tsai, A. Khandelwal, Q. Pu, V . Shankar, J. Carreira, K. Krauth, N. Yadwadkar, J. E. Gonzalez, R. A. Popa, I. Stoica, and D. A. Patterson, “Cloud programming simplified: A berkeley view on serverless computing,” 2019

work page 2019
[47]

Optimizing Lambda Cost with Multi-Threading,

“Optimizing Lambda Cost with Multi-Threading,” Jan. 2024, [Online; accessed 27. Jan. 2024]. [Online]. Available: https://web.archive.org/web/ 20220629183438/https://www.sentiatechblog.com/ aws-re-invent-2020-day-3-optimizing-lambda-cost-with-multi-threading

work page 2024
[48]

Application Load Balancer|Elastic Load Balancing| Amazon Web Services,

“Application Load Balancer|Elastic Load Balancing| Amazon Web Services,” May 2024, [Online; accessed

work page 2024
[49]

[Online]

May 2024]. [Online]. Available: https://aws.amazon. com/elasticloadbalancing/application-load-balancer

work page 2024
[50]

Envoy proxy - home,

“Envoy proxy - home,” Feb. 2024, [Online; accessed 2. Feb. 2024]. [Online]. Available: https://www.envoyproxy. io

work page 2024
[51]

Envoy Proxy — Load Balancing,

“Envoy Proxy — Load Balancing,” Jan. 2024, [Online; accessed 27. Jan. 2024]. [Online]. Avail- able: https://www.envoyproxy.io/docs/envoy/latest/intro/ arch overview/upstream/load balancing/load balancers

work page 2024
[52]

NGINX Docs — NGINX Load Balancing,

“NGINX Docs — NGINX Load Balancing,” Jan. 2024, [Online; accessed 27. Jan. 2024]. [Online]. Available: https://nginx.org/en/docs/http/load balancing. html#nginx weighted load balancing

work page 2024
[53]

Home - Knative,

“Home - Knative,” Jan. 2024, [Online; accessed 24. Jan. 2024]. [Online]. Available: https://knative.dev/docs

work page 2024
[54]

The Istio service mesh,

“The Istio service mesh,” Jan. 2024, [Online; accessed

work page 2024
[55]

Jan. 2024]. [Online]. Available: https://istio.io/latest/ about/service-mesh

work page 2024
[56]

Case studies from istio,

I. Team, “Case studies from istio,” 2024, [Online; accessed 30. Jan. 2024]. [Online]. Available: https: //istio.io/latest/about/case-studies/

work page 2024
[57]

The Linkerd service mesh,

“The Linkerd service mesh,” May 2024, [Online; accessed 01. May. 2024]. [Online]. Available: https: //linkerd.io/

work page 2024
[58]

Dynamically balancing load with overload control for microservices,

R. Bhattacharya, Y . Gao, and T. Wood, “Dynamically balancing load with overload control for microservices,” ACM Trans. Auton. Adapt. Syst., vol. 19, no. 4, Nov. 2024. [Online]. Available: https://doi.org/10.1145/ 3676167

work page 2024
[59]

AWS App Mesh,

“AWS App Mesh,” May 2024, [Online; accessed 01. May. 2024]. [Online]. Available: https://aws.amazon. com/app-mesh/

work page 2024
[60]

cadvisor,

“cadvisor,” Jan. 2024, [Online; accessed 24. Jan. 2024]. [Online]. Available: https://github.com/google/cadvisor

work page 2024
[61]

Prometheus - Monitoring system & time series database,

Prometheus, “Prometheus - Monitoring system & time series database,” Jan. 2024, [Online; accessed 25. Jan. 2024]. [Online]. Available: https://prometheus.io

work page 2024
[62]

Creating event-driven architectures with Lambda,

“Creating event-driven architectures with Lambda,” https://docs.aws.amazon.com/lambda/latest/dg/ concepts-event-driven-architectures.html, Jan. 2025, accessed 10. Dec. 2025

[63]

AWS Lambda Web Adapter,

“AWS Lambda Web Adapter,” https://github.com/awslabs/aws-lambda-web-adapter, Jan. 2022, accessed

[64]

Serverless Adapter,

“Serverless Adapter,” https://github.com/H4ad/serverless-adapter, Jan. 2022, accessed 10. Sep. 2025

[65]

GRPC Gateway,

“GRPC Gateway,” https://github.com/grpc-ecosystem/grpc-gateway, Jan. 2025, accessed 10. Dec. 2025

[66]

Traceupscaler: Upscaling traces to evaluate systems at high load,

S. M. Sajal, T. Zhu, B. Urgaonkar, and S. Sen, “Traceupscaler: Upscaling traces to evaluate systems at high load,” in Proceedings of the Nineteenth European Conference on Computer Systems, ser. EuroSys ’24. New York, NY, USA: Association for Computing Machinery, 2024, p. 942–961. [Online]. Available: https://doi.org/10.1145/3627703.3629581

[67]

Locust.io,

“Locust.io,” Jan. 2024, [Online; accessed 27. Jan. 2024]. [Online]. Available: https://locust.io

[68]

Online Boutique Microservice Application,

“Online Boutique Microservice Application,” May 2024, [Online; accessed 20. May. 2024]. [Online]. Available: https://github.com/GoogleCloudPlatform/microservices-demo

[69]

Benchmarking, analysis, and optimization of serverless function snapshots,

D. Ustiugov, P. Petrov, M. Kogias, E. Bugnion, and B. Grot, “Benchmarking, analysis, and optimization of serverless function snapshots,” in Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS ’21. New York, NY, USA: Association for Computing Machinery, 2021, p. 559–572....

doi:10.1145/3445814.3446714
[70]

Amazon EKS Customers|Managed Kubernetes Service|Amazon Web Services,

“Amazon EKS Customers|Managed Kubernetes Service|Amazon Web Services,” Jan. 2024, [Online; accessed 25. Jan. 2024]. [Online]. Available: https://aws.amazon.com/eks

[71]

AWS VM Instances cost,

“AWS VM Instances cost,” Jan. 2024, [Online; accessed Jan. 2024]. [Online]. Available: https://aws.amazon.com/ec2/instance-types/t3/

[73]

Xanadu: Mitigating cascading cold starts in serverless function chain deployments,

N. Daw, U. Bellur, and P. Kulkarni, “Xanadu: Mitigating cascading cold starts in serverless function chain deployments,” in Proceedings of the 21st International Middleware Conference, ser. Middleware ’20. New York, NY, USA: Association for Computing Machinery, 2020, p. 356–370. [Online]. Available: https://doi.org/10.1145/3423211.3425690

[74]

AWS Lambda pricing calculator,

“AWS Lambda pricing calculator,” Jan. 2024, [Online; accessed 27. Jan. 2024]. [Online]. Available: https://calculator.aws/#/createCalculator/Lambda

[75]

Availability in globally distributed storage systems,

D. Ford, F. Labelle, F. I. Popovici, M. Stokely, V.-A. Truong, L. Barroso, C. Grimes, and S. Quinlan, “Availability in globally distributed storage systems,” in 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI 10). Vancouver, BC: USENIX Association, Oct. 2010. [Online]. Available: https://www.usenix.org/conference/osdi10/availabi...

[76]

Node Status Check,

“Node Status Check,” May 2024, [Online; accessed 21. May 2024]. [Online]. Available: https://kubernetes.io/docs/reference/node/node-status/

[77]

Autoscale: Dynamic, robust capacity management for multi-tier data centers,

A. Gandhi, M. Harchol-Balter, R. Raghunathan, and M. A. Kozuch, “Autoscale: Dynamic, robust capacity management for multi-tier data centers,” ACM Trans. Comput. Syst., vol. 30, no. 4, Nov. 2012. [Online]. Available: https://doi.org/10.1145/2382553.2382556

[78]

Met: workload aware elasticity for nosql,

F. Cruz, F. Maia, M. Matos, R. Oliveira, J. Paulo, J. Pereira, and R. Vilaça, “Met: workload aware elasticity for nosql,” in Proceedings of the 8th ACM European Conference on Computer Systems, ser. EuroSys ’13. New York, NY, USA: Association for Computing Machinery, 2013, p. 183–196. [Online]. Available: https://doi.org/10.1145/2465351.2465370

[79]

Why is it not solved yet? challenges for production-ready autoscaling,

M. Straesser, J. Grohmann, J. von Kistowski, S. Eismann, A. Bauer, and S. Kounev, “Why is it not solved yet? challenges for production-ready autoscaling,” in Proceedings of the 2022 ACM/SPEC on International Conference on Performance Engineering, ser. ICPE ’22. New York, NY, USA: Association for Computing Machinery, 2022, p. 105–115. [Online]. Available:...

doi:10.1145/3489525.3511680
[80]

Autoscaling - Amazon EKS,

“Autoscaling - Amazon EKS,” May 2024, [Online; accessed 20. May 2024]. [Online]. Available: https://docs.aws.amazon.com/eks/latest/userguide/autoscaling.html

