arxiv: 2605.23707 · v1 · pith:OWKP37DLnew · submitted 2026-05-22 · 💻 cs.DC

Flare: Leveraging Serverless Elasticity to Absorb Microservice Load Spikes

Pith reviewed 2026-05-25 02:52 UTC · model grok-4.3

classification 💻 cs.DC
keywords microservices · serverless computing · load spikes · hybrid deployment · elastic scaling · cost optimization · VM provisioning

The pith

Flare combines VMs for steady microservice loads with serverless to handle only excess spike traffic from overloaded services.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a hybrid architecture that keeps virtual machines running for normal operation because they are cheaper at steady state. When traffic spikes, the system identifies exactly which microservices are overloaded and moves only the surplus requests for those services onto serverless functions. This selective hand-off avoids paying for extra VMs that would sit idle most of the time. The design requires only small changes to the existing control plane and leaves the application code untouched.

Core claim

Flare is a hybrid microservice architecture that utilizes VMs to cost-effectively handle steady workloads and leverages serverless elasticity to absorb traffic spikes by detecting which specific service or services are overloaded and shifting only the excess load of those services to serverless, thereby minimizing cost overhead while requiring minimal changes to the control plane and no modifications to the application.

What carries the argument

The selective load-shifting mechanism that detects overloaded microservices and redirects only their excess traffic to serverless instances.
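
The selection step can be sketched as follows. This is a minimal illustration, assuming each service reports an observed request rate and has a known sustainable VM capacity; the names `ServiceLoad` and `plan_offload` and all numbers are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class ServiceLoad:
    name: str
    observed_rps: float     # requests per second currently arriving
    vm_capacity_rps: float  # assumed rate the VM-hosted replicas can sustain


def plan_offload(services: list[ServiceLoad]) -> dict[str, float]:
    """Return, per overloaded service, the fraction of traffic to send to serverless.

    Services at or below capacity are omitted, so they stay on VMs only.
    """
    plan = {}
    for svc in services:
        excess = svc.observed_rps - svc.vm_capacity_rps
        if excess > 0:
            # Only the surplus above VM capacity is redirected.
            plan[svc.name] = excess / svc.observed_rps
    return plan


if __name__ == "__main__":
    # Hypothetical three-service chain; only "search" is past its capacity.
    chain = [
        ServiceLoad("frontend", observed_rps=800, vm_capacity_rps=1000),
        ServiceLoad("search", observed_rps=1500, vm_capacity_rps=1000),
        ServiceLoad("checkout", observed_rps=300, vm_capacity_rps=400),
    ]
    print(plan_offload(chain))
```

In this sketch the rest of the chain is untouched: a service appears in the plan only when its own load exceeds its own capacity.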

If this is right

  • Providers avoid the expense of keeping extra VMs idle between spikes.
  • Only the overloaded services incur serverless charges rather than the entire chain.
  • Existing auto-scaling setups can adopt the approach with limited control-plane changes.
  • Application responsiveness stays high during spikes without code changes.
  • Cost savings scale with the duration and intensity of the spike rather than with peak capacity.
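
A back-of-the-envelope cost model makes the last point concrete. It is a sketch under stated assumptions: the function names, the flat per-million-request serverless price, and every number below are hypothetical, and real pricing also depends on execution time and memory.

```python
def overprovisioned_cost(peak_vms: int, vm_price_per_hour: float, hours: float) -> float:
    """Cost of keeping enough VMs for the peak running the whole time."""
    return peak_vms * vm_price_per_hour * hours


def hybrid_cost(
    baseline_vms: int,
    vm_price_per_hour: float,
    hours: float,
    spike_hours: float,
    excess_rps: float,
    serverless_price_per_million: float,
) -> float:
    """Cost of baseline VMs plus serverless invocations for excess requests during spikes."""
    vm_part = baseline_vms * vm_price_per_hour * hours
    excess_requests = excess_rps * spike_hours * 3600
    serverless_part = excess_requests / 1_000_000 * serverless_price_per_million
    return vm_part + serverless_part


if __name__ == "__main__":
    # Hypothetical week: 10 VMs cover steady load, 20 would cover the peak,
    # and spikes total 2 hours with 500 excess requests per second.
    week = 168
    print("over-provisioned:", overprovisioned_cost(20, 0.10, week))
    print("hybrid:", hybrid_cost(10, 0.10, week, 2, 500, 5.0))
```

The serverless term grows with `spike_hours` and `excess_rps`, while the over-provisioned cost depends only on `peak_vms`, which is the asymmetry the bullet describes.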

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same selective hand-off idea could be applied to other bursty workloads beyond microservices.
  • If the detection logic proves reliable, it might reduce the need for conservative over-provisioning policies in general.
  • Real-world traces with varying spike shapes would test whether the cost advantage holds when spikes are short or frequent.
  • Integration points with different serverless runtimes could surface hidden compatibility costs not visible in the current design.

Load-bearing premise

That excess load from specific microservices can be handed off to serverless without breaking request chains or adding noticeable latency.

What would settle it

A controlled experiment that measures total cost and tail latency for the same spike pattern under Flare versus a VM-only deployment that over-provisions enough capacity in advance.
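
Such a comparison could be scored with a small helper like the one below. It is a sketch only: `percentile` uses the nearest-rank definition, `summarize` and the sample measurements are hypothetical, and the 400 ms threshold is the SLO shown in the Figure 6 caption.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(runs: dict[str, dict], slo_ms: float) -> dict[str, dict]:
    """For each deployment, report P95 latency, whether it meets the SLO, and total cost."""
    report = {}
    for name, run in runs.items():
        p95 = percentile(run["latencies_ms"], 95)
        report[name] = {"p95_ms": p95, "meets_slo": p95 <= slo_ms, "cost": run["cost"]}
    return report


if __name__ == "__main__":
    # Hypothetical measurements for one spike pattern replayed against both deployments.
    runs = {
        "flare": {"latencies_ms": [120.0] * 95 + [380.0] * 5, "cost": 186.0},
        "vm_overprovisioned": {"latencies_ms": [110.0] * 95 + [300.0] * 5, "cost": 336.0},
    }
    for name, row in summarize(runs, slo_ms=400.0).items():
        print(name, row)
```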

Figures

Figures reproduced from arXiv: 2605.23707 by Antonios Katsarakis, Boris Grot, David Schall, Dilina Dehigama, Marios Kogias, Rakesh Kumar, Shyam Jesalpura.

Figure 1: The load trace of Twitter over a week’s time-span. The highlighted…
Figure 2: Impact of a sudden spike in load on a VM-based microservice
Figure 3: Load prediction for two days with unexpected load spikes
Figure 4: Impact of a sudden load spike on tail latency (P95) on an entirely…
Figure 5 focuses on the case of a single scalable microservice and provides a high-level depiction of Flare in action. It assumes Flare is deployed on top of K8s, hence it reuses its existing monitoring infrastructure. Microservices run in pods, while an external load balancer, e.g. Envoy or AWS’s Application Load Balancer [46, 47], steers incoming traffic. The Flare Controller runs as a microservice within the clu…
Figure 6: Latency comparison. Red dashed line marks the 400ms SLO.
Figure 7: Cost comparison
    … Hotel Reservation, respectively). Similarly, Trace B’s increases stand at 2.5%, 2.6%, and 6.1% (3.8% on average) for the same applications. This indicates that Flare can effectively absorb load spikes with very modest cost impact (less than 4.1% on average).
    C. Evaluation on AWS Lambda
    We evaluate Flare’s effectiveness in handling load spikes described in Section VI using a popular producti…
Figure 9: Impact of a node failure on tail latency
Original abstract

Online services strive to maintain application responsiveness even when the traffic is unpredictable and fluctuating. Today's online services are commonly deployed as chains of microservices, each microservice packaged as one or more containers inside virtual machines (VMs). While performant and affordable when the load is steady, VM-based deployments are known to be slow to scale when the load spikes, resulting in degraded performance for end-users of the service. To avoid such performance degradations, service providers can over-provision their deployments; however, such a strategy is costly and inefficient, leaving resources under-utilized for extended periods. To address the challenge of unpredictable load spikes, we propose Flare, a hybrid microservice architecture that combines VMs with serverless computing. Flare utilizes VMs to cost-effectively handle steady workloads and leverages serverless elasticity to absorb traffic spikes. When a spike occurs, Flare detects which specific service(s) are overloaded and shifts the excess load of only those services to serverless, thus minimizing the cost overhead. Flare seamlessly integrates into existing auto-scaling and serverless infrastructure, requiring minimal changes to the control plane and no modifications to the application.
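
The scale-up lag described in the abstract can be illustrated with a toy simulation. The autoscaler model here is an assumption made for illustration (VM capacity simply follows the load seen `scale_delay` ticks earlier); it is not the paper's model, and `simulate` and the trace are hypothetical.

```python
def simulate(load, base_capacity, scale_delay, use_serverless):
    """Step through a per-tick load trace.

    VM capacity follows the load, but only `scale_delay` ticks later.
    Returns (overloaded, offloaded): requests beyond VM capacity that
    stayed on VMs, and requests handed to serverless.
    """
    overloaded = 0
    offloaded = 0
    for t, demand in enumerate(load):
        # Capacity available now was sized for the load seen scale_delay ticks ago.
        seen = load[t - scale_delay] if t >= scale_delay else load[0]
        capacity = max(base_capacity, seen)
        excess = max(0, demand - capacity)
        if use_serverless:
            offloaded += excess
        else:
            overloaded += excess
    return overloaded, offloaded


if __name__ == "__main__":
    # Hypothetical trace: steady 100 requests per tick with a four-tick spike to 300.
    trace = [100, 100, 300, 300, 300, 300, 100, 100]
    print("VM only:", simulate(trace, base_capacity=100, scale_delay=2, use_serverless=False))
    print("hybrid: ", simulate(trace, base_capacity=100, scale_delay=2, use_serverless=True))
```

In the VM-only run the excess during the lag window lands on saturated VMs; in the hybrid run the same excess is counted as offloaded instead.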

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes Flare, a hybrid microservice architecture combining VMs for steady workloads with serverless functions to absorb traffic spikes. It claims that Flare detects specific overloaded services and shifts only their excess load to serverless, while integrating seamlessly into existing auto-scaling and serverless infrastructure with minimal control-plane changes and no application modifications.

Significance. If the proposed mechanisms and integration claims hold, Flare could provide a practical, cost-efficient solution for handling unpredictable loads in containerized microservice deployments without the inefficiencies of over-provisioning. The hybrid approach addresses a real operational challenge in cloud systems. However, the absence of any implementation details, mechanisms, or evaluation data leaves the significance speculative.

major comments (2)
  1. [Abstract] The central claim that Flare 'detects which specific service(s) are overloaded and shifts the excess load of only those services to serverless' while requiring 'no modifications to the application' is asserted without any description of a request routing layer, detection heuristic, or service interface assumptions that would enable transparent redirection for arbitrary containerized microservices.
  2. [Abstract] No experimental results, implementation details, cost measurements, or performance data are provided to support the claims of minimized cost overhead or maintained responsiveness during spikes, leaving the core performance and integration assertions unsupported.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed comments. We agree that the current manuscript version presents the Flare architecture at a high level and that the abstract's claims require supporting descriptions and evidence. We will revise the manuscript to address these points by expanding the mechanisms and adding evaluation data.

Point-by-point responses
  1. Referee: [Abstract] The central claim that Flare 'detects which specific service(s) are overloaded and shifts the excess load of only those services to serverless' while requiring 'no modifications to the application' is asserted without any description of a request routing layer, detection heuristic, or service interface assumptions that would enable transparent redirection for arbitrary containerized microservices.

    Authors: We agree that the abstract makes these assertions without accompanying detail in the current text. The revised manuscript will add a dedicated section describing the request routing layer (including how it intercepts and redirects traffic selectively), the overload detection heuristic (based on per-service metrics from existing auto-scaling infrastructure), and the interface assumptions that permit transparent redirection for standard containerized microservices without application changes. revision: yes

  2. Referee: [Abstract] No experimental results, implementation details, cost measurements, or performance data are provided to support the claims of minimized cost overhead or maintained responsiveness during spikes, leaving the core performance and integration assertions unsupported.

    Authors: We agree that the current manuscript contains no implementation details, cost measurements, or performance data. The revised version will include a prototype implementation description, integration with existing auto-scaling and serverless platforms, and evaluation results (including cost and latency measurements under synthetic and real-world spike workloads) to substantiate the claims. revision: yes

Circularity Check

0 steps flagged

No circularity: architecture proposal with no derivations or fitted claims

Full rationale

The paper is a high-level system architecture proposal describing a hybrid VM+serverless design for handling load spikes. It contains no equations, no quantitative models, no fitted parameters, and no derivation chain that could reduce a prediction or result to its inputs by construction. Central claims (e.g., detection of overloaded services and transparent redirection) are presented as design properties rather than outputs of any self-referential computation or self-citation load-bearing argument. No enumerated circularity patterns apply; the work is self-contained as an engineering proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only the abstract is available, so the ledger is minimal; the proposal rests on domain assumptions about serverless elasticity and ease of integration rather than on new parameters or entities.

axioms (2)
  • domain assumption: Serverless functions can absorb excess microservice load with negligible overhead and high elasticity.
    Central to the cost and responsiveness claims in the abstract.
  • domain assumption: Existing auto-scaling and serverless platforms allow seamless integration with minimal control-plane changes.
    Stated directly as a requirement for practicality.

Reference graph

Works this paper leans on

102 extracted references · 102 canonical work pages

  1. [1]

    Characterizing microservice dependency and performance: Alibaba trace analysis,

    S. Luo, H. Xu, C. Lu, K. Ye, G. Xu, L. Zhang, Y. Ding, J. He, and C. Xu, “Characterizing microservice dependency and performance: Alibaba trace analysis,” in Proceedings of the ACM Symposium on Cloud Computing, ser. SoCC ’21. New York, NY, USA: Association for Computing Machinery, 2021, p. 412–426. [Online]. Available: https://doi.org/10.1145/3472883.3487003

  2. [2]

    The power of prediction: Microservice auto scaling via workload learning,

    S. Luo, H. Xu, K. Ye, G. Xu, L. Zhang, G. Yang, and C. Xu, “The power of prediction: Microservice auto scaling via workload learning,” in Proceedings of the 13th Symposium on Cloud Computing, ser. SoCC ’22. New York, NY, USA: Association for Computing Machinery, 2022, p. 355–369. [Online]. Available: https://doi.org/10.1145/3542929.3563477

  3. [3]

    Archive Team: The Twitter Stream Grab,

    “Archive Team: The Twitter Stream Grab,” Jan. 2024, [Online; accessed 9. Jan. 2024]. [Online]. Available: https://archive.org/details/twitterstream

  4. [4]

    FIRM: An intelligent fine-grained resource management framework for SLO-Oriented microservices,

    H. Qiu, S. S. Banerjee, S. Jha, Z. T. Kalbarczyk, and R. K. Iyer, “FIRM: An intelligent fine-grained resource management framework for SLO-Oriented microservices,” in 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, Nov. 2020, pp. 805–825. [Online]. Available: https://www.usenix.org/conference/osdi20/pres...

  5. [5]

    Fascinating facts about facades at CBS Sports,

    “Fascinating facts about facades at CBS Sports,” Dec. 2022, [Online; accessed 25. Jan. 2024]. [Online]. Available: https://www.gomomento.com/blog/fascinating-facts-about-facades-at-cbs-sports

  6. [6]

    How Netflix Ensures Highly-Reliable Online Stateful Systems,

    “How Netflix Ensures Highly-Reliable Online Stateful Systems,” Dec. 2022, [Online; accessed 25. Apr. 2024]. [Online]. Available: https://www.infoq.com/articles/netflix-highly-reliable-stateful-systems/

  7. [7]

    Unity — Asset Store,

    “Unity — Asset Store,” Jan. 2024, [Online; accessed 30. Jan. 2024]. [Online]. Available: https://assetstore.unity.com/

  8. [8]

    Store Server Overloaded Resulting in Missed Flash Sales Purchases,

    “Store Server Overloaded Resulting in Missed Flash Sales Purchases,” Jan. 2024, [Online; accessed 9. Jan. 2024]. [Online]. Available: https://forum.unity.com/threads/store-server-overloaded-resulting-in-missed-flash-sales-purchases.1265966

  9. [9]

    Deepscaling: microservices autoscaling for stable cpu utilization in large scale cloud systems,

    Z. Wang, S. Zhu, J. Li, W. Jiang, K. K. Ramakrishnan, Y. Zheng, M. Yan, X. Zhang, and A. X. Liu, “Deepscaling: microservices autoscaling for stable cpu utilization in large scale cloud systems,” in Proceedings of the 13th Symposium on Cloud Computing, ser. SoCC ’22. New York, NY, USA: Association for Computing Machinery, 2022, p. 16–30. [Online]. Availab...

  10. [10]

    Lessons learned from migrating complex stateful applications onto serverless platforms,

    Z. Jin, Y. Zhu, J. Zhu, D. Yu, C. Li, R. Chen, I. E. Akkus, and Y. Xu, “Lessons learned from migrating complex stateful applications onto serverless platforms,” in Proceedings of the 12th ACM SIGOPS Asia-Pacific Workshop on Systems, ser. APSys ’21. New York, NY, USA: Association for Computing Machinery, 2021, p. 89–96. [Online]. Available: https://doi....

  11. [11]

    Splitserve: Efficiently splitting apache spark jobs across faas and iaas,

    A. Jain, A. F. Baarzi, G. Kesidis, B. Urgaonkar, N. Alfares, and M. Kandemir, “Splitserve: Efficiently splitting apache spark jobs across faas and iaas,” in Proceedings of the 21st International Middleware Conference, ser. Middleware ’20. New York, NY, USA: Association for Computing Machinery, 2020, p. 236–250. [Online]. Available: https://doi.org/10.1145...

  12. [12]

    Cackle: Analytical workload cost and performance stability with elastic pools,

    M. Perron, R. Castro Fernandez, D. DeWitt, M. Cafarella, and S. Madden, “Cackle: Analytical workload cost and performance stability with elastic pools,” Proc. ACM Manag. Data, vol. 1, no. 4, Dec. 2023. [Online]. Available: https://doi.org/10.1145/3626720

  13. [13]

    Mark: exploiting cloud services for cost-effective, slo-aware machine learning inference serving,

    C. Zhang, M. Yu, W. Wang, and F. Yan, “Mark: exploiting cloud services for cost-effective, slo-aware machine learning inference serving,” in Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference, ser. USENIX ATC ’19. USA: USENIX Association, 2019, p. 1049–1062

  14. [14]

    Sora: A latency sensitive approach for microservice soft resource adaptation,

    J. Liu, Q. Wang, S. Zhang, L. Hu, and D. Da Silva, “Sora: A latency sensitive approach for microservice soft resource adaptation,” in Proceedings of the 24th International Middleware Conference, ser. Middleware ’23. New York, NY, USA: Association for Computing Machinery, 2023, p. 43–56. [Online]. Available: https://doi.org/10.1145/3590140.3592851

  15. [15]

    Sla-oriented resource provisioning for cloud computing: Challenges, architecture, and solutions,

    R. Buyya, S. K. Garg, and R. N. Calheiros, “Sla-oriented resource provisioning for cloud computing: Challenges, architecture, and solutions,” in 2011 International Conference on Cloud and Service Computing, 2011, pp. 1–10

  16. [16]

    Airbnb’s 10 Takeaways from Moving to Microservices — thenewstack.io,

    T. Currie, “Airbnb’s 10 Takeaways from Moving to Microservices — thenewstack.io,” https://thenewstack.io/airbnbs-10-takeaways-moving-microservices/, [Accessed 08-01-2024]

  17. [17]

    Microservices at netflix: Lessons for architectural design,

    W. Team, “Microservices at netflix: Lessons for architectural design,” Jan 2023. [Online]. Available: https://www.nginx.com/blog/microservices-at-netflix-architectural-best-practices/

  18. [18]

    Q&A with Jim Brikman: Splitting Up a Codebase into Microservices and Artifacts

    “Q&A with Jim Brikman: Splitting Up a Codebase into Microservices and Artifacts.” [Online]. Available: https://www.linkedin.com/blog/engineering/archive/q-a-with-jim-brikman-splitting-up-a-codebase-into-microservices

  19. [19]

    The Opportunities Microservices Provide at Uber Engineering,

    “The Opportunities Microservices Provide at Uber Engineering,” Apr. 2016, [Online; accessed 8. Jan. 2024]. [Online]. Available: https://www.uber.com/en-GB/blog/building-tincup-microservice-implementation

  20. [20]

    Rebuilding twitter’s public api,

    J. Q. Hylbert and S. Cosenza, “Rebuilding twitter’s public api,” 12 August 2020, [Online; accessed 8. Jan. 2024]. [Online]. Available: https://blog.twitter.com/engineering/en_us/topics/infrastructure/2020/rebuild_twitter_public_api_2020

  21. [21]

    Nightcore: efficient and scalable serverless computing for latency-sensitive, interactive microservices,

    Z. Jia and E. Witchel, “Nightcore: efficient and scalable serverless computing for latency-sensitive, interactive microservices,” in Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS ’21. New York, NY, USA: Association for Computing Machinery, 2021, p. 152–166. [Onlin...

  22. [22]

    An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems,

    Y. Gan, Y. Zhang, D. Cheng, A. Shetty, P. Rathi, N. Katarki, A. Bruno, J. Hu, B. Ritchken, B. Jackson, K. Hu, M. Pancholi, Y. He, B. Clancy, C. Colen, F. Wen, C. Leung, S. Wang, L. Zaruvinsky, M. Espinosa, R. Lin, Z. Liu, J. Padilla, and C. Delimitrou, “An open-source benchmark suite for microservices and their hardware-software implications for cloud ...

  23. [23]

    Production-Grade Container Orchestration,

    “Production-Grade Container Orchestration,” Jan. 2024, [Online; accessed 9. Jan. 2024]. [Online]. Available: https://kubernetes.io

  24. [24]

    Swarm mode overview,

    “Swarm mode overview,” Dec. 2023, [Online; accessed

  25. [25]

    Jan. 2024]. [Online]. Available: https://docs.docker.com/engine/swarm

  26. [26]

    Nomad|HashiCorp Developer,

    “Nomad|HashiCorp Developer,” Jan. 2024, [Online; accessed 11. Jan. 2024]. [Online]. Available: https://developer.hashicorp.com/nomad

  27. [27]

    Horizontal Pod Autoscaler - Kubernetes,

    “Horizontal Pod Autoscaler - Kubernetes,” Jan. 2024, [Online; accessed 14. Jan. 2024]. [Online]. Available: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/

  28. [28]

    Cluster Autoscaler - Kubernetes,

    “Cluster Autoscaler - Kubernetes,” Jan. 2024, [Online; accessed 14. Jan. 2024]. [Online]. Available: https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler

  29. [29]

    AWS EKS Horizontal Pod Autoscaler Sync Interval,

    “AWS EKS Horizontal Pod Autoscaler Sync Interval,” Jan. 2024, [Online; accessed 9. Jan. 2024]. [Online]. Available: https://github.com/aws/containers-roadmap/issues/1809

  30. [30]

    Google Kubernetes Engine Horizontal Pod Autoscaler Sync Interval,

    “Google Kubernetes Engine Horizontal Pod Autoscaler Sync Interval,” Jan. 2024, [Online; accessed 10. Jan. 2024]. [Online]. Available: https://cloud.google.com/kubernetes-engine/docs/concepts/horizontalpodautoscaler

  31. [31]

    An empirical analysis of vm startup times in public iaas clouds: An extended report,

    J. Hao, T. Jiang, W. Wang, and I. K. Kim, “An empirical analysis of vm startup times in public iaas clouds: An extended report,” 2021

  32. [32]

    Traffic Shedding against Stampeding Herd Effect from the Mobile App,

    “Traffic Shedding against Stampeding Herd Effect from the Mobile App,” Dec. 2022, [Online; accessed 25. Apr. 2024]. [Online]. Available: https://www.infoq.com/news/2023/10/monzo-app-traffic-shedding/

  33. [33]

    Istio,

    “Istio,” May 2024, [Online; accessed 22. May 2024]. [Online]. Available: https://istio.io

  34. [34]

    BookInfo Application,

    “BookInfo Application,” May 2024, [Online; accessed

  35. [35]

    May. 2024]. [Online]. Available: https://istio.io/latest/docs/examples/bookinfo/

  36. [36]

    Burscale: Using burstable instances for cost-effective autoscaling in the public cloud,

    A. F. Baarzi, T. Zhu, and B. Urgaonkar, “Burscale: Using burstable instances for cost-effective autoscaling in the public cloud,” in Proceedings of the ACM Symposium on Cloud Computing, ser. SoCC ’19. New York, NY, USA: Association for Computing Machinery, 2019, p. 126–138. [Online]. Available: https://doi.org/10.1145/3357223.3362706

  37. [37]

    Long-term slos for reclaimed cloud computing resources,

    M. Carvalho, W. Cirne, F. Brasileiro, and J. Wilkes, “Long-term slos for reclaimed cloud computing resources,” in Proceedings of the ACM Symposium on Cloud Computing, ser. SOCC ’14. New York, NY, USA: Association for Computing Machinery, 2014, p. 1–13. [Online]. Available: https://doi.org/10.1145/2670979.2670999

  38. [38]

    Query-based workload forecasting for self-driving database management systems,

    L. Ma, D. Van Aken, A. Hefny, G. Mezerhane, A. Pavlo, and G. J. Gordon, “Query-based workload forecasting for self-driving database management systems,” in Proceedings of the 2018 International Conference on Management of Data, ser. SIGMOD ’18. New York, NY, USA: Association for Computing Machinery, 2018, p. 631–645. [Online]. Available: https://doi.org/...

  39. [39]

    OPTIMUSCLOUD: Heterogeneous configuration optimization for distributed databases in the cloud,

    A. Mahgoub, A. M. Medoff, R. Kumar, S. Mitra, A. Klimovic, S. Chaterji, and S. Bagchi, “OPTIMUSCLOUD: Heterogeneous configuration optimization for distributed databases in the cloud,” in 2020 USENIX Annual Technical Conference (USENIX ATC 20). USENIX Association, Jul. 2020, pp. 189–203. [Online]. Available: https://www.usenix.org/conference/atc20/presen...

  40. [40]

    Autopilot: workload autoscaling at google,

    K. Rzadca, P. Findeisen, J. Swiderski, P. Zych, P. Broniek, J. Kusmierek, P. Nowak, B. Strack, P. Witusowski, S. Hand, and J. Wilkes, “Autopilot: workload autoscaling at google,” in Proceedings of the Fifteenth European Conference on Computer Systems, ser. EuroSys ’20. New York, NY, USA: Association for Computing Machinery, 2020. [Online]. Available: http...

  41. [41]

    Cloudscale: elastic resource scaling for multi-tenant cloud systems,

    Z. Shen, S. Subbiah, X. Gu, and J. Wilkes, “Cloudscale: elastic resource scaling for multi-tenant cloud systems,” in Proceedings of the 2nd ACM Symposium on Cloud Computing, ser. SOCC ’11. New York, NY, USA: Association for Computing Machinery, 2011. [Online]. Available: https://doi.org/10.1145/2038916.2038921

  42. [42]

    Sequence to sequence learning with neural networks,

    I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” Advances in neural information processing systems, vol. 27, 2014

  43. [43]

    Serverless Computing: Principles and Paradigms,

    R. Krishnamurthi, A. Kumar, S. S. Gill, and R. Buyya, Serverless Computing: Principles and Paradigms. Springer, May 2023

  44. [44]

    Lambda function scaling - AWS Lambda,

    “Lambda function scaling - AWS Lambda,” Jan. 2024, [Online; accessed 11. Jan. 2024]. [Online]. Available: https://docs.aws.amazon.com/lambda/latest/dg/lambda-concurrency.html

  45. [45]

    Sebs: A serverless benchmark suite for function-as-a-service computing,

    M. Copik, G. Kwasniewski, M. Besta, M. Podstawski, and T. Hoefler, “Sebs: A serverless benchmark suite for function-as-a-service computing,” 2021

  46. [46]

    Cloud programming simplified: A berkeley view on serverless computing,

    E. Jonas, J. Schleier-Smith, V. Sreekanti, C.-C. Tsai, A. Khandelwal, Q. Pu, V. Shankar, J. Carreira, K. Krauth, N. Yadwadkar, J. E. Gonzalez, R. A. Popa, I. Stoica, and D. A. Patterson, “Cloud programming simplified: A berkeley view on serverless computing,” 2019

  47. [47]

    Optimizing Lambda Cost with Multi-Threading,

    “Optimizing Lambda Cost with Multi-Threading,” Jan. 2024, [Online; accessed 27. Jan. 2024]. [Online]. Available: https://web.archive.org/web/20220629183438/https://www.sentiatechblog.com/aws-re-invent-2020-day-3-optimizing-lambda-cost-with-multi-threading

  48. [48]

    Application Load Balancer|Elastic Load Balancing| Amazon Web Services,

    “Application Load Balancer|Elastic Load Balancing| Amazon Web Services,” May 2024, [Online; accessed

  49. [49]

    [Online]

    May 2024]. [Online]. Available: https://aws.amazon.com/elasticloadbalancing/application-load-balancer

  50. [50]

    Envoy proxy - home,

    “Envoy proxy - home,” Feb. 2024, [Online; accessed 2. Feb. 2024]. [Online]. Available: https://www.envoyproxy.io

  51. [51]

    Envoy Proxy — Load Balancing,

    “Envoy Proxy — Load Balancing,” Jan. 2024, [Online; accessed 27. Jan. 2024]. [Online]. Available: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/load_balancers

  52. [52]

    NGINX Docs — NGINX Load Balancing,

    “NGINX Docs — NGINX Load Balancing,” Jan. 2024, [Online; accessed 27. Jan. 2024]. [Online]. Available: https://nginx.org/en/docs/http/load_balancing.html#nginx_weighted_load_balancing

  53. [53]

    Home - Knative,

    “Home - Knative,” Jan. 2024, [Online; accessed 24. Jan. 2024]. [Online]. Available: https://knative.dev/docs

  54. [54]

    The Istio service mesh,

    “The Istio service mesh,” Jan. 2024, [Online; accessed

  55. [55]

    Jan. 2024]. [Online]. Available: https://istio.io/latest/about/service-mesh

  56. [56]

    Case studies from istio,

    I. Team, “Case studies from istio,” 2024, [Online; accessed 30. Jan. 2024]. [Online]. Available: https://istio.io/latest/about/case-studies/

  57. [57]

    The Linkerd service mesh,

    “The Linkerd service mesh,” May 2024, [Online; accessed 01. May. 2024]. [Online]. Available: https://linkerd.io/

  58. [58]

    Dynamically balancing load with overload control for microservices,

    R. Bhattacharya, Y. Gao, and T. Wood, “Dynamically balancing load with overload control for microservices,” ACM Trans. Auton. Adapt. Syst., vol. 19, no. 4, Nov. 2024. [Online]. Available: https://doi.org/10.1145/3676167

  59. [59]

    AWS App Mesh,

    “AWS App Mesh,” May 2024, [Online; accessed 01. May. 2024]. [Online]. Available: https://aws.amazon.com/app-mesh/

  60. [60]

    cadvisor,

    “cadvisor,” Jan. 2024, [Online; accessed 24. Jan. 2024]. [Online]. Available: https://github.com/google/cadvisor

  61. [61]

    Prometheus - Monitoring system & time series database,

    Prometheus, “Prometheus - Monitoring system & time series database,” Jan. 2024, [Online; accessed 25. Jan. 2024]. [Online]. Available: https://prometheus.io

  62. [62]

    Creating event-driven architectures with Lambda,

    “Creating event-driven architectures with Lambda,” https://docs.aws.amazon.com/lambda/latest/dg/concepts-event-driven-architectures.html, Jan. 2025, accessed 10. Dec. 2025

  63. [63]

    AWS Lambda Web Adapter,

    “AWS Lambda Web Adapter,” https://github.com/awslabs/aws-lambda-web-adapter, Jan. 2022, accessed

  64. [64]

    Serverless Adapter,

    “Serverless Adapter,” https://github.com/H4ad/serverless-adapter, Jan. 2022, accessed 10. Sep. 2025

  65. [65]

    GRPC Gateway,

    “GRPC Gateway,” https://github.com/grpc-ecosystem/grpc-gateway, Jan. 2025, accessed 10. Dec. 2025

  66. [66]

    Traceupscaler: Upscaling traces to evaluate systems at high load,

    S. M. Sajal, T. Zhu, B. Urgaonkar, and S. Sen, “Traceupscaler: Upscaling traces to evaluate systems at high load,” in Proceedings of the Nineteenth European Conference on Computer Systems, ser. EuroSys ’24. New York, NY, USA: Association for Computing Machinery, 2024, p. 942–961. [Online]. Available: https://doi.org/10.1145/3627703.3629581

  67. [67]

    Locust.io,

    “Locust.io,” Jan. 2024, [Online; accessed 27. Jan. 2024]. [Online]. Available: https://locust.io

  68. [68]

    Online Boutique Microservice Application,

    “Online Boutique Microservice Application,” May 2024, [Online; accessed 20. May. 2024]. [Online]. Available: https://github.com/GoogleCloudPlatform/microservices-demo

  69. [69]

    Benchmarking, analysis, and optimization of serverless function snapshots,

    D. Ustiugov, P. Petrov, M. Kogias, E. Bugnion, and B. Grot, “Benchmarking, analysis, and optimization of serverless function snapshots,” in Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS ’21. New York, NY, USA: Association for Computing Machinery, 2021, p. 559–572....

  70. [70]

    Amazon EKS Customers|Managed Kubernetes Service|Amazon Web Services,

    “Amazon EKS Customers|Managed Kubernetes Service|Amazon Web Services,” Jan. 2024, [Online; accessed 25. Jan. 2024]. [Online]. Available: https://aws.amazon.com/eks

  71. [71]

    AWS VM Instances cost,

    “AWS VM Instances cost,” Jan. 2024, [Online; accessed

  72. [72]

    Jan. 2024]. [Online]. Available: https://aws.amazon.com/ec2/instance-types/t3/

  73. [73]

    Xanadu: Mitigating cascading cold starts in serverless function chain deployments,

    N. Daw, U. Bellur, and P. Kulkarni, “Xanadu: Mitigating cascading cold starts in serverless function chain deployments,” in Proceedings of the 21st International Middleware Conference, ser. Middleware ’20. New York, NY, USA: Association for Computing Machinery, 2020, p. 356–370. [Online]. Available: https://doi.org/10.1145/3423211.3425690

  74. [74]

    AWS Lambda pricing calculator,

    “AWS Lambda pricing calculator,” Jan. 2024, [Online; accessed 27. Jan. 2024]. [Online]. Available: https://calculator.aws/#/createCalculator/Lambda

  75. [75]

    Availability in globally distributed storage systems,

    D. Ford, F. Labelle, F. I. Popovici, M. Stokely, V.-A. Truong, L. Barroso, C. Grimes, and S. Quinlan, “Availability in globally distributed storage systems,” in 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI 10). Vancouver, BC: USENIX Association, Oct. 2010. [Online]. Available: https://www.usenix.org/conference/osdi10/availabi...

  76. [76]

    Node Status Check,

    “Node Status Check,” May 2024, [Online; accessed 21. May 2024]. [Online]. Available: https://kubernetes.io/docs/reference/node/node-status/

  77. [77]

    Autoscale: Dynamic, robust capacity management for multi-tier data centers,

    A. Gandhi, M. Harchol-Balter, R. Raghunathan, and M. A. Kozuch, “Autoscale: Dynamic, robust capacity management for multi-tier data centers,” ACM Trans. Comput. Syst., vol. 30, no. 4, Nov. 2012. [Online]. Available: https://doi.org/10.1145/2382553.2382556

  78. [78]

    Met: workload aware elasticity for nosql,

    F. Cruz, F. Maia, M. Matos, R. Oliveira, J. Paulo, J. Pereira, and R. Vilaça, “Met: workload aware elasticity for nosql,” in Proceedings of the 8th ACM European Conference on Computer Systems, ser. EuroSys ’13. New York, NY, USA: Association for Computing Machinery, 2013, p. 183–196. [Online]. Available: https://doi.org/10.1145/2465351.2465370

  79. [79]

    Why is it not solved yet? challenges for production-ready autoscaling,

    M. Straesser, J. Grohmann, J. von Kistowski, S. Eismann, A. Bauer, and S. Kounev, “Why is it not solved yet? challenges for production-ready autoscaling,” in Proceedings of the 2022 ACM/SPEC on International Conference on Performance Engineering, ser. ICPE ’22. New York, NY, USA: Association for Computing Machinery, 2022, p. 105–115. [Online]. Available:...

  80. [80]

    Autoscaling - Amazon EKS,

    “Autoscaling - Amazon EKS,” May 2024, [Online; accessed 20. May 2024]. [Online]. Available: https://docs.aws.amazon.com/eks/latest/userguide/autoscaling.html

Showing first 80 references.