Characterizing FaaS Workflows on Public Clouds: The Good, the Bad and the Ugly
Pith reviewed 2026-05-18 13:20 UTC · model grok-4.3
The pith
Testing 25 FaaS workflows over 132k invocations on AWS and Azure reveals distinct patterns in execution, orchestration, cold starts, and costs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Through 132k invocations of 25 micro-benchmark and application workflows on AWS and Azure FaaS platforms, the evaluations show that function execution, workflow orchestration, inter-function dataflow, cold-start scaling, and monetary costs follow patterns shaped by each platform's design and the specific workload structure.
What carries the argument
A suite of 25 micro-benchmark and application workflows executed on three public FaaS workflow platforms, used to systematically record execution, orchestration, interaction, scaling, and cost metrics.
If this is right
- Developers can adjust workflow structure and configuration to reduce unexpected cold-start penalties and control costs more tightly.
- Platform operators can target specific orchestration and scaling behaviors for improvement based on measured interaction patterns.
- Performance and cost predictions for new applications become more accurate when they account for the documented inter-function and scaling effects.
- Research efforts can focus on closing the visibility gaps and design limitations identified in current workflow platforms.
- Benchmark suites for future FaaS studies can incorporate the workload categories that exposed the most distinctive behaviors.
Where Pith is reading between the lines
- The same measurement approach could be applied to additional providers to test whether the reported patterns hold across the broader ecosystem.
- Automated tools that select orchestration strategies or provision based on these observed scaling and cost signatures could be developed.
- Similar characterization on emerging workflow features or new dataflow patterns would extend the practical guidance for users.
- The cost and latency data could serve as inputs for economic models of serverless application deployment decisions.
Load-bearing premise
The 25 selected workflows represent typical real-world FaaS usage and the observed behaviors extend beyond the tested platforms and conditions.
What would settle it
A new workflow or an untested platform showing markedly different cold-start latency distributions or cost curves from those reported would undermine the claimed general insights.
Original abstract
Function-as-a-service (FaaS) is a popular serverless computing paradigm for developing event-driven functions that elastically scale on public clouds. FaaS workflows, such as AWS Step Functions and Azure Durable Functions, are composed from FaaS functions, like AWS Lambda and Azure Functions, to build practical applications. But, the complex interactions between functions in the workflow and the limited visibility into the internals of proprietary FaaS platforms are major impediments to gaining a deeper understanding of FaaS workflow platforms. While several works characterize FaaS platforms to derive such insights, there is a lack of a principled and rigorous study for FaaS workflow platforms, which have unique scaling, performance and costing behavior influenced by the platform design, dataflow and workloads. In this article, we perform extensive evaluations of three popular FaaS workflow platforms from AWS and Azure, running 25 micro-benchmark and application workflows over 132k invocations. Our detailed analysis confirms some conventional wisdom but also uncovers unique insights on the function execution, workflow orchestration, inter-function interactions, cold-start scaling and monetary costs. Our observations help developers better configure and program these platforms, set performance and scalability expectations, and identify research gaps on enhancing the platforms.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents an empirical characterization of FaaS workflow platforms (AWS Step Functions, Azure Durable Functions, and related services) by executing 25 micro-benchmark and application workflows across 132k invocations on AWS and Azure. It analyzes function execution times, workflow orchestration overhead, inter-function interactions, cold-start scaling behavior, and monetary costs, claiming to both confirm aspects of conventional wisdom and uncover new platform-specific insights that can guide developer configuration and identify research directions.
Significance. If the selected workflows adequately sample real-world diversity, the work would be significant for the serverless and distributed systems community. It supplies one of the larger public datasets of direct measurements (132k invocations) on production FaaS workflow platforms, offering concrete observations on orchestration latency, cold-start amplification in workflows, and cost trade-offs that are not easily obtained from vendor documentation alone. Such data can directly inform both practitioner tuning and future platform research.
major comments (2)
- [§4 and abstract] §4 (Evaluation Methodology) and the abstract: the central claim that the observations 'help developers better configure and program these platforms' and 'generalize' rests on the representativeness of the 25 workflows. No explicit selection criteria, coverage metrics (e.g., distribution of function counts, dataflow patterns, or invocation rates), or comparison against production traces are provided, making it impossible to assess whether reported behaviors are platform invariants or artifacts of the chosen micro-benchmarks.
- [§5] §5 (Results on cold-start scaling): the reported scaling curves and cost multipliers for workflow-level cold starts are presented without statistical significance tests or confidence intervals across the 132k runs; given the known high variance of cold starts, this weakens the load-bearing claim that the study uncovers 'unique insights' on scaling behavior.
minor comments (2)
- [Figure 3, Table 2] Figure 3 and Table 2: axis labels and legend entries use inconsistent abbreviations (e.g., 'SF' vs 'Step Functions') that are not defined on first use, reducing readability.
- [§6] §6 (Discussion): the limitations paragraph is very brief and does not address potential platform-specific measurement artifacts (e.g., billing granularity differences between AWS and Azure).
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, providing honest responses based on the current work and indicating revisions where the manuscript can be strengthened without misrepresenting our contributions.
Point-by-point responses
- Referee: [§4 and abstract] §4 (Evaluation Methodology) and the abstract: the central claim that the observations 'help developers better configure and program these platforms' and 'generalize' rests on the representativeness of the 25 workflows. No explicit selection criteria, coverage metrics (e.g., distribution of function counts, dataflow patterns, or invocation rates), or comparison against production traces are provided, making it impossible to assess whether reported behaviors are platform invariants or artifacts of the chosen micro-benchmarks.
  Authors: We agree that adding explicit documentation of workflow selection and coverage would improve transparency and help readers evaluate generalizability. In the revised manuscript, we have expanded §4 with a new subsection and accompanying table that details the selection criteria: the 25 workflows were chosen to systematically cover common dataflow patterns (sequential, parallel, fan-out/fan-in, and hybrid), function counts (2–15), data payload sizes, and invocation rates drawn from vendor examples and prior serverless literature. We now report aggregate coverage metrics such as the distribution of workflow lengths and interaction types. However, direct comparison against production traces is not feasible, as such traces are proprietary and not released by AWS or Azure; our approach instead uses a diverse, publicly reproducible set of micro-benchmarks and application workflows to sample the design space.
  Revision: yes
- Referee: [§5] §5 (Results on cold-start scaling): the reported scaling curves and cost multipliers for workflow-level cold starts are presented without statistical significance tests or confidence intervals across the 132k runs; given the known high variance of cold starts, this weakens the load-bearing claim that the study uncovers 'unique insights' on scaling behavior.
  Authors: We acknowledge that the high variance inherent to cold starts makes statistical presentation important for substantiating claims. In the revised §5, we have added 95% confidence intervals to all scaling curves and cost-multiplier figures, computed across the repeated runs within the 132k invocations. We have also incorporated statistical significance testing (Welch’s t-tests for cross-platform comparisons and ANOVA for scaling trends) to confirm that the reported workflow-level cold-start amplification effects are statistically distinguishable from noise. These additions directly address the variance concern while preserving the original observations.
  Revision: yes
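The statistical treatment named in the response above (95% confidence intervals and Welch's t-test for cross-platform comparisons) can be sketched roughly as follows. This is a minimal illustration and not the authors' analysis code: the function names `mean_ci95` and `welch_t`, the sample latencies, and the choice of a normal approximation for the interval are all assumptions introduced here.

```python
import math
import statistics


def mean_ci95(samples):
    """Return (mean, half_width) of an approximate 95% CI for the mean.

    Assumption: uses the normal approximation (z close to 1.96), which is
    reasonable for large samples; a t-quantile is more accurate for small ones.
    """
    n = len(samples)
    mean = statistics.fmean(samples)
    sem = statistics.stdev(samples) / math.sqrt(n)  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.975)
    return mean, z * sem


def welch_t(a, b):
    """Return (t_statistic, degrees_of_freedom) for Welch's unequal-variance t-test."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical cold-start latencies in milliseconds (illustrative values only,
# not taken from the paper).
platform_a = [410.0, 455.0, 398.0, 502.0, 430.0, 476.0]
platform_b = [620.0, 580.0, 705.0, 640.0, 598.0, 662.0]

mean_a, hw_a = mean_ci95(platform_a)
t_stat, dof = welch_t(platform_a, platform_b)
print(f"platform A mean = {mean_a:.1f} ms +/- {hw_a:.1f} ms")
print(f"Welch t = {t_stat:.2f}, df = {dof:.1f}")
```

Converting the t statistic into a p-value requires the Student t distribution at `dof` degrees of freedom; `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and reports the p-value directly.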
- Not addressed in the revision: direct comparison against proprietary production traces from AWS and Azure, which are not publicly available to researchers.
Circularity Check
No circularity: empirical measurements from benchmarks
full rationale
This paper is a pure empirical characterization study that executes 25 micro-benchmark and application workflows for 132k invocations on AWS and Azure, then reports direct observations on execution, orchestration, cold starts, and costs. No equations, fitted parameters, predictions, or derivations appear in the abstract or described methodology that could reduce to the inputs by construction. The analysis rests on experimental data collection rather than on self-referential definitions or self-citation chains, so its conclusions depend only on the reported benchmark measurements.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The selected micro-benchmarks and application workflows accurately represent typical FaaS usage scenarios.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: We perform extensive evaluations of three popular FaaS workflow platforms... running 25 micro-benchmark and application workflows over 132k invocations. Our detailed analysis confirms some conventional wisdom but also uncovers unique insights on the function execution, workflow orchestration, inter-function interactions, cold-start scaling and monetary costs.
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean: J_uniquely_calibrated_via_higher_derivative (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Cost Analysis: FaaS workflow billing is complex with different components. Our models identify orchestration and data transfers as the dominant cost (≈99%) rather than function execution.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.