Adaptive Management of Microservices in Dynamic Computing Environments: A Taxonomy and Future Directions
Pith reviewed 2026-05-07 15:22 UTC · model grok-4.3
The pith
Microservice adaptation systems typically model only part of real production dynamics, and their reported gains track evaluation fidelity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A taxonomy organized around control locus, modeled dynamics, adaptation strategy, and evaluation evidence shows that most of the 84 examined microservice systems only partially represent production dynamics such as workload variation, request-path changes, interference, and failures. Reported performance improvements also vary with how faithfully the evaluation environments reproduce those dynamics.
What carries the argument
Four-dimensional taxonomy of control locus, modeled dynamics, adaptation strategy, and evaluation evidence, applied to 84 systems and 13 evaluation artifacts.
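As a rough illustration of how the four dimensions could be recorded per system, the sketch below uses a hypothetical `SystemEntry` record. The field names, the `DYNAMICS` set (taken from the dynamics named in the core claim), the coverage labels, and the example entry are all assumptions made for illustration; they are not the paper's schema or data.

```python
from dataclasses import dataclass

# Dynamics named in the core claim; treating these four as the full set is an assumption.
DYNAMICS = frozenset(
    {"workload variation", "request-path changes", "interference", "failures"}
)


@dataclass(frozen=True)
class SystemEntry:
    """Hypothetical record for one surveyed system (not the paper's schema)."""

    name: str
    control_locus: str  # free-form label; the paper's categories are not listed here
    modeled_dynamics: frozenset
    adaptation_strategy: str
    evaluation_evidence: str

    def coverage(self) -> str:
        """Classify how much of DYNAMICS this system models."""
        modeled = self.modeled_dynamics & DYNAMICS
        if not modeled:
            return "none"
        return "full" if modeled == DYNAMICS else "partial"


# Placeholder entry, invented for illustration only.
example = SystemEntry(
    name="example-autoscaler",
    control_locus="autoscaling",
    modeled_dynamics=frozenset({"workload variation"}),
    adaptation_strategy="reactive",
    evaluation_evidence="benchmark testbed",
)
print(example.coverage())  # partial
```

Under this reading, the survey's observation that dynamics are "partially modeled" corresponds to entries whose `coverage()` is `"partial"`.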
If this is right
- Cross-layer coordination between autoscaling, placement, routing, and remediation will be required for robust adaptation.
- Abstractions that connect telemetry directly to control actions can reduce the gap between monitoring and response.
- Learning-based controllers must add safety constraints to avoid harmful adaptations during exploration.
- Evaluation frameworks need to become reproducible and dynamic so that claims can be compared fairly.
Where Pith is reading between the lines
- Current reported gains may shrink once systems face the full combination of dynamics seen in live cloud environments.
- The taxonomy could serve as a checklist for designers of future orchestration platforms to avoid common oversights.
- Adding explicit failure-mode coverage might cut outage frequency more than scaling alone.
Load-bearing premise
The 84 chosen systems and 13 evaluation artifacts stand in for the full range of existing work, and the four taxonomy dimensions together cover the important ways adaptive management can be organized.
What would settle it
A follow-up survey that locates many additional production microservice systems modeling all major dynamics simultaneously and showing performance gains that remain stable across low- and high-fidelity evaluations.
Original abstract
Microservice-based cloud applications face changing workloads, evolving request paths, variable network conditions, interference, and failures. These dynamics couple autoscaling, placement, routing, isolation, and remediation. The survey examines dynamics-aware adaptive management for microservices. Its taxonomy covers control locus, modeled dynamics, adaptation strategy, and evaluation evidence; objectives and telemetry are cross-cutting. A synthesis of 84 system entries and 13 evaluation artifacts shows that production dynamics are often partially modeled. Reported gains also depend on evaluation fidelity. Key future directions include cross-layer coordination, telemetry-to-control abstractions, safe learning-based control, and reproducible dynamic evaluation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper surveys adaptive management of microservices facing dynamic conditions including workload changes, network variability, interference, and failures. It proposes a taxonomy with four primary dimensions (control locus, modeled dynamics, adaptation strategy, evaluation evidence) plus cross-cutting objectives and telemetry. A synthesis of 84 systems and 13 evaluation artifacts leads to the observations that production dynamics are typically modeled only partially and that reported gains are sensitive to evaluation fidelity. The work identifies future directions such as cross-layer coordination, telemetry-to-control abstractions, safe learning-based control, and reproducible dynamic evaluation.
Significance. If the synthesis holds, the taxonomy supplies a practical organizing framework for a rapidly evolving area of cloud and distributed systems research. The concrete counts (84 systems, 13 artifacts) and the emphasis on partial dynamic modeling plus evaluation fidelity provide actionable insights that can steer both academic work and industrial practice toward more realistic adaptation mechanisms. The call for reproducible dynamic evaluation is a constructive contribution that addresses a known weakness in the broader literature.
major comments (1)
- [Methodology / Literature Selection] The methodology section does not specify the literature search strategy, databases, keywords, or inclusion/exclusion criteria used to arrive at the 84 system entries and 13 evaluation artifacts. Without these details the representativeness of the synthesis and the risk of selection bias cannot be assessed, which directly affects the reliability of the central claims about partial modeling of dynamics and dependence on evaluation fidelity.
minor comments (2)
- The abstract and introduction could more explicitly state the search period or publication venues covered to help readers gauge temporal coverage.
- [Synthesis] A summary table or figure that cross-tabulates the 84 systems against the four taxonomy dimensions would improve readability and allow quicker verification of the reported patterns.
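The cross-tabulation suggested in the last comment could be produced along the following lines. This is a minimal sketch: the `entries` rows are invented placeholders rather than any of the 84 surveyed systems, only two of the four dimensions are tabulated, and the labels used for each dimension are assumptions.

```python
from collections import Counter

# Placeholder rows (system, control locus, modeled dynamic); not data from the survey.
entries = [
    ("sys-a", "autoscaling", "workload variation"),
    ("sys-b", "placement", "interference"),
    ("sys-c", "autoscaling", "workload variation"),
    ("sys-d", "routing", "failures"),
]


def cross_tabulate(rows):
    """Count systems for each (control locus, modeled dynamic) pair."""
    return Counter((locus, dynamic) for _, locus, dynamic in rows)


def render(table):
    """Render the counts as a tab-separated grid, one row per control locus."""
    loci = sorted({locus for locus, _ in table})
    dynamics = sorted({dynamic for _, dynamic in table})
    lines = ["\t".join(["locus"] + dynamics)]
    for locus in loci:
        # Counter returns 0 for pairs that never occur.
        counts = [str(table[(locus, d)]) for d in dynamics]
        lines.append("\t".join([locus] + counts))
    return "\n".join(lines)


table = cross_tabulate(entries)
print(render(table))
```

A grid of this kind makes empty cells visible, which is where under-modeled combinations of control locus and dynamics would show up.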
Simulated Author's Rebuttal
We thank the referee for the positive assessment of the work and the constructive comment on methodology transparency. We address the point below and will revise the manuscript accordingly.
- Referee: [Methodology / Literature Selection] The methodology section does not specify the literature search strategy, databases, keywords, or inclusion/exclusion criteria used to arrive at the 84 system entries and 13 evaluation artifacts. Without these details the representativeness of the synthesis and the risk of selection bias cannot be assessed, which directly affects the reliability of the central claims about partial modeling of dynamics and dependence on evaluation fidelity.
Authors: We agree that explicit documentation of the literature selection process is necessary to allow readers to evaluate representativeness and potential selection bias. In the revised manuscript we will add a dedicated subsection to the Methodology section that details: the databases and sources searched (ACM Digital Library, IEEE Xplore, Google Scholar, arXiv, and selected conference proceedings); the keyword combinations and Boolean queries used (e.g., “microservices” AND (“adaptive management” OR “dynamic adaptation” OR “autoscaling” OR “workload variability” OR “network variability”)); the time window (2015–2024); inclusion criteria (peer-reviewed papers presenting implemented adaptive systems for microservices that explicitly address at least one form of runtime dynamics); and exclusion criteria (non-English works, purely theoretical papers without implementation or evaluation, prior surveys, and papers focused solely on static environments). We will also describe the multi-stage screening process (title/abstract screening followed by full-text review) that produced the final counts of 84 systems and 13 evaluation artifacts. These additions will directly support the reliability of the central observations on partial dynamic modeling and evaluation fidelity.
Revision: yes
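The example Boolean query and time window in the response can be expressed as a simple local filter for the title/abstract screening stage. The sketch below is an illustration under assumptions: the function names are hypothetical, matching is plain case-insensitive substring search, and real database search engines interpret such queries in their own ways.

```python
# Terms taken from the example query quoted in the authors' response.
REQUIRED = "microservices"
ANY_OF = [
    "adaptive management",
    "dynamic adaptation",
    "autoscaling",
    "workload variability",
    "network variability",
]
YEARS = range(2015, 2025)  # 2015 to 2024 inclusive


def matches_query(text: str) -> bool:
    """True if text contains the required term AND at least one alternative."""
    lowered = text.lower()
    return REQUIRED in lowered and any(term in lowered for term in ANY_OF)


def passes_screen(title: str, abstract: str, year: int) -> bool:
    """Hypothetical title/abstract screen: time window plus keyword match."""
    return year in YEARS and matches_query(f"{title} {abstract}")


print(passes_screen("Autoscaling for microservices", "", 2020))  # True
print(passes_screen("Autoscaling for microservices", "", 2014))  # False
```

A filter like this covers only the first screening stage; the inclusion and exclusion criteria listed above (implemented systems, peer review, runtime dynamics) still require full-text review.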
Circularity Check
No significant circularity; survey is observational
This is a literature survey paper whose core contribution is a taxonomy (control locus, modeled dynamics, adaptation strategy, evaluation evidence) applied to 84 external systems and 13 evaluation artifacts. No equations, derivations, fitted parameters, predictions, or uniqueness theorems appear in the abstract or described structure. All reported patterns and future directions are direct summaries of cited external work rather than internally generated quantities that reduce to the paper's own inputs. Self-citations, if present, are not load-bearing for any deductive claim. The synthesis is therefore self-contained against external benchmarks with no circular reduction.