EventADL: Open-Box Anomaly Detection and Localization Framework for Events in Cloud-Based Service Systems
Pith reviewed 2026-05-09 19:30 UTC · model grok-4.3
The pith
Learning normal interaction patterns and their frequencies from cloud event data makes it possible to detect anomalies and trace their root causes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Normal behavior is captured first as Event Semantic Patterns that record expected interactions between system entities and then as Event Frequency Patterns that record how often those interactions occur. Significant departures from either pattern in a live event stream mark anomalies. Root cause localization follows by constructing an Intervention Graph that links recent interactions to the anomaly and identifies the most likely origin.
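The two-stage idea above can be illustrated with a minimal sketch. It is not the paper's algorithm (the referee report below notes that the learning procedure and thresholds are not specified). The event shape `(window_id, actor, action, target)`, the function names `learn_patterns` and `detect`, the z-score test, and the `z_threshold` and `min_sigma` values are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def learn_patterns(history):
    """Learn known interactions and their per-window rate statistics.

    history: list of (window_id, actor, action, target) tuples (assumed shape).
    """
    counts = defaultdict(Counter)
    windows = {w for w, *_ in history}
    for window, actor, action, target in history:
        counts[(actor, action, target)][window] += 1
    semantic = set(counts)  # stand-in for Event Semantic Patterns
    frequency = {}          # stand-in for Event Frequency Patterns
    for pattern, per_window in counts.items():
        rates = [per_window.get(w, 0) for w in windows]
        frequency[pattern] = (mean(rates), pstdev(rates))
    return semantic, frequency


def detect(window_events, semantic, frequency, z_threshold=3.0, min_sigma=1.0):
    """Flag unseen interactions and known ones whose count deviates strongly.

    window_events: list of (actor, action, target) tuples from one live window.
    min_sigma is a hypothetical floor that avoids dividing by a zero deviation.
    """
    live = Counter(window_events)
    anomalies = [(p, "unseen interaction") for p in live if p not in semantic]
    for pattern, (mu, sigma) in frequency.items():
        z = abs(live.get(pattern, 0) - mu) / max(sigma, min_sigma)
        if z > z_threshold:
            anomalies.append((pattern, "unusual frequency"))
    return anomalies


# Illustrative data only: four normal windows, then one suspicious window.
history = []
for w in range(4):
    history += [(w, "api", "calls", "db")] * 2 + [(w, "cron", "restarts", "cache")]
semantic, frequency = learn_patterns(history)
live = [("api", "calls", "db")] * 9 + [("cron", "restarts", "cache"), ("user", "deletes", "db")]
for pattern, reason in detect(live, semantic, frequency):
    print(pattern, reason)
```

Keeping the two checks separate mirrors the claim that a departure from either pattern is enough to mark an anomaly, and each flag names the interaction that triggered it, which is what makes the result interpretable.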
What carries the argument
Event Semantic Patterns and Event Frequency Patterns, which encode normal entity interactions and their rates, together with the Intervention Graph that relates recent interactions to flagged anomalies for cause identification.
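One way to picture a graph that relates recent interactions to flagged anomalies is the toy ranking below. It is a hypothetical stand-in, not the paper's Intervention Graph construction, which this page does not describe in detail. The tuple shape `(timestamp, actor, target)`, the function name `rank_root_causes`, the reachability score, and the earliest-first tie-break are all assumptions.

```python
from collections import defaultdict, deque


def rank_root_causes(interactions, anomalous, top_k=3):
    """Rank recent interactions by how many anomalous entities they can reach.

    interactions: list of (timestamp, actor, target) tuples (assumed shape).
    anomalous: set of entity names flagged by a detector.
    """
    graph = defaultdict(set)
    for _, actor, target in interactions:
        graph[actor].add(target)

    def reachable(start):
        # Breadth-first search over the interaction edges.
        seen, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    scored = []
    for ts, actor, target in interactions:
        hits = len(reachable(target) & anomalous)
        if hits:
            scored.append((hits, ts, actor, target))
    # Assumption: more anomalies reached ranks higher; earlier events break ties.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(actor, target) for _, _, actor, target in scored[:top_k]]


# Illustrative data only.
interactions = [
    (1, "deployer", "config-svc"),
    (2, "config-svc", "api"),
    (3, "api", "db"),
    (4, "batch", "reports"),
]
print(rank_root_causes(interactions, {"api", "db"}))
```

Returning a short ranked list rather than a single answer matches how the paper reports localization quality, as top-3 accuracy.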
If this is right
- Event streams alone become sufficient for real-time anomaly detection in cloud environments.
- Root causes become traceable automatically through the relationships modeled in the intervention graph.
- The process works with unlabeled historical data and yields interpretable results for operators.
- The same learned patterns support ongoing monitoring across multiple production cloud services.
Where Pith is reading between the lines
- The pattern-based approach might extend to other distributed systems where event logs are the primary observable data.
- Hybrid systems could combine these event patterns with metric or log signals to catch anomalies that appear in only one data type.
- One could test whether the learned patterns remain effective after software updates or hardware changes within the same service.
Load-bearing premise
Deviations from the learned semantic and frequency patterns will mark genuine anomalies and the intervention graph will recover the correct root cause from the surrounding events.
What would settle it
A real cloud incident in which the event stream matches the learned patterns yet a failure occurs, or in which the intervention graph selects an incorrect component as the origin.
Original abstract
Anomaly detection and localization (ADL) is critical for maintaining reliability and availability in cloud systems. Recent ADL developments focus on metric and log data, leaving event data unexplored. To address this gap, we propose EventADL, the first open-box event-based ADL framework for cloud-based service systems. To motivate the design of our framework, we conduct a systematic analysis on 520 real-world incidents, and provide insights into how anomalies and their root causes manifest through event data. EventADL has three phases: offline training, online anomaly detection, and root cause localization. During the training phase, EventADL first learns Event Semantic Patterns (ESPs), which capture normal interactions between system entities using historical event data, and then learns Event Frequency Patterns (EFPs), which capture the normal frequency of known ESPs. In the online anomaly detection phase, any data in the event stream that deviates significantly from either pattern is identified as anomalous. For localization, EventADL constructs an Intervention Graph that models the relationships between recent system interactions and the detected anomalies for automatic root cause localization. The framework is designed to operate efficiently with unlabeled data and to produce interpretable anomalies with their corresponding root causes. Our evaluation on three real cloud service systems and two real-world incidents demonstrates that EventADL outperforms existing methods, achieving F1-scores of at least 90% for anomaly detection and 100% top-3 accuracy in root cause localization.
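For readers less familiar with the two headline metrics, the sketch below shows how an F1-score and a top-k accuracy are computed under their standard definitions. The function names and the counts and rankings in the usage lines are invented for illustration and are not taken from the paper's evaluation.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def top_k_accuracy(rankings, truths, k=3):
    """Fraction of incidents whose true root cause is among the first k candidates."""
    hits = sum(truth in ranked[:k] for ranked, truth in zip(rankings, truths))
    return hits / len(truths)


# Illustrative numbers only.
print(f1_score(tp=9, fp=1, fn=1))
print(top_k_accuracy([["a", "b", "c", "d"], ["x", "y", "z"]], ["c", "z"], k=3))
```

Note that top-k accuracy over a handful of incidents can only take a few distinct values: with two incidents it is 0%, 50%, or 100%.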
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents EventADL, an open-box framework for anomaly detection and localization (ADL) in cloud-based service systems using event data. Motivated by a systematic analysis of 520 real-world incidents, the framework learns Event Semantic Patterns (ESPs) capturing normal entity interactions and Event Frequency Patterns (EFPs) for normal frequencies during offline training. Online, it detects significant deviations from these patterns as anomalies. For localization, it constructs an Intervention Graph modeling relationships between recent interactions and detected anomalies. Evaluation on three real cloud systems and two incidents reports F1-scores of at least 90% for detection and 100% top-3 accuracy for root cause localization, outperforming existing methods.
Significance. If the claims hold, the work is significant for filling the gap in event-based ADL for cloud systems, offering an interpretable, open-box alternative to black-box approaches. The use of real incident data for motivation and evaluation on actual systems strengthens its practical relevance. The framework's design for unlabeled data and efficiency is a positive aspect. However, the limited scope of the localization evaluation tempers the overall impact.
major comments (3)
- [Evaluation section (and abstract)] The root cause localization claim of 100% top-3 accuracy is supported by results from only two real-world incidents. With no reported statistical significance, variance, cross-validation, incident selection criteria, or failure mode analysis, this small sample size does not provide load-bearing evidence for the effectiveness or generalizability of the Intervention Graph approach.
- [§3 (Framework description)] The paper lacks sufficient details on the algorithms, parameters, or methods used to learn ESPs and EFPs from historical data, as well as the specific deviation thresholds or statistical tests for identifying anomalies in the online phase. This makes the training procedure and the reported F1-scores difficult to reproduce or assess for robustness.
- [Abstract and evaluation] The claim that EventADL 'outperforms existing methods' does not specify the baseline methods, how they were adapted to event data, or include quantitative comparison details (e.g., tables with metrics). This weakens the ability to evaluate the superiority asserted for both detection and localization.
minor comments (2)
- [Abstract] The abstract could briefly name the compared methods or key technical innovations to better frame the performance numbers.
- [Throughout] Ensure consistent definition of acronyms (e.g., ESP, EFP) on first use in the main body and consider adding a limitations section discussing scalability with high-volume event streams.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major comment point by point below, with proposed revisions to improve reproducibility, clarity, and transparency of our evaluation.
- Referee: Evaluation section (and abstract): The root cause localization claim of 100% top-3 accuracy is supported by results from only two real-world incidents. With no reported statistical significance, variance, cross-validation, incident selection criteria, or failure mode analysis, this small sample size does not provide load-bearing evidence for the effectiveness or generalizability of the Intervention Graph approach.
  Authors: We acknowledge that the root cause localization results are based on only two real-world incidents, which limits the strength of generalizability claims. These incidents were selected as representative examples from the 520 analyzed incidents to illustrate the Intervention Graph in practice. In the revised manuscript, we will add the incident selection criteria, describe the key characteristics of each incident, include a failure mode discussion, and explicitly note the limitations of the small sample size in both the evaluation section and abstract. We will also discuss why statistical significance testing is not applicable with n=2.
  Revision: partial
- Referee: §3 (Framework description): The paper lacks sufficient details on the algorithms, parameters, or methods used to learn ESPs and EFPs from historical data, as well as the specific deviation thresholds or statistical tests for identifying anomalies in the online phase. This makes the training procedure and the reported F1-scores difficult to reproduce or assess for robustness.
  Authors: We thank the referee for highlighting the need for greater reproducibility. In the revised manuscript, we will expand §3 with pseudocode for the ESP and EFP learning procedures, list all hyperparameters and thresholds (including deviation criteria and any statistical tests used for anomaly detection), and provide implementation-level details on the online phase. These additions will make the training and detection processes fully reproducible and allow better assessment of result robustness.
  Revision: yes
- Referee: Abstract and evaluation: The claim that EventADL 'outperforms existing methods' does not specify the baseline methods, how they were adapted to event data, or include quantitative comparison details (e.g., tables with metrics). This weakens the ability to evaluate the superiority asserted for both detection and localization.
  Authors: We agree that the comparison claims require more explicit support. We will revise the abstract to name the baseline methods and update the evaluation section to include a detailed table with all metrics for both anomaly detection and localization. The revised text will also describe how each baseline was adapted to event data and provide the quantitative results that support the outperformance claims.
  Revision: yes
- The limited number of available real-world incidents (only two) for root cause localization prevents expanding the sample size, performing cross-validation, or reporting variance/statistical significance for that component of the evaluation.
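To make the sample-size concern concrete, the sketch below computes the exact (Clopper-Pearson) lower confidence limit on a success rate when every one of n trials succeeds, which reduces to (alpha/2)^(1/n). Treating each incident as an independent trial of "true root cause appears in the top 3" is an assumption made here for illustration, and the function name is hypothetical.

```python
def lower_bound_all_successes(n, alpha=0.05):
    """Two-sided Clopper-Pearson lower limit when all n trials succeed."""
    return (alpha / 2) ** (1 / n)


# With n = 2 the 95% lower limit is sqrt(0.025), roughly 0.158.
for n in (2, 20, 100):
    print(n, round(lower_bound_all_successes(n), 3))
```

Under that assumption, two correct localizations out of two are statistically compatible with a true top-3 accuracy anywhere from about 16% to 100%, which is why the reported 100% figure cannot by itself support a generalizability claim.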
Circularity Check
No circularity in EventADL derivation chain
The paper's core chain consists of offline learning of Event Semantic Patterns (ESPs) and Event Frequency Patterns (EFPs) from historical unlabeled event data, followed by online deviation detection and construction of an Intervention Graph from recent interactions plus detected anomalies. These are standard pattern-learning and graph-construction steps with no equations shown that equate outputs to inputs by definition. The 520-incident analysis is used only for design motivation, not as a fitted input renamed as prediction. Evaluation F1 scores and top-3 accuracy are measured on separate real systems and two incidents, not forced by the training procedure itself. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing elements.
Received 2025-09-11; accepted 2026-03-24. Proc. ACM Softw. Eng., Vol. 3, No. FSE, Article FSE179. Publication date: July 2026.