PyRCA: A Library for Metric-based Root Cause Analysis
We introduce PyRCA, an open-source Python machine learning library for Root Cause Analysis (RCA) in Artificial Intelligence for IT Operations (AIOps). It provides a holistic framework for uncovering complex causal dependencies among metrics and automatically locating the root causes of incidents. It offers a unified interface for multiple commonly used RCA models, covering both graph construction and scoring tasks. The library aims to give IT operations staff, data scientists, and researchers a one-stop solution for rapid model development, model evaluation, and deployment to online applications. In particular, it includes various causal discovery methods for causal graph construction, as well as multiple types of root cause scoring methods inspired by Bayesian analysis, graph analysis, causal analysis, and related techniques. Our GUI dashboard offers practitioners an intuitive point-and-click interface that lets them easily inject expert knowledge through human interaction. By visualizing causal graphs and the root causes of incidents, practitioners can quickly gain insights and improve their workflow efficiency. This technical report introduces PyRCA's architecture and major functionalities and presents benchmark results in comparison with various baseline models. Additionally, we demonstrate PyRCA's capabilities through several example use cases.
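The abstract describes RCA as two stages: build a causal graph over metrics, then score candidate root causes on that graph. The sketch below illustrates that split under loudly stated assumptions. It is NOT the PyRCA API, and every function and variable name is hypothetical. The causal graph is supplied by hand (standing in for expert knowledge rather than a causal discovery method). Scoring uses a deliberately simple swapped-in heuristic: flag metrics whose incident-window mean deviates strongly from baseline (absolute z-score), then keep anomalous metrics with no anomalous causal parent. This is not one of PyRCA's Bayesian, graph, or causal scoring methods.

```python
from statistics import mean, stdev

# Hypothetical sketch of a two-stage RCA pipeline (NOT the PyRCA API).
# Stage 1: a causal graph, here given by hand as {metric: [causal parents]}.
# Stage 2: score anomalies, then keep anomalous metrics with no anomalous parent.


def anomaly_scores(baseline, incident):
    """Absolute z-score of each metric's incident-window mean vs. its baseline."""
    scores = {}
    for name, values in baseline.items():
        mu, sigma = mean(values), stdev(values)
        sigma = sigma if sigma > 0 else 1e-9  # guard against constant baselines
        scores[name] = abs(mean(incident[name]) - mu) / sigma
    return scores


def rank_root_causes(graph, scores, threshold=3.0):
    """Rank anomalous metrics whose causal parents are all non-anomalous."""
    anomalous = {n for n, s in scores.items() if s >= threshold}
    candidates = [
        n for n in anomalous
        if not any(parent in anomalous for parent in graph.get(n, []))
    ]
    # Highest anomaly score first.
    return sorted(candidates, key=lambda n: scores[n], reverse=True)


# Hypothetical example: requests -> cpu -> latency -> errors.
graph = {"requests": [], "cpu": ["requests"], "latency": ["cpu"], "errors": ["latency"]}
baseline = {
    "requests": [100, 102, 98, 101, 99],
    "cpu": [50, 52, 48, 51, 49],
    "latency": [20, 21, 19, 20, 20],
    "errors": [1, 0, 1, 1, 0],
}
incident = {
    "requests": [100, 101],  # traffic unchanged
    "cpu": [90, 92],         # CPU spikes
    "latency": [60, 65],     # downstream effect
    "errors": [10, 12],      # further downstream effect
}

ranking = rank_root_causes(graph, anomaly_scores(baseline, incident))
print(ranking)
```

In this toy incident, cpu, latency, and errors are all anomalous, but only cpu has no anomalous parent. The heuristic therefore reports cpu rather than the downstream symptoms. Keeping graph construction and scoring as separate functions mirrors the unified "graph construction plus scoring" split described above, so either stage could be swapped out independently.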
Forward citations
Cited by 3 Pith papers
- Agentic Witnessing: Pragmatic and Scalable TEE-Enabled Privacy-Preserving Auditing
  Agentic Witnessing enables privacy-preserving auditing of semantic properties in private data by running an LLM auditor in a TEE that answers binary queries and produces cryptographic transcripts of its reasoning.
- TORAI: Multi-source Root Cause Analysis for Blind Spots in Microservice Service Call Graph
  TORAI finds fine-grained root causes in microservice failures with blind spots by measuring anomaly severity from multi-source telemetry, clustering services by symptoms, ranking via causal analysis within clusters, a...
- Anomaly Detection and Root Cause Analysis for Microservice Systems
  Thesis proposes BARO for metrics, EventADL for events, TORAI for multimodal RCA without call graphs, and RCAEval benchmark with systematic evaluation of causal methods.