StableRCA: Robust Graph-Agnostic Mechanism-Level Root Cause Analysis
Pith reviewed 2026-06-28 02:26 UTC · model grok-4.3
The pith
Intervention targets can be identified with probability converging to one exponentially fast in the sample size, by estimating local Markov boundaries and detecting conditional distribution shifts within them.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
StableRCA identifies intervention targets by estimating local Markov boundaries and detecting conditional distribution shifts within them, leveraging the Independent Causal Mechanism principle. Under faithful Markov boundary recovery and non-degenerate mechanism shifts, the probability of correctly identifying the targets converges to one exponentially fast in the sample size. Experiments on synthetic benchmarks and five real-world datasets show the approach remains effective when graphs are misspecified, when multiple targets are present, and across different application scales.
What carries the argument
Local Markov boundary estimation paired with conditional distribution shift detection inside those boundaries, which isolates intervened mechanisms without global graph construction.
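As a rough illustration of the local test described above, the following Python sketch scores each variable by how much its conditional distribution, given a small conditioning set, shifts between reference and anomalous data. Everything in it is an assumption made for illustration: the three-variable chain A -> B -> C, the hand-supplied conditioning sets (standing in for estimated Markov boundaries), the linear residual model, and the two-sample Kolmogorov-Smirnov statistic. This summary does not say which boundary estimator or shift test StableRCA actually uses.

```python
import random
from statistics import mean


def simulate(n, shift_b, seed):
    """Toy chain A -> B -> C; shift_b alters only B's mechanism (hypothetical)."""
    rng = random.Random(seed)
    data = {"A": [], "B": [], "C": []}
    for _ in range(n):
        a = rng.gauss(0, 1)
        b = 2 * a + shift_b + rng.gauss(0, 0.5)
        c = -b + rng.gauss(0, 0.5)
        data["A"].append(a)
        data["B"].append(b)
        data["C"].append(c)
    return data


def fit_line(x, y):
    """Ordinary least squares with a single predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


def shift_score(ref, new, target, cond):
    """Compare the target's residuals given `cond`, using a model fit on ref only."""
    if cond is None:  # empty conditioning set: compare raw values
        return ks_statistic(ref[target], new[target])
    slope, intercept = fit_line(ref[cond], ref[target])
    res_ref = [y - (slope * x + intercept) for x, y in zip(ref[cond], ref[target])]
    res_new = [y - (slope * x + intercept) for x, y in zip(new[cond], new[target])]
    return ks_statistic(res_ref, res_new)


# Hand-supplied conditioning sets; a real pipeline would estimate these.
COND = {"A": None, "B": "A", "C": "B"}

ref = simulate(500, shift_b=0.0, seed=0)  # normal operation
new = simulate(500, shift_b=3.0, seed=1)  # B's mechanism is shifted
scores = {v: shift_score(ref, new, v, c) for v, c in COND.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

In this toy setup only B's mechanism changes. C's marginal distribution also moves because it depends on B, but its conditional distribution given B does not, which is the distinction between mechanism-level diagnosis and marginal anomaly detection.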
If this is right
- The method tolerates inaccurate or missing global causal graphs.
- It continues to work when several variables are intervened on at once.
- Computation remains feasible as the number of variables grows large.
- Performance holds across manufacturing, cloud computing, and healthcare data.
Where Pith is reading between the lines
- The local nature of the procedure could be combined with streaming data to monitor live systems without periodic full-graph re-estimation.
- If boundary recovery is only approximately faithful, the method might still produce useful rankings of candidate causes rather than exact identification.
- The same local-shift test could be applied inside existing anomaly detection pipelines to move from symptom detection to mechanism-level diagnosis.
Load-bearing premise
The exponential convergence result requires that Markov boundaries are recovered faithfully and that the mechanism shifts are non-degenerate.
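One way to read the convergence claim, written with placeholder symbols that are not taken from the paper: let $T$ be the true set of intervention targets, $\hat{T}_n$ the set returned from $n$ samples, and $C, c > 0$ unspecified constants. An exponential rate would then take the form below. The paper's actual theorem, constants, and conditions are not reproduced in this summary.

```latex
\Pr\bigl(\hat{T}_n \neq T\bigr) \;\le\; C\, e^{-c\,n},
\qquad\text{equivalently}\qquad
\Pr\bigl(\hat{T}_n = T\bigr) \;\ge\; 1 - C\, e^{-c\,n}.
```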
What would settle it
A controlled experiment in which Markov boundary recovery is deliberately made unfaithful (for example, by adding hidden variables that violate the faithfulness assumption) and the observed identification error fails to decay exponentially as the sample size grows.
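A minimal sketch of how such an experiment could be scored, assuming one has measured an identification error rate at several sample sizes. If the error decays like C * exp(-c * n), then the log error is linear in n with slope -c, so a least-squares fit of log error against n should give a clearly negative slope and a good linear fit. The function name and the numbers below are hypothetical placeholders, not results from the paper.

```python
import math


def fit_log_error_rate(sample_sizes, error_rates):
    """Least-squares fit of log(error) = intercept + slope * n.

    Returns (slope, intercept, r_squared). A slope near zero, or a poor
    linear fit, would be evidence against exponential decay of the error.
    """
    ys = [math.log(e) for e in error_rates]
    count = len(sample_sizes)
    mx = sum(sample_sizes) / count
    my = sum(ys) / count
    sxx = sum((x - mx) ** 2 for x in sample_sizes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(sample_sizes, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(sample_sizes, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical error rates generated from 0.8 * exp(-0.01 * n), illustration only.
sizes = [100, 200, 300, 400, 500]
errors = [0.8 * math.exp(-0.01 * n) for n in sizes]
slope, intercept, r2 = fit_log_error_rate(sizes, errors)
print(slope, r2)
```

Error rates of exactly zero would need to be excluded or smoothed before taking logs, and the fit is undefined if every measured error rate is identical.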
Original abstract
Root-Cause Analysis (RCA) seeks to identify the variables responsible for abnormal system behavior in complex domains such as manufacturing, cloud computing, and healthcare. Existing approaches face a critical bottleneck: graph-based causal methods can identify intervention targets but typically require a known or accurately estimated causal graph, while graph-free statistical methods either localize marginal anomalies rather than structural causes, or rely on restrictive assumptions about graph structure or functional form. We propose StableRCA, a local mechanism-level RCA framework that avoids global graph discovery by estimating local Markov boundaries and detecting conditional distribution shifts within them. Leveraging the Independent Causal Mechanism principle, we show that intervention targets can be identified with probability converging exponentially in sample size under faithful Markov boundary recovery and non-degenerate mechanism shifts. Experiments on synthetic benchmarks and five real-world datasets demonstrate that StableRCA is robust to graph misspecification, effective under multiple intervention targets, scalable to large systems, and reliable across diverse application domains. Code is available at: https://anonymous.4open.science/r/StableRCA-E362
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes StableRCA, a local mechanism-level root cause analysis framework that estimates Markov boundaries to detect conditional distribution shifts and identify intervention targets without requiring a global causal graph. It claims that, under faithful Markov boundary recovery and non-degenerate mechanism shifts, the probability of correctly identifying intervention targets converges exponentially in sample size, and reports empirical robustness on synthetic benchmarks and five real-world datasets across manufacturing, cloud, and healthcare domains.
Significance. If the stated convergence result holds under the explicitly listed conditions, the work offers a practical graph-agnostic alternative to existing RCA methods that either require accurate global graphs or rely on marginal anomaly detection. The explicit statement of prerequisites (faithful boundary recovery and non-degenerate shifts) and the release of code are positive features that support reproducibility and allow direct testing of the assumptions.
minor comments (3)
- The abstract states the exponential convergence result but does not reference the specific theorem or section containing the derivation; adding an explicit pointer (e.g., Theorem 3.2) would improve traceability.
- The link to code is given as an anonymous repository; the final version should replace it with a permanent, citable URL or GitHub repository.
- Table or figure captions for the real-world datasets should include the number of variables and sample sizes to allow readers to assess scalability claims directly.
Simulated Author's Rebuttal
We thank the referee for the positive review, accurate summary of our contributions, and recommendation for minor revision. No major comments were raised in the report.
Circularity Check
No significant circularity in derivation chain
The paper's core claim is an exponential convergence result for identifying intervention targets, explicitly conditioned on two external prerequisites (faithful Markov boundary recovery and non-degenerate mechanism shifts) plus the Independent Causal Mechanism principle. These are presented as assumptions rather than derived internally. No equations or steps in the abstract reduce the result to a fitted parameter, self-definition, or self-citation chain; the method treats boundary estimation and shift detection as inputs. This matches the default expectation of a self-contained theoretical statement with no load-bearing circularity.
Axiom & Free-Parameter Ledger
axioms (3)
- Independent Causal Mechanism principle (domain assumption)
- Faithful Markov boundary recovery (ad hoc to paper)
- Non-degenerate mechanism shifts (ad hoc to paper)