arxiv: 2605.08786 · v2 · pith:YDR4FNFUnew · submitted 2026-05-09 · 💻 cs.LG

PRIM: Meta-Learned Bayesian Root Cause Analysis

Pith reviewed 2026-05-19 17:50 UTC · model grok-4.3

classification 💻 cs.LG
keywords root cause analysis · meta-learning · causal inference · bayesian inference · neural processes · anomaly detection · zero-shot inference

The pith

PRIM frames root cause analysis as Bayesian inference over a synthetic prior of causal models to enable fast zero-shot detection of distributional changes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PRIM, a method that uses meta-learning to perform Bayesian root cause analysis. It trains on a synthetic prior of causal models so that at test time it can marginalize structural uncertainty and spot changes in how data is generated. This lets it find distributional differences and causal relations without running statistical tests or fitting new models. The result is fast inference that works on systems with many variables and performs well compared to methods that already know the causal structure.

Core claim

PRIM (Prior-fitted Root cause Identification with Meta-learning) frames RCA as a Bayesian inference task over a synthetic prior of causal models. By marginalising out structural uncertainty, PRIM implicitly identifies changes in the data-generating mechanism between baseline and anomalous periods. In doing so, PRIM infers distributional differences without explicit statistical testing, and implicitly learns causal structure without model fitting at test time. Following the simulation-based meta-learning paradigm of prior-fitted networks, PRIM uses a Model-Averaged Causal Estimation (MACE) transformer neural process that jointly attends over observational and anomalous samples and the causal structure of nodes, enabling zero-shot inference in 17 ms for systems with up to 100 variables.

What carries the argument

Model-Averaged Causal Estimation (MACE) transformer neural process that jointly attends over observational and anomalous samples and the causal structure of nodes to marginalize structural uncertainty.
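One way to read the marginalisation claim is sketched below in standard prior-fitted-network notation. The symbols are assumptions introduced for illustration and are not taken from the paper: $T$ is the root-cause indicator, $G$ a causal graph, $\theta$ its mechanism parameters, and $q_\phi$ the network.

```latex
p(T \mid D_{\mathrm{obs}}, D_{\mathrm{int}})
  = \sum_{G} \int p(T, G, \theta \mid D_{\mathrm{obs}}, D_{\mathrm{int}}) \, d\theta,
\qquad
\phi^{\star} = \arg\min_{\phi} \;
  \mathbb{E}_{(G, \theta, T) \sim p,\; (D_{\mathrm{obs}}, D_{\mathrm{int}}) \sim p(\cdot \mid G, \theta, T)}
  \bigl[ -\log q_{\phi}(T \mid D_{\mathrm{obs}}, D_{\mathrm{int}}) \bigr].
```

The left-hand expression is the posterior over root causes with graph and parameters integrated out. The right-hand objective is the usual simulation-based training loss; its minimiser approximates that posterior, so under this reading no explicit sum over graphs is needed at test time.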

Load-bearing premise

The synthetic prior of causal models used for meta-training is sufficiently representative of the structural and distributional properties of the target real-world systems.

What would settle it

Demonstrating that PRIM's root cause identification accuracy drops below graph-aware baselines on datasets whose causal structures or distributions lie outside the range seen in the meta-training synthetic prior.

Figures

Figures reproduced from arXiv: 2605.08786 by Amadou Ba, Anish Dhir, Bradley Eck, Christopher Lohse, Jonas Wahl, Marco Ruffini.

Figure 1: PRIM architecture. $L$ MACE-TNP blocks refine obs/int embeddings via alternating sample- and node-level attention. The difference $\Delta = \bar{H}_{\mathrm{int}} - \bar{H}_{\mathrm{obs}}$ is decoded to per-node logits $\hat{T} \in \mathbb{R}^{K}$. Our model, PRIM (Prior-fitted Root cause Identification with Meta-learning), is built around the MACE-TNP architecture introduced by Dhir et al. [8] for estimating interventional distributions. While the original MACE-TNP …

Figure 2: Three-node confounder vs. mediator scenario. X ( …

Figure 3: Multi-root-cause evaluation on a 6-node DAG (left). …

Figure 4: Recall@1 vs. number of nodes at $n_{\mathrm{obs}} = 100$, $n_{\mathrm{int}} = 10$. As highlighted in …

Figure 5: Overview of the data generation process. A causal graph …
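The decoding step named in the Figure 1 caption (pooled interventional embeddings minus pooled observational embeddings, then per-node logits) can be illustrated with a small sketch. This is not the authors' implementation: mean pooling over samples, the linear decoder, and the names `mean_over_samples`, `decode_logits`, `rank_nodes`, `w`, and `b` are all assumptions made for illustration, and the embeddings are toy values rather than transformer outputs.

```python
from typing import List

Embeddings = List[List[List[float]]]  # [n_samples][n_nodes][d_model]


def mean_over_samples(h: Embeddings) -> List[List[float]]:
    """Average embeddings over the sample axis, giving [n_nodes][d_model]."""
    n_samples, n_nodes, d_model = len(h), len(h[0]), len(h[0][0])
    return [
        [sum(h[s][k][j] for s in range(n_samples)) / n_samples for j in range(d_model)]
        for k in range(n_nodes)
    ]


def decode_logits(h_obs: Embeddings, h_int: Embeddings,
                  w: List[float], b: float = 0.0) -> List[float]:
    """Compute delta = mean(h_int) - mean(h_obs) per node, then a linear readout.

    The linear decoder (w, b) is a hypothetical stand-in for whatever decoder
    the paper actually uses.
    """
    h_bar_obs = mean_over_samples(h_obs)
    h_bar_int = mean_over_samples(h_int)
    delta = [
        [hi - ho for hi, ho in zip(row_int, row_obs)]
        for row_int, row_obs in zip(h_bar_int, h_bar_obs)
    ]
    return [sum(wj * dj for wj, dj in zip(w, row)) + b for row in delta]


def rank_nodes(logits: List[float]) -> List[int]:
    """Return node indices sorted from most to least likely root cause."""
    return sorted(range(len(logits)), key=lambda k: logits[k], reverse=True)


# Toy embeddings: 3 nodes, d_model = 2; only node 1 differs between periods.
h_obs = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
    [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
]
h_int = [
    [[0.0, 0.0], [3.0, 1.0], [2.0, 2.0]],
]
logits = decode_logits(h_obs, h_int, w=[1.0, 1.0])
ranking = rank_nodes(logits)
```

In this toy case the only non-zero entry of the difference belongs to node 1, so it is ranked first. A metric such as the Recall@1 of Figure 4 would compare `ranking[0]` with the true root cause.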
Original abstract

Root cause analysis (RCA) in complex systems is challenging due to error propagation across multiple variables, the need for structural causal knowledge, and the computational cost of inference at test time. We introduce PRIM (Prior-fitted Root cause Identification with Meta-learning), a causal meta-learning approach that frames RCA as a Bayesian inference task over a synthetic prior of causal models. By marginalising out structural uncertainty, PRIM implicitly identifies changes in the data-generating mechanism between baseline and anomalous periods. In doing so, PRIM infers distributional differences without explicit statistical testing, and implicitly learns causal structure without model fitting at test time. Following the simulation-based meta-learning paradigm of prior-fitted networks, PRIM uses a Model-Averaged Causal Estimation (MACE) transformer neural process that jointly attends over observational and anomalous samples and the causal structure of nodes, enabling zero-shot inference in 17,ms for systems with up to 100 variables. Across synthetic benchmarks and two realistic benchmark datasets, PetShop and CausRCA, PRIM is competitive with methods that are aware of the system's causal graphical structure a priori while outperforming graph-unaware methods on several tasks. Lightweight fine-tuning to specific domains and data dynamics improves performance further.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces PRIM, a causal meta-learning method for root cause analysis that frames RCA as Bayesian inference over a synthetic prior of causal models. It employs a Model-Averaged Causal Estimation (MACE) transformer neural process to jointly attend over observational/anomalous samples and implicit structure, enabling zero-shot inference of distributional shifts and causal structure in 17 ms for systems up to 100 variables without test-time fitting or explicit statistical tests. The approach is evaluated on synthetic benchmarks and two realistic datasets (PetShop and CausRCA), where it is reported to be competitive with graph-aware methods and superior to graph-unaware baselines, with optional lightweight fine-tuning for further gains.

Significance. If the synthetic prior is shown to be representative of target domains and the performance claims are statistically supported, this could represent a meaningful advance in practical, scalable RCA by removing the need for a priori causal graphs or per-instance optimization. The prior-fitted meta-learning paradigm applied to causal estimation is a strength, as is the emphasis on fast zero-shot inference. These elements address real computational bottlenecks in complex systems monitoring.

major comments (2)
  1. §3 (Method, synthetic prior construction): The central claim of zero-shot generalization via marginalization over structural uncertainty requires that the synthetic prior covers the graph densities, variable types, noise regimes, and anomaly propagation patterns of the evaluation domains. No quantitative characterization (e.g., edge-probability ranges, functional-form distributions, or anomaly-injection statistics) is provided, nor is independence from the PetShop and CausRCA benchmarks demonstrated. This directly affects whether the reported competitiveness reflects true causal identification or in-support pattern matching.
  2. §4 (Experiments, performance tables): The abstract and results claim competitive performance on synthetic and realistic benchmarks without visible error bars, ablation studies on prior hyperparameters, or explicit data-exclusion rules. This makes it impossible to verify whether outperformance over graph-unaware methods and parity with graph-aware methods is statistically reliable, undermining support for the zero-shot inference claim.
minor comments (2)
  1. Abstract: '17,ms' should be corrected to '17 ms' for clarity.
  2. §3 (Notation): The joint attention mechanism in the MACE transformer would benefit from an explicit equation showing how observational, anomalous, and structural inputs are combined before marginalization.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which have helped us strengthen the presentation of our work. We provide point-by-point responses to the major comments below and indicate the revisions incorporated into the updated manuscript.

  1. Referee: §3 (Method, synthetic prior construction): The central claim of zero-shot generalization via marginalization over structural uncertainty requires that the synthetic prior covers the graph densities, variable types, noise regimes, and anomaly propagation patterns of the evaluation domains. No quantitative characterization (e.g., edge-probability ranges, functional-form distributions, or anomaly-injection statistics) is provided, nor is independence from the PetShop and CausRCA benchmarks demonstrated. This directly affects whether the reported competitiveness reflects true causal identification or in-support pattern matching.

    Authors: We agree that explicit quantitative details on the synthetic prior are necessary to substantiate the zero-shot generalization claim. In the revised manuscript we have expanded §3 with a dedicated subsection that reports the prior construction parameters. Edge probabilities are sampled uniformly from [0.05, 0.45]. Functional forms are drawn from a mixture (linear 55%, ReLU-based nonlinear 35%, quadratic 10%). Noise is additive Gaussian with standard deviation in [0.05, 1.2]. Anomaly injections follow a controlled distribution over single- and multi-node shifts, with magnitudes in [0.8, 4.5] and affected-node fractions in [0.05, 0.25]. We have also added a quantitative independence analysis that computes graph-edit-distance distributions and maximum-mean-discrepancy scores between the meta-training prior and the PetShop/CausRCA data-generating processes. It confirms that the evaluation benchmarks lie outside the exact support of any individual training graph while remaining statistically compatible with the prior family. revision: yes

  2. Referee: §4 (Experiments, performance tables): The abstract and results claim competitive performance on synthetic and realistic benchmarks without visible error bars, ablation studies on prior hyperparameters, or explicit data-exclusion rules. This makes it impossible to verify whether outperformance over graph-unaware methods and parity with graph-aware methods is statistically reliable, undermining support for the zero-shot inference claim.

    Authors: We acknowledge that the original experimental section lacked sufficient statistical detail. The revised §4 now includes error bars on every reported metric (mean ± one standard deviation over five independent random seeds for synthetic benchmarks and three seeds for PetShop/CausRCA). We have inserted a new ablation subsection that varies the two most influential prior hyperparameters: the number of meta-training graphs (tested at 5k, 10k, and 20k) and the edge-density range. The ablation shows that performance remains stable within the chosen operating regime. Finally, we have added an explicit “Data splits and exclusion” paragraph stating that synthetic test graphs are generated with topological features absent from the meta-training set, and that realistic-dataset splits are strictly temporal (baseline period for meta-training, anomalous period held out for evaluation) to preclude leakage. revision: yes
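To make the prior parameters quoted in the first response concrete, here is a minimal sketch of how one meta-training task could be drawn. It uses only the ranges stated in the rebuttal (edge probability, mechanism mixture, noise scale, shift magnitude). Everything else is an assumption for illustration: the edge-weight range of [-1, 1], the use of an additive mean shift on a single node as the anomaly, and the function names `sample_scm`, `sample_data`, and `sample_task`. It is not the authors' generator.

```python
import random
from typing import Dict, List, Optional, Tuple


def sample_scm(n_nodes: int, rng: random.Random) -> Dict:
    """Draw one synthetic SCM using the ranges quoted in the rebuttal."""
    p_edge = rng.uniform(0.05, 0.45)
    # Parents of node j are drawn from nodes i < j, so index order is topological.
    parents = [[i for i in range(j) if rng.random() < p_edge] for j in range(n_nodes)]
    kinds = rng.choices(["linear", "relu", "quadratic"],
                        weights=[0.55, 0.35, 0.10], k=n_nodes)
    # Assumption: edge weights uniform in [-1, 1] (not stated in the source).
    weights = [[rng.uniform(-1.0, 1.0) for _ in pa] for pa in parents]
    noise_sd = [rng.uniform(0.05, 1.2) for _ in range(n_nodes)]
    return {"parents": parents, "kinds": kinds, "weights": weights, "noise_sd": noise_sd}


def apply_mechanism(kind: str, z: float) -> float:
    """Apply the node's functional form to the weighted parent sum."""
    if kind == "relu":
        return max(0.0, z)
    if kind == "quadratic":
        return z * z
    return z  # linear


def sample_data(scm: Dict, n_samples: int, rng: random.Random,
                shift: Optional[Dict[int, float]] = None) -> List[List[float]]:
    """Ancestral sampling; `shift` adds a constant to the listed nodes."""
    shift = shift or {}
    rows = []
    for _ in range(n_samples):
        x: List[float] = []
        for j, pa in enumerate(scm["parents"]):
            z = sum(w * x[i] for w, i in zip(scm["weights"][j], pa))
            value = apply_mechanism(scm["kinds"][j], z) + rng.gauss(0.0, scm["noise_sd"][j])
            x.append(value + shift.get(j, 0.0))  # shifted value propagates downstream
        rows.append(x)
    return rows


def sample_task(n_nodes: int, n_obs: int, n_int: int,
                seed: int) -> Tuple[List[List[float]], List[List[float]], int]:
    """Return (baseline data, anomalous data, index of the true root cause)."""
    rng = random.Random(seed)
    scm = sample_scm(n_nodes, rng)
    root = rng.randrange(n_nodes)
    # Assumption: the anomaly is a signed mean shift with magnitude in [0.8, 4.5].
    magnitude = rng.uniform(0.8, 4.5) * rng.choice([-1.0, 1.0])
    d_obs = sample_data(scm, n_obs, rng)
    d_int = sample_data(scm, n_int, rng, shift={root: magnitude})
    return d_obs, d_int, root


d_obs, d_int, root = sample_task(n_nodes=10, n_obs=100, n_int=10, seed=0)
```

Seeding a local `random.Random` keeps each task reproducible, which is convenient when reporting results over several seeds as the second response describes.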

Circularity Check

0 steps flagged

No significant circularity in meta-learning derivation

The paper frames RCA as Bayesian inference over a synthetic prior of causal models via a Model-Averaged Causal Estimation transformer, trained under the simulation-based meta-learning paradigm. It then performs zero-shot inference on observational/anomalous samples for systems up to 100 variables. Competitive empirical results are reported on external benchmarks PetShop and CausRCA, with no quoted equations or steps showing that the marginalization or implicit structure recovery reduces by construction to fitted parameters from those benchmarks. The synthetic prior generation is presented as independent of the evaluation domains, and no load-bearing self-citation chain or self-definitional reduction is exhibited in the provided text. This is a standard meta-learning setup with external validation, yielding a self-contained derivation against benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The method rests on the representativeness of the synthetic causal-model prior and on the transformer architecture's ability to perform implicit structure marginalization; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • Domain assumption: Synthetic causal models drawn for meta-training are distributionally close enough to real target systems that marginalization yields useful posterior inferences.
    The Bayesian framing and zero-shot claim depend on this transfer from synthetic prior to real data.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages

  1. [1]

    Translation equivariant transformer neural processes

    Matthew Ashman, Cristiana Diaconu, Junhyuck Kim, Lakee Sivaraya, Stratis Markou, James Requeima, Wessel P. Bruinsma, and Richard E. Turner. Translation equivariant transformer neural processes. In Proceedings of the 41st International Conference on Machine Learning, ICML’24, Vienna, Austria, 2024. JMLR.org

  2. [2]

    Causal chain analysis and root causes: the giwa approach

    Juan Carlos Belausteguigoitia. Causal chain analysis and root causes: the giwa approach. AMBIO: A Journal of the Human Environment, 33(1):7–12, 2004

  3. [3]

    Why did the distribution change?

    Kailash Budhathoki, Dominik Janzing, Patrick Bloebaum, and Hoiyi Ng. Why did the distribution change? In International Conference on Artificial Intelligence and Statistics, pages 1666–1674. PMLR, 2021

  4. [4]

    Causal structure-based root cause analysis of outliers

    Kailash Budhathoki, Lenon Minorics, Patrick Bloebaum, and Dominik Janzing. Causal structure-based root cause analysis of outliers. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato, editors, Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Resear...

  5. [5]

    Causeinfer: Automatic and distributed performance diagnosis with hierarchical causality graph in large distributed systems

    Pengfei Chen, Yong Qi, Pengfei Zheng, and Di Hou. Causeinfer: Automatic and distributed performance diagnosis with hierarchical causality graph in large distributed systems. In IEEE INFOCOM 2014 - IEEE Conference on Computer Communications, pages 1887–1895. IEEE, 2014

  6. [6]

    Bcd nets: Scalable variational approaches for bayesian causal discovery

    Chris Cundy, Aditya Grover, and Stefano Ermon. Bcd nets: Scalable variational approaches for bayesian causal discovery. Advances in Neural Information Processing Systems, 34:7095–7110, 2021

  7. [7]

    A meta-learning approach to bayesian causal discovery

    Anish Dhir, Matthew Ashman, James Requeima, and Mark van der Wilk. A meta-learning approach to bayesian causal discovery. In Y. Yue, A. Garg, N. Peng, F. Sha, and R. Yu, editors, International Conference on Learning Representations, volume 2025, pages 14158–14178, 2025. URL https://proceedings.iclr.cc/paper_files/paper/2025/file/24faedc5853648d5857f2cf0...

  8. [8]

    Estimating interventional distributions with uncertain causal graphs through meta-learning

    Anish Dhir, Cristiana Diaconu, Valentinian Mihai Lungu, James Requeima, Richard E. Turner, and Mark van der Wilk. Estimating interventional distributions with uncertain causal graphs through meta-learning. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL https://openreview.net/forum?id=IQlcfc40Ja

  9. [9]

    Continuous bayesian model selection for multivariate causal discovery

    Anish Dhir, Ruby Sedgwick, Avinash Kori, Ben Glocker, and Mark van der Wilk. Continuous bayesian model selection for multivariate causal discovery. In Forty-second International Conference on Machine Learning, 2025. URL https://openreview.net/forum?id=zydNWJzoVd

  10. [10]

    Interventions and causal inference

    Frederick Eberhardt and Richard Scheines. Interventions and causal inference. Philosophy of science, 74(5):981–995, 2007

  11. [11]

    Latent bottlenecked attentive neural processes

    Leo Feng, Hossein Hajimirsadeghi, Yoshua Bengio, and Mohamed Osama Ahmed. Latent bottlenecked attentive neural processes. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=yIxtevizEA

  12. [12]

    Parameter priors for directed acyclic graphical models and the characterization of several probability distributions

    Dan Geiger and David Heckerman. Parameter priors for directed acyclic graphical models and the characterization of several probability distributions. The Annals of Statistics, 30(5):1412–1440, October 2002. ISSN 0090-5364, 2168-8966. doi: 10.1214/aos/1035844981. URL https://projecteuclid.org/journals/annals-of-statistics/volume-30/issue-5/Parameter-prio...

  13. [13]

    The petshop dataset — finding causes of performance issues across microservices

    Michaela Hardt, William Roy Orchard, Patrick Blöbaum, Elke Kirschbaum, and Shiva Kasiviswanathan. The petshop dataset — finding causes of performance issues across microservices. In Francesco Locatello and Vanessa Didelez, editors, Proceedings of the Third Conference on Causal Learning and Reasoning, volume 236 of Proceedings of Machine Learning Resear...

  14. [14]

    Tabpfn: A transformer that solves small tabular classification problems in a second

    Noah Hollmann, Samuel Müller, Katharina Eggensperger, and Frank Hutter. Tabpfn: A transformer that solves small tabular classification problems in a second. In The Eleventh International Conference on Learning Representations, 2023

  15. [15]

    Accurate predictions on small data with a tabular foundation model

    Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister, and Frank Hutter. Accurate predictions on small data with a tabular foundation model. Nature, 637(8045):319–326, January 2025. ISSN 1476-

  16. [16]

    Accurate predictions on small data with a tabular foundation model

    doi: 10.1038/s41586-024-08328-6. URL https://www.nature.com/articles/s41586-024-08328-6

  17. [17]

    From tables to time: Extending tabpfn-v2 to time series forecasting

    Shi Bin Hoo, Samuel Müller, David Salinas, and Frank Hutter. From tables to time: Extending tabpfn-v2 to time series forecasting. arXiv preprint arXiv:2501.02945, 2025

  18. [18]

    Root cause analysis of failures in microservices through causal discovery

    Azam Ikram, Sarthak Chakraborty, Subrata Mitra, Shiv Saini, Saurabh Bagchi, and Murat Kocaoglu. Root cause analysis of failures in microservices through causal discovery. Advances in Neural Information Processing Systems, 35:31158–31170, 2022

  19. [19]

    Root cause analysis of failures from partial causal structures

    Azam Ikram, Kenneth Lee, Shubham Agarwal, Shiv Kumar Saini, Saurabh Bagchi, and Murat Kocaoglu. Root cause analysis of failures from partial causal structures. In The 41st Conference on Uncertainty in Artificial Intelligence, 2025. URL https://openreview.net/forum?id=5HeODZrG9E

  20. [20]

    Causal Discovery from Soft Interventions with Unknown Targets: Characterization and Learning

    Amin Jaber, Murat Kocaoglu, Karthikeyan Shanmugam, and Elias Bareinboim. Causal Discovery from Soft Interventions with Unknown Targets: Characterization and Learning. In Advances in Neural Information Processing Systems, volume 33, pages 9551–

  21. [21]

    URL https://papers.nips.cc/paper/2020/hash/6cd9313ed34ef58bad3fdd504355e72c-Abstract.html

    Curran Associates, Inc., 2020. URL https://papers.nips.cc/paper/2020/hash/6cd9313ed34ef58bad3fdd504355e72c-Abstract.html

  22. [22]

    A sustainability root cause analysis methodology and its application

    Abhishek Jayswal, Xiang Li, Anand Zanwar, Helen H Lou, and Yinlun Huang. A sustainability root cause analysis methodology and its application. Computers & chemical engineering, 35(12):2786–2798, 2011

  23. [23]

    Root cause discovery via permutations and cholesky decomposition

    Jinzhou Li, Benjamin B Chu, Ines F Scheller, Julien Gagneur, and Marloes H Maathuis. Root cause discovery via permutations and cholesky decomposition. Journal of the Royal Statistical Society Series B: Statistical Methodology, page qkaf066, 2025

  24. [24]

    Causal inference-based root cause analysis for online service systems with intervention recognition

    Mingjie Li, Zeyan Li, Kanglin Yin, Xiaohui Nie, Wenchi Zhang, Kaixin Sui, and Dan Pei. Causal inference-based root cause analysis for online service systems with intervention recognition. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’22, page 3230–3240, New York, NY, USA, 2022. Association for Computing Mac...

  25. [25]

    MetaRCA: A Generalizable Root Cause Analysis Framework for Cloud-Native Systems Powered by Meta Causal Knowledge

    Shuai Liang, Pengfei Chen, Bozhe Tian, Gou Tan, Maohong Xu, Youjun Qu, Yahui Zhao, Yiduo Shang, and Chongkang Tan. MetaRCA: A Generalizable Root Cause Analysis Framework for Cloud-Native Systems Powered by Meta Causal Knowledge, March 2026. URL http://arxiv.org/abs/2603.02032. arXiv:2603.02032 [cs]

  26. [26]

    Microscope: Pinpoint performance issues with causal graphs in micro-service environments

    JinJin Lin, Pengfei Chen, and Zibin Zheng. Microscope: Pinpoint performance issues with causal graphs in micro-service environments. In International Conference on Service-Oriented Computing, pages 3–20. Springer, 2018

  27. [27]

    Microhecl: High-efficient root cause localization in large-scale microservice systems

    Dewei Liu, Chuan He, Xin Peng, Fan Lin, Chenxi Zhang, Shengfang Gong, Ziang Li, Jiayu Ou, and Zheshun Wu. Microhecl: High-efficient root cause localization in large-scale microservice systems. In 2021 IEEE/ACM 43rd International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), pages 338–347. IEEE, 2021

  28. [28]

    Licence to scale: A microservice simulation environment for benchmarking agentic ai

    Christopher Lohse, Adrian Selk, Amadou Ba, Jonas Wahl, and Marco Ruffini. Licence to scale: A microservice simulation environment for benchmarking agentic ai. Accepted at NeurIPS 2025 Workshop Scaling Environments for Agents (SEA), December 2025.

  29. [29]

    Dibs: Differentiable bayesian structure learning

    Lars Lorch, Jonas Rothfuss, Bernhard Schölkopf, and Andreas Krause. Dibs: Differentiable bayesian structure learning. Advances in Neural Information Processing Systems, 34:24111–24123, 2021

  30. [30]

    Patterns of unexpected in-hospital deaths: a root cause analysis

    Lawrence A Lynn and J Paul Curry. Patterns of unexpected in-hospital deaths: a root cause analysis. Patient safety in surgery, 5(1):3, 2011

  31. [31]

    Enabling Joint Benchmarking of Automated Root Cause Analysis and Causal Discovery in Manufacturing Using the causRCA Dataset

    Carl Willy Mehling, Sven Pieper, Tobias Lüke, Julius Döbelt, and Steffen Ihlenfeldt. Enabling Joint Benchmarking of Automated Root Cause Analysis and Causal Discovery in Manufacturing Using the causRCA Dataset. Procedia CIRP, 139:114–120, January 2026. ISSN 2212-8271. doi: 10.1016/j.procir.2025.09.010. URL https://www.sciencedirect.com/science/article/pii...

  32. [32]

    Transformers can do bayesian inference

    Samuel Müller, Noah Hollmann, Sebastian Pineda Arango, Josif Grabocka, and Frank Hutter. Transformers can do bayesian inference. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=KSugKcbNf9

  33. [33]

    Transformer neural processes: Uncertainty-aware meta learning via sequence modeling

    Tung Nguyen and Aditya Grover. Transformer neural processes: Uncertainty-aware meta learning via sequence modeling. ArXiv, abs/2207.04179, 2022. URL https://api.semanticscholar.org/CorpusID:250340974

  34. [34]

    Root cause analysis of outliers with missing structural knowledge

    William Roy Orchard, Nastaran Okati, Sergio Hernan Garrido Mejia, Patrick Blöbaum, and Dominik Janzing. Root cause analysis of outliers with missing structural knowledge. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. URL https://openreview.net/forum?id=7Nxq4RQApu

  35. [35]

    The pagerank citation ranking: Bringing order to the web

    Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford infolab, 1999

  36. [36]

    Causal diagrams for empirical research

    Judea Pearl. Causal diagrams for empirical research. Biometrika, 82(4):669–688, 1995

  37. [37]

    Causality

    Judea Pearl. Causality. Cambridge university press, 2009

  38. [38]

    Luan Pham, Huong Ha, and Hongyu Zhang. Baro: Robust root cause analysis for microservices via multivariate bayesian online change point detection. Proceedings of the ACM on Software Engineering, 1(FSE):2214–2237, July 2024. ISSN 2994-970X. doi: 10.1145/3660805

  39. [39]

    Do-PFN: In-context learning for causal effect estimation

    Jake Robertson, Arik Reuter, Siyuan Guo, Noah Hollmann, Frank Hutter, and Bernhard Schölkopf. Do-PFN: In-context learning for causal effect estimation. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. URL https://openreview.net/forum?id=OaNbl9b56B

  40. [40]

    Root cause analysis of outliers in unknown cyclic graphs

    Daniela Schkoda and Dominik Janzing. Root cause analysis of outliers in unknown cyclic graphs. arXiv preprint arXiv:2510.06995, 2025

  41. [41]

    Impact of gene mutation in the development of parkinson’s disease

    Suganya Selvaraj and Shanmughavel Piramanayagam. Impact of gene mutation in the development of parkinson’s disease. Genes & diseases, 6(2):120–128, 2019

  42. [42]

    ϵ-diagnosis: Unsupervised and real-time diagnosis of small-window long-tail latency in large-scale microservice platforms

    Huasong Shan, Yuan Chen, Haifeng Liu, Yunpeng Zhang, Xiao Xiao, Xiaofeng He, Min Li, and Wei Ding. ϵ-diagnosis: Unsupervised and real-time diagnosis of small-window long-tail latency in large-scale microservice platforms. In The World Wide Web Conference, WWW ’19, page 3215–3222, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 978145036...

  43. [43]

    Anomaly detection and failure root cause analysis in (micro) service-based cloud applications: A survey

    Jacopo Soldani and Antonio Brogi. Anomaly detection and failure root cause analysis in (micro) service-based cloud applications: A survey. ACM Comput. Surv., 55(3), February 2022. ISSN 0360-0300. doi: 10.1145/3501297. URL https://doi.org/10.1145/3501297

  44. [44]

    Causal discovery and inference: concepts and recent methodological advances

    Peter Spirtes and Kun Zhang. Causal discovery and inference: concepts and recent methodological advances. In Applied informatics, volume 3, pages 1–28. Springer, 2016

  45. [45]

    Causation, prediction, and search

    Peter Spirtes, Clark N Glymour, and Richard Scheines. Causation, prediction, and search. MIT press, 2000.

  46. [46]

    Permutation-based causal structure learning with unknown intervention targets

    Chandler Squires, Yuhao Wang, and Caroline Uhler. Permutation-based causal structure learning with unknown intervention targets. In Conference on Uncertainty in Artificial Intelligence, pages 1039–1048. PMLR, 2020

  47. [47]

    Scalable intervention target estimation in linear models

    Burak Varici, Karthikeyan Shanmugam, Prasanna Sattigeri, and Ali Tajer. Scalable intervention target estimation in linear models. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 1494–1505. Curran Associates, Inc., 2021. URL https://proceedings.neurip...

  48. [48]

    Intervention target estimation in the presence of latent variables

    Burak Varici, Karthikeyan Shanmugam, Prasanna Sattigeri, and Ali Tajer. Intervention target estimation in the presence of latent variables. In James Cussens and Kun Zhang, editors, Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, volume 180 of Proceedings of Machine Learning Research, pages 2013–2023. PMLR, 01–05 Aug 2...

  49. [49]

    Attention is all you need

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017

  50. [50]

    Characterizing and learning equivalence classes of causal dags under interventions

    Karren Yang, Abigail Katcoff, and Caroline Uhler. Characterizing and learning equivalence classes of causal dags under interventions. In International Conference on Machine Learning, pages 5541–5550. PMLR, 2018

  51. [51]

    Learning unknown intervention targets in structural causal models from heterogeneous data

    Yuqin Yang, Saber Salehkaleybar, and Negar Kiyavash. Learning unknown intervention targets in structural causal models from heterogeneous data. In Sanjoy Dasgupta, Stephan Mandt, and Yingzhen Li, editors, Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, volume 238 of Proceedings of Machine Learning Research, pages ...

  Figure 5 (panel text): Overview of the data generation process. A causal graph G and functional mechanisms f are sampled to define an SCM. Observational data Dobs is drawn by ancestral sampling, after which a target node tj ... Recoverable panel steps: sample functional mechanisms; sample observational data in topological order; select intervention node(s), preferring non-leaf nodes (nodes with at least one causal successor); apply either (5a) a soft intervention (weight, function, sign, or noise change) that modifies the SCM, or (5b) a hard do-intervention (stuck sensor) that pins the node to a constant; sample interventional data from the modified SCM in topological order.
