Causality-Encoded Diffusion Models for Interventional Sampling and Edge Inference
Pith reviewed 2026-05-09 21:02 UTC · model grok-4.3
The pith
Incorporating a known directed acyclic graph into diffusion model training enables interventional sampling and directed edge testing.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The causality-encoded diffusion framework trains conditional diffusion models consistent with a known directed acyclic graph's factorization, approximately recovering the observational distribution while enabling interventional sampling by fixing intervened variables and propagating effects during reverse diffusion; this simulator further supports a resampling-based test for directed edges with asymptotic type I error control.
What carries the argument
Causality-encoded diffusion framework: conditional diffusion models trained to respect the factorization of a known directed acyclic graph, allowing graph-guided propagation during reverse diffusion.
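For orientation, the factorization referred to here is the standard one for a directed acyclic graph, and the interventional target is the usual truncated factorization. The notation below (pa(j) for the parent set of node j, A for the set of intervened nodes, a for the values they are fixed to) is assumed for illustration and may differ from the paper's own symbols.

```latex
% Observational distribution: DAG (Markov) factorization
p(x_1, \dots, x_d) = \prod_{j=1}^{d} p\bigl(x_j \mid x_{\mathrm{pa}(j)}\bigr)

% Interventional distribution under do(x_A = a): truncated factorization
p\bigl(x \mid \mathrm{do}(x_A = a)\bigr)
  = \mathbf{1}\{x_A = a\} \prod_{j \notin A} p\bigl(x_j \mid x_{\mathrm{pa}(j)}\bigr)
```

In this reading, each conditional factor is what one conditional diffusion model is meant to approximate, and an intervention removes the factors of the intervened nodes while leaving the others unchanged.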
Load-bearing premise
The input directed acyclic graph is correctly specified and the conditional diffusion models can be trained to exactly match its factorization.
What would settle it
The claim would be refuted if, in a simulated causal system with known ground-truth interventions, the model's interventional samples failed to match the true interventional distributions computed directly from the data-generating process.
Original abstract
Standard diffusion models are flexible estimators of complex distributions, but they do not encode causal structures and therefore do not by themselves support causal analysis. We propose a causality-encoded diffusion framework that incorporates a known directed acyclic graph by training conditional diffusion models consistent with the graph factorisation. The resulting sampler approximately recovers the observational distribution and enables interventional sampling by fixing intervened variables while propagating effects through the graph during reverse diffusion. Building on this interventional simulator, we develop a resampling-based test for directed edges that generates null replicates under a candidate graph. We establish convergence guarantees for observational and interventional distribution estimation, with rates governed by the maximum local dimension rather than the ambient dimension, and prove asymptotic control of type I error for the edge test. Simulations show improved interventional distribution recovery relative to baselines, with near-nominal size and favourable power in inference. An application to flow cytometry data demonstrates practical utility of the proposed method in assessing disputed signalling linkages.
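The idea of fixing intervened variables and propagating effects through the graph can be illustrated with a minimal sketch. This is not the paper's sampler: the conditional diffusion models are replaced by closed-form linear-Gaussian conditionals, and the graph X1 -> X2 -> X3, the weights, and all function names are hypothetical choices made only for illustration.

```python
import random
from statistics import fmean

# Hypothetical toy DAG X1 -> X2 -> X3 with linear-Gaussian mechanisms.
# Each closed-form conditional stands in for a trained conditional sampler.
PARENTS = {"X1": [], "X2": ["X1"], "X3": ["X2"]}
ORDER = ["X1", "X2", "X3"]  # a topological order of the DAG
WEIGHTS = {"X2": {"X1": 2.0}, "X3": {"X2": -1.0}}


def sample_node(node, values, rng):
    """Draw node given its parents: weighted parent sum plus N(0, 1) noise."""
    mean = sum(WEIGHTS[node][p] * values[p] for p in PARENTS[node])
    return mean + rng.gauss(0.0, 1.0)


def sample_once(rng, do=None):
    """Ancestral sampling; nodes listed in `do` are clamped, not sampled."""
    do = do or {}
    values = {}
    for node in ORDER:
        if node in do:
            values[node] = float(do[node])  # intervention: ignore parents
        else:
            values[node] = sample_node(node, values, rng)
    return values


def sample(n, rng, do=None):
    """Draw n independent joint samples, optionally under an intervention."""
    return [sample_once(rng, do) for _ in range(n)]


rng = random.Random(0)
observational = sample(20_000, rng)
interventional = sample(20_000, rng, do={"X2": 3.0})
mean_x3_obs = fmean(row["X3"] for row in observational)
mean_x3_do = fmean(row["X3"] for row in interventional)
```

Under do(X2 = 3) the descendant X3 shifts toward a mean of -3 while the ancestor X1 is unaffected, which is the qualitative behaviour the abstract describes for the diffusion-based sampler.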
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a causality-encoded diffusion framework that incorporates a known DAG by training conditional diffusion models consistent with the graph factorization. This yields a sampler that approximately recovers the observational distribution and supports interventional sampling by fixing intervened variables and propagating effects through the graph during reverse diffusion. It further develops a resampling-based test for directed edges that generates null replicates under a candidate graph, with claimed convergence guarantees (rates governed by max local dimension) and asymptotic type I error control, supported by simulations showing improved interventional recovery and an application to flow cytometry data.
Significance. If the implementation details hold, the approach could offer a flexible, high-dimensional method for causal queries by leveraging diffusion models' expressivity while enforcing causal structure, with the local-dimension convergence rates providing a theoretical advantage over ambient-dimension methods. The combination of interventional simulation and edge testing in one framework is potentially useful for domains like biology where DAGs are partially known.
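The resampling-based edge test summarized above can be sketched generically. The review does not state the paper's test statistic or how null replicates are scored, so the sketch below uses the common Monte Carlo p-value with a +1 correction; the function names and the toy null simulator are hypothetical.

```python
import random


def resampling_p_value(observed_stat, null_stats):
    """Monte Carlo p-value: share of null statistics at least as extreme."""
    exceed = sum(1 for s in null_stats if s >= observed_stat)
    return (1 + exceed) / (1 + len(null_stats))


def edge_test(observed_stat, simulate_null_stat, n_replicates, rng, alpha=0.05):
    """Score n_replicates null draws and compare them with the observed value."""
    null_stats = [simulate_null_stat(rng) for _ in range(n_replicates)]
    p_value = resampling_p_value(observed_stat, null_stats)
    return p_value, p_value <= alpha


def toy_null_stat(rng):
    """Hypothetical stand-in for a statistic computed on one null replicate."""
    return abs(rng.gauss(0.0, 1.0))


rng = random.Random(1)
p_value, reject = edge_test(4.0, toy_null_stat, 999, rng)
```

In the paper's setting, `simulate_null_stat` would presumably draw a replicate data set from the sampler fitted under the candidate graph without the tested edge and compute the statistic on it; that mapping is an assumption, not something the review confirms.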
major comments (2)
- [§3 (method description of reverse process and interventional sampling)] The central interventional sampling claim (abstract and §3) requires that fixing an intervened variable and propagating during reverse diffusion recovers the correct interventional distribution. However, the description does not specify whether conditional diffusion models are trained and sampled using noisy parent values at each timestep t to enforce the factorization at all noise levels, or only on clean data. If the latter, early denoising steps for descendants would ignore the intervention, breaking causal consistency even if observational recovery holds. This is load-bearing for the interventional and edge-inference claims.
- [§4 (theoretical results)] The convergence guarantees and type I error control (abstract and §4) are stated without explicit assumptions on the training of the conditional models or the topological ordering enforcement during sampling. The rates depending on max local dimension rather than ambient dimension are promising but need the precise statement of how the graph factorization is maintained across the diffusion chain to be verifiable.
minor comments (2)
- [§5 (simulations)] The simulation section should include more detail on how baselines were implemented and whether they also used the known DAG for fair comparison.
- [§3] Notation for the noise schedule and conditioning variables could be clarified with an explicit algorithm box for the interventional reverse process.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive comments, which have helped us strengthen the clarity of the manuscript. We address each major comment below and have revised the relevant sections to provide the requested details on the training and sampling procedures.
Point-by-point responses
- Referee: [§3 (method description of reverse process and interventional sampling)] The central interventional sampling claim (abstract and §3) requires that fixing an intervened variable and propagating during reverse diffusion recovers the correct interventional distribution. However, the description does not specify whether conditional diffusion models are trained and sampled using noisy parent values at each timestep t to enforce the factorization at all noise levels, or only on clean data. If the latter, early denoising steps for descendants would ignore the intervention, breaking causal consistency even if observational recovery holds. This is load-bearing for the interventional and edge-inference claims.
Authors: We thank the referee for identifying this key implementation detail. In our framework the conditional diffusion models are trained by feeding noisy parent values (obtained from the forward process at the same timestep t) as conditioning inputs, and the same noisy conditioning is used during reverse sampling. This enforces the graph factorization at every noise level and ensures that interventions on ancestors propagate correctly even in early denoising steps for descendants. We have revised §3 to include explicit pseudocode for both training and interventional sampling, together with a paragraph stating that noisy parents are used at each t.
Revision: yes
- Referee: [§4 (theoretical results)] The convergence guarantees and type I error control (abstract and §4) are stated without explicit assumptions on the training of the conditional models or the topological ordering enforcement during sampling. The rates depending on max local dimension rather than ambient dimension are promising but need the precise statement of how the graph factorization is maintained across the diffusion chain to be verifiable.
Authors: We agree that the assumptions should be stated more explicitly. The convergence rates in §4 are derived under the assumption that each conditional score model is trained to match the true conditional distribution given noisy parents, with sampling performed in topological order so that only ancestor values (noisy or fixed) are used as conditioning. Because each local model involves only a node and its parents, the error bounds depend on the maximum local dimension rather than the ambient dimension. We have added a new subsection in §4 that lists these assumptions and explains how the factorization is preserved at every timestep of the diffusion chain.
Revision: yes
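The noisy-parent conditioning described in the responses above can be made concrete with a small sketch of how one training example might be assembled. It assumes the standard DDPM forward process x_t = sqrt(a_t) x_0 + sqrt(1 - a_t) eps and the linear beta schedule of Ho et al.; whether the paper uses this schedule, and every name below, is an assumption.

```python
import math
import random


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    alpha_bars, running = [], 1.0
    for step in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * step / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_noise(x0, alpha_bar, rng):
    """Return (x_t, eps) with x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    eps = rng.gauss(0.0, 1.0)
    x_t = math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * eps
    return x_t, eps


def make_training_example(row, node, parents, alpha_bars, rng):
    """Noise the child and its parents at one shared, random timestep t."""
    t = rng.randrange(len(alpha_bars))
    a = alpha_bars[t]
    child_t, eps = forward_noise(row[node], a, rng)
    parents_t = {p: forward_noise(row[p], a, rng)[0] for p in parents}
    # A denoiser would take (child_t, parents_t, t) and regress onto eps.
    return {"t": t, "child_t": child_t, "parents_t": parents_t, "target_eps": eps}


rng = random.Random(0)
alpha_bars = alpha_bar_schedule(1000)
example = make_training_example({"X1": 0.5, "X2": 1.2}, "X2", ["X1"], alpha_bars, rng)
```

The point of the sketch is only that the child and its parents share the same timestep, so the conditioning input seen in training has the same noise level as the one available during reverse sampling.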
Circularity Check
No circularity: external DAG input and standard conditional training yield interventional propagation by construction
The paper takes a known DAG as an external input and trains conditional diffusion models to be consistent with its factorization. Interventional sampling is then performed by fixing intervened nodes and propagating through the graph in topological order during the reverse process. This follows directly from the training objective and the supplied graph without any reduction of claimed results to quantities defined only by fitted parameters or self-citations. Convergence rates and type-I error control are stated as standard statistical guarantees under the given assumptions. No self-definitional, fitted-input-renamed-as-prediction, or load-bearing self-citation steps appear in the abstract or described derivation chain.
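Propagation in topological order presupposes that the supplied graph is acyclic and that an order has been computed. A minimal sketch using Python's standard `graphlib` module is given below; the parent-list encoding of the graph and the helper names are illustrative assumptions.

```python
from graphlib import CycleError, TopologicalSorter


def topological_order(parents):
    """Return nodes ordered so that every parent precedes its children.

    `parents` maps each node to an iterable of its parent nodes.
    """
    return list(TopologicalSorter(parents).static_order())


def is_dag(parents):
    """Return True when the parent map describes an acyclic graph."""
    try:
        topological_order(parents)
        return True
    except CycleError:
        return False


parents = {"X1": [], "X2": ["X1"], "X3": ["X1", "X2"]}
order = topological_order(parents)
```

A check of this kind would catch a misspecified input graph that contains a cycle, but it says nothing about whether the edges themselves are causally correct, which remains the load-bearing premise.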
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: A correctly specified directed acyclic graph is available as input.
- domain assumption: Conditional diffusion models can be trained to be consistent with the graph factorisation.
invented entities (1)
- Causality-encoded diffusion framework: no independent evidence