Causality-Encoded Diffusion Models for Interventional Sampling and Edge Inference

Li Chen; Wei Pan; Xiaotong Shen

arxiv: 2604.21843 · v1 · submitted 2026-04-23 · 📊 stat.ME · stat.ML

Pith reviewed 2026-05-09 21:02 UTC · model grok-4.3

keywords: causal inference · diffusion models · interventional sampling · directed acyclic graphs · edge inference · resampling test · convergence rates

The pith

Incorporating a known directed acyclic graph into diffusion model training enables interventional sampling and directed edge testing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard diffusion models estimate complex distributions but lack causal structure. This work trains conditional diffusion models to be consistent with the factorization of a supplied directed acyclic graph. The resulting model recovers the observational distribution and supports interventions by fixing variables and diffusing effects through the graph. A resampling procedure then tests for specific directed edges using null samples generated under candidate graphs. Theoretical results include convergence guarantees and type I error control, with simulations and flow cytometry data showing practical gains.

Core claim

The causality-encoded diffusion framework trains conditional diffusion models consistent with a known directed acyclic graph's factorization, approximately recovering the observational distribution while enabling interventional sampling by fixing intervened variables and propagating effects during reverse diffusion; this simulator further supports a resampling-based test for directed edges with asymptotic type I error control.

What carries the argument

Causality-encoded diffusion framework: conditional diffusion models trained to respect the factorization of a known directed acyclic graph, allowing graph-guided propagation during reverse diffusion.
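
For orientation, the factorization that a DAG imposes on $p$ variables, and the standard truncated factorization under a hard intervention $do(X_S = x_S^*)$, can be written as below. The notation ($\mathrm{pa}(j)$ for the parents of node $j$, $S$ for the intervened set, $x_S^*$ for the clamped values) is assumed here for illustration and may differ from the paper's.

```latex
p(x_1,\dots,x_p) \;=\; \prod_{j=1}^{p} p\bigl(x_j \mid x_{\mathrm{pa}(j)}\bigr),
\qquad
p\bigl(x \mid do(X_S = x_S^*)\bigr) \;=\; \prod_{j \notin S} p\bigl(x_j \mid x_{\mathrm{pa}(j)}\bigr)\,\prod_{j \in S} \mathbb{1}\{x_j = x_j^*\}.
```

In the second expression the conditionals of the non-intervened nodes are unchanged; only the factors of the intervened nodes are replaced by point masses, so parents in $S$ enter the remaining conditionals at their clamped values.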

Load-bearing premise

The input directed acyclic graph is correctly specified and the conditional diffusion models can be trained to exactly match its factorization.

What would settle it

In a simulated causal system with known ground-truth interventions, the generated interventional distributions from the model fail to match the true ones computed directly from the data-generating process.

Figures

Figures reproduced from arXiv: 2604.21843 by Li Chen, Wei Pan, Xiaotong Shen.

**Figure 1.** Simulation results on the Chain, Hub, Random, and Sachs structures. Boxplots compare squared MMD between the model-generated samples and 5,000 reference samples from the true interventional distribution across 50 independent repetitions for each training sample size. Lower values indicate better recovery of the target interventional law. Results. For each graph and each sample size n ∈ {500, 1000, 2000, 50…

**Figure 2.** Empirical rejection rates (size) at nominal level 0.05 for CEDMI and competing conditional …

**Figure 3.** Empirical power at nominal level 0.05 for testing directed edges under varying signal strength.

**Figure 4.** (a) Consensus network, according to [41]; (b) Reconstructed signalling network by [41]; (c) Super-DAG obtained by taking the union of (a) and (b), with the edge between PIP3 and PLCg oriented as PIP3 → PLCg. In panels (a) and (b), blue dashed arrows indicate edges on which the two networks disagree. Results. Let E0 denote the edge set of the super-DAG in …

**Figure 5.** Rejection rates for tests of the four disputed linkages in the flow cytometry network across …
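
The squared MMD reported in Figure 1 is a kernel two-sample discrepancy (see [29]). The sketch below is a minimal illustration of the biased (V-statistic) estimate with a Gaussian kernel. The kernel choice, the bandwidth of 1.0, and the toy two-dimensional samples are assumptions made for this example; the page does not state which kernel or estimator the paper uses.

```python
import math
import random

def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def squared_mmd(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples xs and ys."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

def draw(n, shift, rng):
    """Toy 2-D Gaussian sample whose first coordinate has mean `shift`."""
    return [(rng.gauss(shift, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]

rng = random.Random(0)
reference = draw(200, 0.0, rng)   # stands in for the reference sample
close = draw(200, 0.0, rng)       # same law as the reference
far = draw(200, 1.5, rng)         # shifted law

mmd_close = squared_mmd(close, reference)
mmd_far = squared_mmd(far, reference)
print(mmd_close, mmd_far)
```

A sample drawn from the same law as the reference should give a value near zero, and the shifted sample a clearly larger one, which is the sense in which lower values indicate better recovery.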

Original abstract

Standard diffusion models are flexible estimators of complex distributions, but they do not encode causal structures and therefore do not by themselves support causal analysis. We propose a causality-encoded diffusion framework that incorporates a known directed acyclic graph by training conditional diffusion models consistent with the graph factorisation. The resulting sampler approximately recovers the observational distribution and enables interventional sampling by fixing intervened variables while propagating effects through the graph during reverse diffusion. Building on this interventional simulator, we develop a resampling-based test for directed edges that generates null replicates under a candidate graph. We establish convergence guarantees for observational and interventional distribution estimation, with rates governed by the maximum local dimension rather than the ambient dimension, and prove asymptotic control of type I error for the edge test. Simulations show improved interventional distribution recovery relative to baselines, with near-nominal size and favourable power in inference. An application to flow cytometry data demonstrates practical utility of the proposed method in assessing disputed signalling linkages.
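
To make the resampling step concrete, here is a minimal sketch of a Monte Carlo p-value computed from null replicates. It assumes that large values of the test statistic count as evidence against the null and uses the common "+1" correction; the function name `resampling_p_value`, the statistic, and the toy null draws are hypothetical, since this page does not describe the paper's actual statistic or how its replicates are generated.

```python
import random

def resampling_p_value(observed_stat, null_stats):
    """Monte Carlo p-value with the +1 correction.

    Counts null replicates at least as large as the observed statistic.
    """
    exceed = sum(1 for t in null_stats if t >= observed_stat)
    return (1 + exceed) / (1 + len(null_stats))

rng = random.Random(1)
# Hypothetical null replicates; in practice these would be statistics
# recomputed on samples generated under the candidate (null) graph.
null_stats = [abs(rng.gauss(0.0, 1.0)) for _ in range(199)]

p_extreme = resampling_p_value(10.0, null_stats)  # far beyond every replicate
p_typical = resampling_p_value(0.0, null_stats)   # no larger than any replicate
print(p_extreme, p_typical)
```

With 199 replicates the smallest attainable p-value is 1/200, so the number of replicates bounds the resolution of the test at a given nominal level.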

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gives a way to bake a known DAG into diffusion training so interventions can be simulated by fixing nodes and propagating during reverse diffusion, plus a resampling edge test, but the causal consistency at each denoising step is the part that needs verification.

The core move is training conditional diffusion models to respect the factorization of a supplied DAG, then using that to sample interventions by clamping intervened variables and letting effects flow through the graph in the reverse process. They also build a resampling procedure to test candidate edges by generating nulls under the graph. That combination is new enough to stand out from standard diffusion or causal work.

The claimed rates depending on max local dimension instead of ambient dimension make sense for high-dimensional settings, and the flow cytometry example shows they can apply it to real data with disputed links. Simulations reportedly beat baselines on interventional recovery while keeping the edge test near nominal size with decent power. That is useful credit.

The soft spot is exactly the one flagged in the stress test. For interventional samples to be correct, each variable's denoising step has to be conditioned on its parents' noisy versions at the matching noise level, in topological order. If the conditioning is only applied to clean data, early steps for descendants will ignore the intervention and the output distribution will not match the true interventional law even if the observational fit is good. The abstract states convergence and type-I control but gives no derivation details or implementation description, so it is impossible to tell whether they handled the noisy-parent conditioning properly. Everything also rests on the DAG being correct and available upfront.

This is for people working on causal questions in complex, high-dimensional data where structure is at least partly known, such as biology or econometrics. A reader who already uses diffusion models or needs flexible interventional simulators would get concrete value from the method and the test. I would send it to peer review. The idea is specific enough to be checked, and the simulation results give something to evaluate even if the proofs need tightening.
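
The clamp-and-propagate idea in the letter can be illustrated on a toy structural model. The sketch below is not the paper's diffusion sampler: each node's conditional sampler is replaced by a hypothetical linear-Gaussian rule, and sampling uses clean parent values in topological order. The graph, the weight of 0.8, and all function names are assumptions for illustration only.

```python
import random

# Hypothetical three-node chain x1 -> x2 -> x3, listed in topological order.
PARENTS = {"x1": [], "x2": ["x1"], "x3": ["x2"]}
ORDER = ["x1", "x2", "x3"]
WEIGHT = 0.8  # assumed linear effect of each parent on its child

def sample_node(node, values, rng):
    """Stand-in for a per-node conditional sampler: linear in parents plus N(0, 1) noise."""
    mean = sum(WEIGHT * values[p] for p in PARENTS[node])
    return mean + rng.gauss(0.0, 1.0)

def sample_once(rng, interventions=None):
    """Draw one joint sample; intervened nodes are clamped and never resampled."""
    interventions = interventions or {}
    values = {}
    for node in ORDER:
        if node in interventions:
            values[node] = interventions[node]
        else:
            values[node] = sample_node(node, values, rng)
    return values

rng = random.Random(0)
draws = [sample_once(rng, {"x2": 2.0}) for _ in range(4000)]
mean_x1 = sum(d["x1"] for d in draws) / len(draws)
mean_x3 = sum(d["x3"] for d in draws) / len(draws)
print(mean_x1, mean_x3)
```

Under do(x2 = 2.0) the upstream node x1 keeps its original law (mean near 0), while the descendant x3 shifts to a mean near 0.8 × 2.0 = 1.6. A sampler whose descendant steps ignored the clamped value would miss this shift.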

Referee Report

2 major / 2 minor

Summary. The paper proposes a causality-encoded diffusion framework that incorporates a known DAG by training conditional diffusion models consistent with the graph factorization. This yields a sampler that approximately recovers the observational distribution and supports interventional sampling by fixing intervened variables and propagating effects through the graph during reverse diffusion. It further develops a resampling-based test for directed edges that generates null replicates under a candidate graph, with claimed convergence guarantees (rates governed by max local dimension) and asymptotic type I error control, supported by simulations showing improved interventional recovery and an application to flow cytometry data.

Significance. If the implementation details hold, the approach could offer a flexible, high-dimensional method for causal queries by leveraging diffusion models' expressivity while enforcing causal structure, with the local-dimension convergence rates providing a theoretical advantage over ambient-dimension methods. The combination of interventional simulation and edge testing in one framework is potentially useful for domains like biology where DAGs are partially known.

major comments (2)

[§3 (method description of reverse process and interventional sampling)] The central interventional sampling claim (abstract and §3) requires that fixing an intervened variable and propagating during reverse diffusion recovers the correct interventional distribution. However, the description does not specify whether conditional diffusion models are trained and sampled using noisy parent values at each timestep t to enforce the factorization at all noise levels, or only on clean data. If the latter, early denoising steps for descendants would ignore the intervention, breaking causal consistency even if observational recovery holds. This is load-bearing for the interventional and edge-inference claims.
[§4 (theoretical results)] The convergence guarantees and type I error control (abstract and §4) are stated without explicit assumptions on the training of the conditional models or the topological ordering enforcement during sampling. The rates depending on max local dimension rather than ambient dimension are promising but need the precise statement of how the graph factorization is maintained across the diffusion chain to be verifiable.
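
One possible reading of "noisy parent values at each timestep" in the first major comment, written in the DDPM notation of Ho et al. [2], is sketched below. The symbols $\bar\alpha_t$ and $\epsilon_\theta^{(j)}$ and the form of the per-node loss are assumptions for illustration; they are not taken from the paper.

```latex
x_{j,t} = \sqrt{\bar\alpha_t}\, x_{j,0} + \sqrt{1-\bar\alpha_t}\, \epsilon_j,
\quad \epsilon_j \sim \mathcal{N}(0, I),
\qquad
\mathcal{L}_j(\theta) = \mathbb{E}_{t,\, x_0,\, \epsilon}
\Bigl\| \epsilon_j - \epsilon_\theta^{(j)}\bigl(x_{j,t},\, x_{\mathrm{pa}(j),t},\, t\bigr) \Bigr\|^2 .
```

The clean-data alternative raised in the comment would replace $x_{\mathrm{pa}(j),t}$ by $x_{\mathrm{pa}(j),0}$ in the conditioning argument, which is exactly the distinction the referee asks the authors to spell out.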

minor comments (2)

[§5 (simulations)] The simulation section should include more detail on how baselines were implemented and whether they also used the known DAG for fair comparison.
[§3] Notation for the noise schedule and conditioning variables could be clarified with an explicit algorithm box for the interventional reverse process.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments, which have helped us strengthen the clarity of the manuscript. We address each major comment below and have revised the relevant sections to provide the requested details on the training and sampling procedures.

Referee: [§3 (method description of reverse process and interventional sampling)] The central interventional sampling claim (abstract and §3) requires that fixing an intervened variable and propagating during reverse diffusion recovers the correct interventional distribution. However, the description does not specify whether conditional diffusion models are trained and sampled using noisy parent values at each timestep t to enforce the factorization at all noise levels, or only on clean data. If the latter, early denoising steps for descendants would ignore the intervention, breaking causal consistency even if observational recovery holds. This is load-bearing for the interventional and edge-inference claims.

Authors: We thank the referee for identifying this key implementation detail. In our framework the conditional diffusion models are trained by feeding noisy parent values (obtained from the forward process at the same timestep t) as conditioning inputs, and the same noisy conditioning is used during reverse sampling. This enforces the graph factorization at every noise level and ensures that interventions on ancestors propagate correctly even in early denoising steps for descendants. We have revised §3 to include explicit pseudocode for both training and interventional sampling, together with a paragraph stating that noisy parents are used at each t. revision: yes
Referee: [§4 (theoretical results)] The convergence guarantees and type I error control (abstract and §4) are stated without explicit assumptions on the training of the conditional models or the topological ordering enforcement during sampling. The rates depending on max local dimension rather than ambient dimension are promising but need the precise statement of how the graph factorization is maintained across the diffusion chain to be verifiable.

Authors: We agree that the assumptions should be stated more explicitly. The convergence rates in §4 are derived under the assumption that each conditional score model is trained to the true conditional distribution given noisy parents, with sampling performed in topological order so that only ancestor values (noisy or fixed) are used as conditioning. Because each local model operates only on the parents of a node, the error bounds depend on the maximum local dimension rather than the ambient dimension. We have added a new subsection in §4 that lists these assumptions and explains how the factorization is preserved at every timestep of the diffusion chain. revision: yes
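
To see why a local-dimension rate can differ sharply from an ambient-dimension rate, the sketch below computes both quantities for a toy hub graph. It assumes scalar nodes and takes "local dimension" of a node to mean one plus its number of parents; that definition, the graph, and the function name are assumptions, since the page does not give the paper's exact definition.

```python
def max_local_dimension(parents):
    """Largest value of 1 + number of parents over all nodes (an assumed definition)."""
    return max(1 + len(pa) for pa in parents.values())

# Hypothetical hub graph: one hub node with nine children, all scalar.
hub = {"hub": []}
hub.update({f"leaf{i}": ["hub"] for i in range(9)})

ambient_dim = len(hub)                 # total number of variables
local_dim = max_local_dimension(hub)   # size of the largest node-plus-parents block
print(ambient_dim, local_dim)
```

Here the ambient dimension is 10 while the largest local block has size 2, so each conditional model only ever sees a two-dimensional problem.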

Circularity Check

0 steps flagged

No circularity: external DAG input and standard conditional training yield interventional propagation by construction

The paper takes a known DAG as an external input and trains conditional diffusion models to be consistent with its factorization. Interventional sampling is then performed by fixing intervened nodes and propagating through the graph in topological order during the reverse process. This follows directly from the training objective and the supplied graph without any reduction of claimed results to quantities defined only by fitted parameters or self-citations. Convergence rates and type-I error control are stated as standard statistical guarantees under the given assumptions. No self-definitional, fitted-input-renamed-as-prediction, or load-bearing self-citation steps appear in the abstract or described derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the availability of a correct, known DAG and on the feasibility of training graph-consistent conditional diffusion models. No explicit free parameters are listed, though some are implicit in model fitting.

axioms (2)

domain assumption: A correctly specified directed acyclic graph is available as input.
The method requires the DAG to define the conditioning structure and propagation order.
domain assumption: Conditional diffusion models can be trained to be consistent with the graph factorisation.
Required for the sampler to recover the observational distribution and support interventions.
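
Because both axioms presuppose a valid DAG, a basic sanity check on any supplied graph is that it is acyclic and admits a topological order. The sketch below uses Kahn's algorithm on a parent-list representation. The representation (a dict mapping each node to its list of parents, with every parent also present as a key) and the function name are assumptions for illustration.

```python
from collections import deque

def topological_order(parents):
    """Return a topological order via Kahn's algorithm, or raise ValueError on a cycle."""
    children = {node: [] for node in parents}
    indegree = {node: len(pa) for node, pa in parents.items()}
    for node, pa in parents.items():
        for p in pa:
            children[p].append(node)
    # Start from nodes with no parents and peel the graph layer by layer.
    queue = deque(node for node, deg in indegree.items() if deg == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(parents):
        raise ValueError("graph contains a directed cycle")
    return order

chain = {"a": [], "b": ["a"], "c": ["b"]}
order = topological_order(chain)
print(order)
```

The returned order is the sequence in which per-node conditional samplers would need to be visited so that every node's parents are available before it is sampled.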

invented entities (1)

Causality-encoded diffusion framework (no independent evidence)
purpose: Incorporate known causal graph into diffusion training for interventional sampling and edge inference.
New framework introduced to bridge diffusion models and causal structure.

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages

[1] Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pages 2256–2265. PMLR, 2015.
[2] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems 33, pages 6840–6851, 2020.
[3] Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021.
[4] Ergan Shang, Yuting Wei, and Kathryn Roeder. Predicting the unseen: a diffusion-based debiasing framework for transcriptional response prediction at single-cell resolution. Proceedings of the National Academy of Sciences, 122(52):e2525268122, 2025.
[5] Brian D. O. Anderson. Reverse-time diffusion equation models. Stochastic Processes and their Applications, 12(3):313–326, 1982.
[6] Shivam Kumar, Yun Yang, and Lizhen Lin. A likelihood based approach to distribution regression using conditional deep generative models. In Proceedings of the 42nd International Conference on Machine Learning, volume 267 of Proceedings of Machine Learning Research, pages 31964–31990, 2025.
[7] Judea Pearl. Causality. Cambridge University Press, 2009.
[8] Murat Kocaoglu, Christopher Snyder, Alexandros G. Dimakis, and Sriram Vishwanath. CausalGAN: Learning causal implicit generative models with adversarial training. In International Conference on Learning Representations, 2018.
[9] Amir-Hossein Karimi, Julius von Kügelgen, Bernhard Schölkopf, and Isabel Valera. Algorithmic recourse under imperfect causal knowledge: a probabilistic approach. In Advances in Neural Information Processing Systems 33, pages 265–277, 2020.
[10] Nick Pawlowski, Daniel Coelho de Castro, and Ben Glocker. Deep structural causal models for tractable counterfactual inference. In Advances in Neural Information Processing Systems 33, pages 857–869, 2020.
[11] Patrick Chao, Patrick Blöbaum, Sapan Kirit Patel, and Shiva Prasad Kasiviswanathan. Modeling causal mechanisms with diffusion models for interventional and counterfactual queries. Transactions on Machine Learning Research, 2024.
[12] Zhihan Huang, Yuting Wei, and Yuxin Chen. Denoising diffusion probabilistic models are optimally adaptive to unknown low dimensionality. Mathematics of Operations Research, 2026.
[13] Gen Li and Yuling Yan. O(d/T) convergence theory for diffusion probabilistic models under minimal assumptions. Journal of Machine Learning Research, 26(292):1–55, 2025.
[14] Peter Potaptchik, Iskander Azangulov, and George Deligiannidis. Linear convergence of diffusion models under the manifold hypothesis. In Proceedings of Thirty Eighth Conference on Learning Theory, volume 291 of Proceedings of Machine Learning Research, pages 4668–4685, 2025.
[15] Minshuo Chen, Kaixuan Huang, Tuo Zhao, and Mengdi Wang. Score approximation, estimation and distribution recovery of diffusion models on low-dimensional data. In International Conference on Machine Learning, pages 4672–4712. PMLR, 2023.
[16] Rong Tang and Yun Yang. Adaptivity of diffusion models to manifold structures. In International Conference on Artificial Intelligence and Statistics, pages 1648–1656. PMLR, 2024.
[17] Xinyu Tian and Xiaotong Shen. Enhancing accuracy in generative models via knowledge transfer. arXiv preprint arXiv:2405.16837, 2024.
[18] Jana Janková and Sara van de Geer. Inference in high-dimensional graphical models. In Handbook of Graphical Models, pages 325–350. CRC Press, 2018.
[19] Chunlin Li, Xiaotong Shen, and Wei Pan. Likelihood ratio tests for a large directed acyclic graph. Journal of the American Statistical Association, 115(531):1304–1319, 2020.
[20] Li Chen, Chunlin Li, Xiaotong Shen, and Wei Pan. Discovery and inference of a causal network with hidden confounding. Journal of the American Statistical Association, 119(548):2572–2584, 2024.
[21] Chengchun Shi, Yunzhe Zhou, and Lexin Li. Testing directed acyclic graph via structural, supervised and generative adversarial learning. Journal of the American Statistical Association, 119(547):1833–1846, 2024.
[22] Emmanuel Candes, Yingying Fan, Lucas Janson, and Jinchi Lv. Panning for gold: ‘model-X’ knockoffs for high dimensional controlled variable selection. Journal of the Royal Statistical Society Series B: Statistical Methodology, 80(3):551–577, 2018.
[23] Thomas B Berrett, Yi Wang, Rina Foygel Barber, and Richard J Samworth. The conditional permutation test for independence while controlling for confounders. Journal of the Royal Statistical Society Series B: Statistical Methodology, 82(1):175–197, 2020.
[24] Wesley Tansey, Victor Veitch, Haoran Zhang, Raul Rabadan, and David M. Blei. The holdout randomization test for feature selection in black box models. Journal of Computational and Graphical Statistics, 31(1):151–162, 2022.
[25] Li Chen, Xiaotong Shen, and Wei Pan. Supplement to “Causality-encoded diffusion models for interventional sampling and edge inference”, 2026.
[26] Larry Wasserman. All of Nonparametric Statistics. Springer, 2006.
[27] Hengyu Fu, Zhuoran Yang, Mengdi Wang, and Minshuo Chen. Unveil conditional diffusion models with classifier-free guidance: A sharp statistical theory. arXiv preprint arXiv:2403.11968, 2024.
[28] Pascal Vincent. A connection between score matching and denoising autoencoders. Neural Computation, 23(7):1661–1674, 2011.
[29] Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alexander Smola. A kernel two-sample test. Journal of Machine Learning Research, 13(25):723–773, 2012.
[30] Sture Holm. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70, 1979.
[31] Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1):289–300, 1995.
[32] Mona Azadkia and Sourav Chatterjee. A simple measure of conditional dependence. The Annals of Statistics, 49(6):3070–3102, 2021.
[33] Jonathan Ansari and Sebastian Fuchs. A simple extension of Azadkia & Chatterjee’s rank correlation to multi-response vectors. arXiv preprint arXiv:2212.01621, 2022.
[34] Zhen Huang, Nabarun Deb, and Bodhisattva Sen. Kernel partial correlation coefficient—a measure of conditional dependence. Journal of Machine Learning Research, 23(216):1–58, 2022.
[35] Hongjian Shi, Mathias Drton, and Fang Han. On the power of Chatterjee’s rank correlation. Biometrika, 109(2):317–333, 2022.
[36] Zhexiao Lin and Fang Han. On boosting the power of Chatterjee’s rank correlation. Biometrika, 110(2):283–299, 2023.
[37] Holger Dette and Marius Kroll. A simple bootstrap for Chatterjee’s rank correlation. Biometrika, 112(1):asae045, 2025.
[38] Hui Yuan, Kaixuan Huang, Chengzhuo Ni, Minshuo Chen, and Mengdi Wang. Reward-directed conditional diffusion: Provable distribution estimation and reward improvement. In Advances in Neural Information Processing Systems 36, pages 60599–60635, 2023.
[39] Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, and Luc Van Gool. Repaint: Inpainting using denoising diffusion probabilistic models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11461–11471, 2022.
[40] Pablo Sánchez-Martín, Miriam Rateike, and Isabel Valera. Vaca: Designing variational graph autoencoders for causal queries. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 8159–8168, 2022.
[41] Karen Sachs, Omar Perez, Dana Pe’er, Douglas A Lauffenburger, and Garry P Nolan. Causal protein-signaling networks derived from multiparameter single-cell data. Science, 308(5721):523–529, 2005.
[42] Rajat Sen, Ananda Theertha Suresh, Karthikeyan Shanmugam, Alexandros G. Dimakis, and Sanjay Shakkottai. Model-powered conditional independence test. In Advances in Neural Information Processing Systems 30, 2017.
[43] Shuai Li, Yingjie Zhang, Hongtu Zhu, Christina Wang, Hai Shu, Ziqi Chen, Zhuoran Sun, and Yanfeng Yang. K-nearest-neighbor local sampling based conditional independence testing. In Advances in Neural Information Processing Systems 36, pages 23321–23344, 2023.
[44] Eric V Strobl, Kun Zhang, and Shyam Visweswaran. Approximate kernel-based conditional independence tests for fast non-parametric causal discovery. Journal of Causal Inference, 7(1):20180017, 2019.
[45] Kun Zhang, Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. Kernel-based conditional independence test and application in causal discovery. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, pages 804–813. AUAI Press, 2011.
[46] Yanfeng Yang, Shuai Li, Yingjie Zhang, Zhuoran Sun, Hai Shu, Ziqi Chen, and Renming Zhang. Conditional diffusion models based conditional independence testing. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 22020–22028, 2025.
[47] Yixin Ren, Chenghou Jin, Yewei Xia, Li Ke, Longtao Huang, Hui Xue, Hao Zhang, Jihong Guan, and Shuigeng Zhou. Score-based generative modeling for conditional independence testing. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 2410–2419, 2025.
[48] Joris M. Mooij and Tom Heskes. Cyclic causal discovery from continuous equilibrium data. In Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, pages 431–439. AUAI Press, 2013.
[49] Sung Dae Kim, Ji Sue Baik, Jae-Hye Lee, Seo-Won Mun, Joo Mi Yi, and Moon-Taek Park. The malignancy of liver cancer cells is increased by IL-4/ERK/AKT signaling axis activity triggered by irradiated endothelial cells. Journal of Radiation Research, 61(3):376–387, 2020.
[50] Syed J Khundmiri, Vishal Amin, Jeff Henson, John Lewis, Mohamed Ameen, Madhavi J Rane, and Nicholas A Delamere. Ouabain stimulates protein kinase B (Akt) phosphorylation in opossum kidney proximal tubule cells through an ERK-dependent pathway. American Journal of Physiology-Cell Physiology, 293(3):C1171–C1180, 2007.
[51] Aurimas Stulpinas, Matas Sereika, Aida Vitkeviciene, Ausra Imbrasaite, Natalija Krestnikova, and Audrone V Kalvelyte. Crosstalk between protein kinases AKT and ERK1/2 in human lung tumor-derived cell models. Frontiers in Oncology, 12:1045521, 2023.


work page 2024

[12] [12]

Denoising diffusion probabilistic models are optimally adaptive to unknown low dimensionality.Mathematics of Operations Research, 2026

Zhihan Huang, Yuting Wei, and Yuxin Chen. Denoising diffusion probabilistic models are optimally adaptive to unknown low dimensionality.Mathematics of Operations Research, 2026

work page 2026

[13] [13]

O (d/t) convergence theory for diffusion probabilistic models under minimal assumptions.Journal of Machine Learning Research, 26(292):1–55, 2025

Gen Li and Yuling Yan. O (d/t) convergence theory for diffusion probabilistic models under minimal assumptions.Journal of Machine Learning Research, 26(292):1–55, 2025

work page 2025

[14] [14]

Linear convergence of diffusion models under the manifold hypothesis

Peter Potaptchik, Iskander Azangulov, and George Deligiannidis. Linear convergence of diffusion models under the manifold hypothesis. InProceedings of Thirty Eighth Conference on Learning Theory, volume 291 ofProceedings of Machine Learning Research, pages 4668–4685, 2025

work page 2025

[15] [15]

Score approximation, estima- tion and distribution recovery of diffusion models on low-dimensional data

Minshuo Chen, Kaixuan Huang, Tuo Zhao, and Mengdi Wang. Score approximation, estima- tion and distribution recovery of diffusion models on low-dimensional data. InInternational Conference on Machine Learning, pages 4672–4712. PMLR, 2023

work page 2023

[16] [16]

Adaptivity of diffusion models to manifold structures

Rong Tang and Yun Yang. Adaptivity of diffusion models to manifold structures. In International conference on artificial intelligence and statistics, pages 1648–1656. PMLR, 2024

work page 2024

[17] [17]

Enhancing accuracy in generative models via knowledge transfer.arXiv preprint arXiv:2405.16837, 2024

Xinyu Tian and Xiaotong Shen. Enhancing accuracy in generative models via knowledge transfer.arXiv preprint arXiv:2405.16837, 2024

work page arXiv 2024

[18] [18]

Inference in high-dimensional graphical models

Jana Jankov´ a and Sara van de Geer. Inference in high-dimensional graphical models. In Handbook of graphical models, pages 325–350. CRC Press, 2018

work page 2018

[19] [19]

Likelihood ratio tests for a large directed acyclic graph.Journal of the American Statistical Association, 115(531):1304–1319, 2020

Chunlin Li, Xiaotong Shen, and Wei Pan. Likelihood ratio tests for a large directed acyclic graph.Journal of the American Statistical Association, 115(531):1304–1319, 2020

work page 2020

[20] [20]

Discovery and inference of a causal network with hidden confounding.Journal of the American Statistical Association, 119(548): 2572–2584, 2024

Li Chen, Chunlin Li, Xiaotong Shen, and Wei Pan. Discovery and inference of a causal network with hidden confounding.Journal of the American Statistical Association, 119(548): 2572–2584, 2024

work page 2024

[21] [21]

Testing directed acyclic graph via structural, su- pervised and generative adversarial learning.Journal of the American Statistical Association, 119(547):1833–1846, 2024

Chengchun Shi, Yunzhe Zhou, and Lexin Li. Testing directed acyclic graph via structural, su- pervised and generative adversarial learning.Journal of the American Statistical Association, 119(547):1833–1846, 2024

work page 2024

[22] [22]

Emmanuel Candes, Yingying Fan, Lucas Janson, and Jinchi Lv. Panning for gold:‘model- x’knockoffs for high dimensional controlled variable selection.Journal of the Royal Statistical Society Series B: Statistical Methodology, 80(3):551–577, 2018

work page 2018

[23] [23]

The conditional permutation test for independence while controlling for confounders.Journal of the Royal Statistical Society Series B: Statistical Methodology, 82(1):175–197, 2020

Thomas B Berrett, Yi Wang, Rina Foygel Barber, and Richard J Samworth. The conditional permutation test for independence while controlling for confounders.Journal of the Royal Statistical Society Series B: Statistical Methodology, 82(1):175–197, 2020

work page 2020

[24] [24]

Wesley Tansey, Victor Veitch, Haoran Zhang, Raul Rabadan, and David M. Blei. The holdout randomization test for feature selection in black box models.Journal of Computational and Graphical Statistics, 31(1):151–162, 2022. 29

work page 2022

[25] [25]

causality-encoded diffusion models for interventional sampling and edge inference

Li Chen, Xiaotong Shen, and Wei Pan. Supplement to “causality-encoded diffusion models for interventional sampling and edge inference”, 2026

work page 2026

[26] [26]

Springer, 2006

Larry Wasserman.All of nonparametric statistics. Springer, 2006

work page 2006

[27] [27]

arXiv preprint arXiv:2403.11968 , year=

Hengyu Fu, Zhuoran Yang, Mengdi Wang, and Minshuo Chen. Unveil conditional dif- fusion models with classifier-free guidance: A sharp statistical theory.arXiv preprint arXiv:2403.11968, 2024

work page arXiv 2024

[28] [28]

A connection between score matching and denoising autoencoders.Neural computation, 23(7):1661–1674, 2011

Pascal Vincent. A connection between score matching and denoising autoencoders.Neural computation, 23(7):1661–1674, 2011

work page 2011

[29] [29]

Borgwardt, Malte J

Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Sch¨ olkopf, and Alexander Smola. A kernel two-sample test.Journal of Machine Learning Research, 13(25):723–773, 2012

work page 2012

[30] [30]

A simple sequentially rejective multiple test procedure.Scandinavian Journal of Statistics, 6(2):65–70, 1979

Sture Holm. A simple sequentially rejective multiple test procedure.Scandinavian Journal of Statistics, 6(2):65–70, 1979

work page 1979

[31] [31]

Controlling the false discovery rate: a practical and powerful approach to multiple testing.Journal of the Royal statistical society: series B (Methodological), 57(1):289–300, 1995

Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing.Journal of the Royal statistical society: series B (Methodological), 57(1):289–300, 1995

work page 1995

[32] [32]

A simple measure of conditional dependence.The Annals of Statistics, 49(6):3070–3102, 2021

Mona Azadkia and Sourav Chatterjee. A simple measure of conditional dependence.The Annals of Statistics, 49(6):3070–3102, 2021

work page 2021

[33] [33]

A simple extension of azadkia and chatterjee’s rank correlation to a vector of endogenous variables.arXiv preprint arXiv:2212.01621, 2022

Jonathan Ansari and Sebastian Fuchs. A simple extension of azadkia & chatterjee’s rank correlation to multi-response vectors.arXiv preprint arXiv:2212.01621, 2022

work page arXiv 2022

[34] [34]

Kernel partial correlation coefficient—a measure of conditional dependence.Journal of Machine Learning Research, 23(216):1–58, 2022

Zhen Huang, Nabarun Deb, and Bodhisattva Sen. Kernel partial correlation coefficient—a measure of conditional dependence.Journal of Machine Learning Research, 23(216):1–58, 2022

work page 2022

[35] [35]

On the power of chatterjee’s rank correlation

Hongjian Shi, Mathias Drton, and Fang Han. On the power of chatterjee’s rank correlation. Biometrika, 109(2):317–333, 2022

work page 2022

[36] [36]

On boosting the power of chatterjee’s rank correlation.Biometrika, 110(2):283–299, 2023

Zhexiao Lin and Fang Han. On boosting the power of chatterjee’s rank correlation.Biometrika, 110(2):283–299, 2023

work page 2023

[37] [37]

A simple bootstrap for chatterjee’s rank correlation

Holger Dette and Marius Kroll. A simple bootstrap for chatterjee’s rank correlation. Biometrika, 112(1):asae045, 2025

work page 2025

[38] [38]

Reward- directed conditional diffusion: Provable distribution estimation and reward improvement

Hui Yuan, Kaixuan Huang, Chengzhuo Ni, Minshuo Chen, and Mengdi Wang. Reward- directed conditional diffusion: Provable distribution estimation and reward improvement. In Advances in Neural Information Processing Systems 36, pages 60599–60635, 2023

work page 2023

[39] [39]

Repaint: Inpainting using denoising diffusion probabilistic models

Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, and Luc Van Gool. Repaint: Inpainting using denoising diffusion probabilistic models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11461–11471, 2022. 30

work page 2022

[40] [40]

Vaca: Designing variational graph autoencoders for causal queries

Pablo S´ anchez-Mart´ ın, Miriam Rateike, and Isabel Valera. Vaca: Designing variational graph autoencoders for causal queries. InProceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 8159–8168, 2022

work page 2022

[41] [41]

Causal protein-signaling networks derived from multiparameter single-cell data.Science, 308(5721): 523–529, 2005

Karen Sachs, Omar Perez, Dana Pe’er, Douglas A Lauffenburger, and Garry P Nolan. Causal protein-signaling networks derived from multiparameter single-cell data.Science, 308(5721): 523–529, 2005

work page 2005

[42] [42]

Dimakis, and Sanjay Shakkottai

Rajat Sen, Ananda Theertha Suresh, Karthikeyan Shanmugam, Alexandros G. Dimakis, and Sanjay Shakkottai. Model-powered conditional independence test. InAdvances in Neural Information Processing Systems 30, 2017

work page 2017

[43] [43]

K-nearest-neighbor local sampling based conditional independence testing

Shuai Li, Yingjie Zhang, Hongtu Zhu, Christina Wang, Hai Shu, Ziqi Chen, Zhuoran Sun, and Yanfeng Yang. K-nearest-neighbor local sampling based conditional independence testing. InAdvances in Neural Information Processing Systems 36, pages 23321–23344, 2023

work page 2023

[44] [44]

Approximate kernel-based conditional independence tests for fast non-parametric causal discovery.Journal of Causal Inference, 7 (1):20180017, 2019

Eric V Strobl, Kun Zhang, and Shyam Visweswaran. Approximate kernel-based conditional independence tests for fast non-parametric causal discovery.Journal of Causal Inference, 7 (1):20180017, 2019

work page 2019

[45] [45]

Kernel-based condi- tional independence test and application in causal discovery

Kun Zhang, Jonas Peters, Dominik Janzing, and Bernhard Sch¨ olkopf. Kernel-based condi- tional independence test and application in causal discovery. InProceedings of the Twenty- Seventh Conference on Uncertainty in Artificial Intelligence, pages 804–813. AUAI Press, 2011

work page 2011

[46] [46]

Conditional diffusion models based conditional independence testing

Yanfeng Yang, Shuai Li, Yingjie Zhang, Zhuoran Sun, Hai Shu, Ziqi Chen, and Renming Zhang. Conditional diffusion models based conditional independence testing. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 22020–22028, 2025

work page 2025

[47] [47]

Score-based generative modeling for conditional independence testing

Yixin Ren, Chenghou Jin, Yewei Xia, Li Ke, Longtao Huang, Hui Xue, Hao Zhang, Jihong Guan, and Shuigeng Zhou. Score-based generative modeling for conditional independence testing. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 2410–2419, 2025

work page 2025

[48] [48]

Mooij and Tom Heskes

Joris M. Mooij and Tom Heskes. Cyclic causal discovery from continuous equilibrium data. InProceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, pages 431–439. AUAI Press, 2013

work page 2013

[49] [49]

Sung Dae Kim, Ji Sue Baik, Jae-Hye Lee, Seo-Won Mun, Joo Mi Yi, and Moon-Taek Park. The malignancy of liver cancer cells is increased by il-4/erk/akt signaling axis activity triggered by irradiated endothelial cells.Journal of Radiation Research, 61(3):376–387, 2020

work page 2020

[50] [50]

Syed J Khundmiri, Vishal Amin, Jeff Henson, John Lewis, Mohamed Ameen, Madhavi J Rane, and Nicholas A Delamere. Ouabain stimulates protein kinase b (akt) phosphorylation in opossum kidney proximal tubule cells through an erk-dependent pathway.American Journal of Physiology-Cell Physiology, 293(3):C1171–C1180, 2007. 31

work page 2007

[51] [51]

Crosstalk between protein kinases akt and erk1/2 in human lung tumor-derived cell models.Frontiers in Oncology, 12:1045521, 2023

Aurimas Stulpinas, Matas Sereika, Aida Vitkeviciene, Ausra Imbrasaite, Natalija Krestnikova, and Audrone V Kalvelyte. Crosstalk between protein kinases akt and erk1/2 in human lung tumor-derived cell models.Frontiers in Oncology, 12:1045521, 2023. 32

work page 2023