A Neuroimaging Simulation Framework for Developing and Evaluating Causal AI

Emma A.M. Stanley; Erik Y. Ohara; Eryn Libert-Scott; Matthias Wilms; Nils D. Forkert; Vibujithan Vigneshwaran

arxiv: 2606.28684 · v1 · pith:KTX6NMPGnew · submitted 2026-06-27 · 📡 eess.IV · cs.LG

Pith reviewed 2026-06-30 09:06 UTC · model grok-4.3

classification 📡 eess.IV cs.LG

keywords neuroimaging simulation · causal AI · synthetic MRI · ground-truth data · causal discovery · T1-weighted images · volumetric changes · brain region control

The pith

A simulation framework produces realistic 3D brain scans with precisely controlled causal effects to supply ground-truth data for causal AI methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method that generates synthetic T1-weighted magnetic resonance images while enforcing a user-specified causal structure between non-image variables and image features. Anatomical differences across subjects are created by sampling from a subspace estimated from real scans and warping a template image accordingly. Causal relationships are imposed through targeted volume adjustments in chosen brain regions. These adjustments avoid global side effects: relative volume errors are 0.3-2.66% in targeted regions, and mean absolute errors in non-target regions stay between 0.034 and 0.397 ml. The resulting datasets allow objective testing of causal discovery algorithms, which the authors show still produce many spurious links when applied to image data. This addresses the absence of known causal ground truth that has slowed progress on causal AI for neuroimaging.

Core claim

The framework generates realistic synthetic 3D neuroimages that adhere to a user-specified causal structure by encoding relationships through precise volumetric changes of any region-of-interest without unwanted global artifacts, while anatomical variability is modeled by sampling from a subspace estimated from real data and deforming a template image, thereby creating the first source of ground-truth datasets for benchmarking and developing causal AI methods in neuroimaging.

What carries the argument

Encoding causal relationships via precise volumetric changes of any region-of-interest without unwanted global artifacts, combined with subspace sampling from real data for subject variability.
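To make the idea of a user-specified causal structure driving region volumes more concrete, here is a minimal sketch of the step that would sit upstream of the image warp. It samples non-image variables from a three-node chain and maps the last node to a target volume for one region. The linear functional form, the noise scale, `gain`, `max_change`, and the 4.0 ml baseline are all assumptions chosen for illustration; the abstract does not specify how the authors parameterize these relationships.

```python
import numpy as np

def sample_chain_scm(n_subjects, weights=(0.8, 0.8), noise_std=0.5, seed=0):
    """Sample a linear chain X0 -> X1 -> X2 with Gaussian noise (illustrative only)."""
    rng = np.random.default_rng(seed)
    x0 = rng.normal(0.0, 1.0, n_subjects)
    x1 = weights[0] * x0 + rng.normal(0.0, noise_std, n_subjects)
    x2 = weights[1] * x1 + rng.normal(0.0, noise_std, n_subjects)
    return np.column_stack([x0, x1, x2])

def target_volume(baseline_ml, cause, gain=0.02, max_change=0.2):
    """Map a causal variable to a target ROI volume.

    The relative change is gain * cause, clipped to +/- max_change so the
    requested deformation stays within a plausible range (assumed bounds).
    """
    rel_change = np.clip(gain * cause, -max_change, max_change)
    return baseline_ml * (1.0 + rel_change)

# Hypothetical cohort: 1000 simulated subjects, one target region of 4.0 ml.
variables = sample_chain_scm(n_subjects=1000)
targets_ml = target_volume(baseline_ml=4.0, cause=variables[:, 2])
```

In a full pipeline, each entry of `targets_ml` would be handed to whatever deformation step enforces the volume change; that step is not sketched here.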

If this is right

Enables creation of unlimited ground-truth datasets with known causal structures for objective benchmarking of causal AI.
Demonstrates that current causal discovery methods applied to these images produce many spurious connections.
Supports development of new causal methods adapted to the statistical properties of medical images.
Achieves relative volume errors of 0.3-2.66 percent in targeted regions while keeping mean absolute errors in non-target regions between 0.034-0.397 ml.
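The two figures of merit quoted above can be computed from segmentation label maps roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the label maps are toy arrays, and the exact definitions used in the paper (for example, whether the relative error is taken against the requested target volume) are not stated in the abstract.

```python
import numpy as np

def roi_volume_ml(label_map, label, voxel_volume_mm3):
    """Volume of one labelled region in millilitres (1 ml = 1000 mm^3)."""
    return np.count_nonzero(label_map == label) * voxel_volume_mm3 / 1000.0

def relative_volume_error_pct(achieved_ml, target_ml):
    """Absolute relative error of the achieved volume against the target, in percent."""
    return abs(achieved_ml - target_ml) / target_ml * 100.0

def non_target_mae_ml(before, after, non_target_labels, voxel_volume_mm3):
    """Mean absolute volume change (ml) over regions that should stay unchanged."""
    diffs = [
        abs(roi_volume_ml(after, lab, voxel_volume_mm3)
            - roi_volume_ml(before, lab, voxel_volume_mm3))
        for lab in non_target_labels
    ]
    return float(np.mean(diffs))

# Toy label maps with 1 mm^3 voxels (hypothetical data, not from the paper).
before = np.zeros((20, 10, 10), dtype=int)
before[0:5] = 1      # target region: 500 voxels
before[10:15] = 2    # non-target region: 500 voxels
after = before.copy()
after[5:6] = 1       # target grows by one slab of 100 voxels

achieved = roi_volume_ml(after, label=1, voxel_volume_mm3=1.0)
error_pct = relative_volume_error_pct(achieved, target_ml=0.625)
mae = non_target_mae_ml(before, after, non_target_labels=[2], voxel_volume_mm3=1.0)
```

In this toy case the target region reaches 0.6 ml against a requested 0.625 ml, and the non-target region is untouched.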

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same volumetric-control approach could be adapted to test causal models of disease progression by varying the strength of the imposed effects.
Synthetic datasets from this framework could serve as a common benchmark for comparing different causal AI architectures on identical causal ground truth.
Integration with existing image-registration tools might allow the framework to incorporate real patient covariates as additional causal nodes.

Load-bearing premise

Precise volumetric changes to chosen brain regions without creating global artifacts produce images realistic enough to stand in for real causal structures in neuroimaging.

What would settle it

Apply existing causal discovery algorithms to the generated images and check whether they recover the known causal edges at rates substantially above those expected from random guessing while keeping false-positive rates low.
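One way to score such a test is to compare the estimated adjacency matrix with the known one, edge by edge. The sketch below assumes binary directed adjacency matrices where entry (i, j) means i causes j. The function name and the toy matrices are hypothetical and do not reproduce the paper's evaluation.

```python
import numpy as np

def edge_recovery(true_adj, est_adj):
    """Compare directed adjacency matrices, ignoring the diagonal."""
    true_adj = np.asarray(true_adj, dtype=bool)
    est_adj = np.asarray(est_adj, dtype=bool)
    off_diag = ~np.eye(true_adj.shape[0], dtype=bool)
    tp = int(np.sum(true_adj & est_adj & off_diag))    # recovered edges
    fp = int(np.sum(~true_adj & est_adj & off_diag))   # spurious edges
    fn = int(np.sum(true_adj & ~est_adj & off_diag))   # missed edges
    tn = int(np.sum(~true_adj & ~est_adj & off_diag))  # correctly absent
    return {
        "tpr": tp / (tp + fn) if tp + fn else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        "spurious": fp,
        "missed": fn,
    }

# Ground-truth chain 0 -> 1 -> 2 and an estimate with one spurious edge 0 -> 2.
truth = [[0, 1, 0],
         [0, 0, 1],
         [0, 0, 0]]
estimate = [[0, 1, 1],
            [0, 0, 1],
            [0, 0, 0]]
scores = edge_recovery(truth, estimate)
```

Here both true edges are recovered, and one of the four possible non-edges is reported as an edge, giving a false-positive rate of 0.25.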

Figures

Figures reproduced from arXiv: 2606.28684 by Emma A.M. Stanley, Erik Y. Ohara, Eryn Libert-Scott, Matthias Wilms, Nils D. Forkert, Vibujithan Vigneshwaran.

**Figure 2.** Original, intervened, and difference map images for left lateral ...

**Figure 3.** Volume change vs. causal variable across various target VOIs for the ...

**Figure 4.** Adjacency matrices for the three ground-truth structures, chain (left), ...

**Figure 5.** Visualization of causal graphs of the ground-truth (chain) and causal ...

Abstract

Causally linking disease-related factors to image-derived biomarkers provides a powerful pathway to understanding disease mechanisms. Despite growing interest in applying causal artificial intelligence (AI) approaches for this task, these methods still need to be adapted for complex medical images, and especially, neuroimaging. However, the lack of ground-truth data presents a barrier to development. To bridge this gap, we developed and tested a method for generating synthetic neuroimages, which adhere to a user-specified causal structure describing the non-image to image variable relationships, permitting the creation of ground-truth neuroimaging datasets. In the simulated T1-weighted magnetic resonance images, anatomical variability is modeled by sampling from a subspace estimated from real data and deforming a template image to create unique simulated subjects. Causal relationships are encoded via precise volumetric changes of any region-of-interest without unwanted global artifacts. We achieved relative volume errors of 0.3-2.66% for the targeted regions-of-interest and demonstrate their statistically significant causal relationships, while maintaining mean absolute errors for non-target brain regions between 0.034-0.397ml. An initial evaluation of causal discovery methods exposes their limited ability to suppress spurious connections, highlighting the need for image-appropriate methods. Our framework is the first to enable the generation of realistic synthetic 3D neuroimages with explicit causal control that can serve as the missing ground-truth data necessary for the objective benchmarking and development of causal AI methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Simulation framework controls 3D neuroimage causality via ROI volumes with solid error metrics, but realism checks stay narrow.

The main thing to know is that this paper gives a concrete method for making synthetic 3D T1 MRIs where users set causal relationships by changing volumes of chosen regions, sampling anatomical variation from a real-data subspace and deforming a template. It positions itself as the first tool of this kind for ground-truth causal AI benchmarking.

The work does well on the generation side. It achieves relative volume errors of 0.3-2.66% on targets and keeps non-target MAE low at 0.034-0.397 ml, with no global artifacts reported. The initial run of causal discovery methods on the outputs is a practical touch that shows current tools still pick up spurious links.

The soft spot sits in the realism claim. Causality is encoded only through precise volumetric deformations, and the numbers back that control. But the abstract gives no evidence that intensity patterns, textures, or other image features end up distributed the way they would under the same causal factors in real scans. The stress-test concern lands here: methods that rely on non-volume biomarkers could be tested on an incomplete structure. That does not sink the paper, but it limits how far the ground-truth claim can stretch without more checks.

This is for people building or testing causal AI on neuroimaging data who need controlled synthetic sets. The approach is straightforward, the metrics are clear, and there is no circularity in how the data is generated. It deserves a serious referee so the full methods, code, and any expanded validation can be examined.

Referee Report

1 major / 0 minor

Summary. The paper introduces a simulation framework for generating synthetic 3D T1-weighted MRI neuroimages that follow user-specified causal structures. Anatomical variability is modeled by sampling from a real-data subspace and deforming a template; causality is encoded exclusively through precise volumetric deformations of user-specified ROIs. The authors report relative volume errors of 0.3-2.66% on target ROIs, non-target MAE of 0.034-0.397 ml, statistically significant causal relationships, and an initial demonstration that existing causal discovery methods produce spurious connections on the generated data. They position the framework as the first to supply realistic ground-truth 3D neuroimages with explicit causal control for benchmarking causal AI methods.

Significance. If the generated images prove representative of real causal neuroimaging structures beyond volume metrics, the framework would address a genuine bottleneck in causal AI development for medical imaging by supplying controllable ground-truth data. The reported quantitative volume-preservation metrics constitute a concrete, falsifiable strength. However, the central utility claim hinges on an untested assumption that volumetric ROI changes alone produce intensity patterns, textures, and higher-order features whose joint distributions match those arising from the same causal factors in real data.

major comments (1)

[Abstract] The central claim that the framework produces 'realistic synthetic 3D neuroimages' suitable for 'objective benchmarking' of causal AI rests on the assumption that precise volumetric ROI changes without global artifacts suffice to reproduce the relevant causal image structure. The only quantitative support provided is relative volume error (0.3-2.66%) and non-target MAE (0.034-0.397 ml); no metrics on intensity histograms, texture features, or higher-order statistics are reported to test whether the simulated images match the joint distributions that would arise from the same causal factors in real data. This is load-bearing for the utility claim.
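As one possible form the requested intensity check could take, the sketch below compares the intensity distributions of a real and a simulated image inside a brain mask using a quantile-based approximation of the 1-Wasserstein distance. This metric is an assumption picked for illustration; neither the paper nor this report prescribes it, and the arrays are synthetic stand-ins for real scans.

```python
import numpy as np

def intensity_distance(real_img, sim_img, mask_real, mask_sim, n_quantiles=256):
    """Approximate 1-Wasserstein distance between masked intensity distributions.

    Both distributions are summarised by the same grid of quantiles, and the
    distance is the mean absolute difference between matching quantiles.
    """
    q = np.linspace(0.0, 1.0, n_quantiles)
    real_q = np.quantile(real_img[mask_real], q)
    sim_q = np.quantile(sim_img[mask_sim], q)
    return float(np.mean(np.abs(real_q - sim_q)))

# Synthetic stand-ins: a "simulated" volume whose intensities are shifted by 5.
rng = np.random.default_rng(0)
real = rng.normal(100.0, 10.0, size=(16, 16, 16))
sim = real + 5.0
mask = np.ones(real.shape, dtype=bool)

shifted = intensity_distance(real, sim, mask, mask)
identical = intensity_distance(real, real, mask, mask)
```

A uniform intensity shift of 5 yields a distance of about 5, and identical images yield 0. Texture and higher-order statistics would need separate measures.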

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback. We address the major comment below.

Referee: [Abstract] The central claim that the framework produces 'realistic synthetic 3D neuroimages' suitable for 'objective benchmarking' of causal AI rests on the assumption that precise volumetric ROI changes without global artifacts suffice to reproduce the relevant causal image structure. The only quantitative support provided is relative volume error (0.3-2.66%) and non-target MAE (0.034-0.397 ml); no metrics on intensity histograms, texture features, or higher-order statistics are reported to test whether the simulated images match the joint distributions that would arise from the same causal factors in real data. This is load-bearing for the utility claim.

Authors: The framework models anatomical variability by sampling from a real-data subspace and deforming a template, which is intended to reproduce intensity patterns and textures consistent with real distributions. The reported volume metrics demonstrate precise causal control without global artifacts. We agree that the absence of explicit metrics on intensity histograms, texture features, or higher-order statistics leaves the broader realism claim less fully supported than it could be. We will revise the abstract and add such analyses to the manuscript.

revision: yes

Circularity Check

0 steps flagged

No circularity: generative simulation is self-contained construction

The paper presents a forward generative procedure: sample anatomical variability from a real-data subspace, deform a template, and encode user-specified causal relationships exclusively via targeted volumetric ROI changes. All reported quantities (relative volume errors 0.3-2.66%, non-target MAE 0.034-0.397 ml, statistically significant volume correlations) are direct measurements of the controlled deformations themselves rather than predictions or inferences that reduce to fitted parameters presupposing the target result. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing justification; the method is offered as an explicit construction for producing ground-truth data. The derivation chain therefore contains no reduction of outputs to inputs by construction.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on domain assumptions about modeling anatomical variability and encoding causality through localized volume changes; no explicit free parameters fitted to the target causal results are described in the abstract.

free parameters (1)

subspace estimated from real data
Used to sample anatomical variability for creating unique simulated subjects.

axioms (2)

domain assumption: Anatomical variability can be modeled by sampling from a subspace estimated from real data and deforming a template image
Invoked to create unique simulated subjects while maintaining realism.
domain assumption: Causal relationships can be encoded via precise volumetric changes of any region-of-interest without unwanted global artifacts
Central to permitting creation of ground-truth datasets with specified causal structure.
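The first assumption can be illustrated with a small sketch of subspace sampling. It assumes the subspace is a PCA of flattened per-subject deformation fields and that new subjects are drawn from a Gaussian over the leading components. The abstract names neither PCA nor the sampling distribution, so both are stand-ins, and the training matrix below is random toy data.

```python
import numpy as np

def fit_subspace(fields, n_components):
    """PCA on flattened deformation fields of shape (n_subjects, n_features)."""
    mean = fields.mean(axis=0)
    # Rows of vt are orthonormal principal directions of the centred data.
    _, s, vt = np.linalg.svd(fields - mean, full_matrices=False)
    # Standard deviation of the training data along each kept component.
    std = s[:n_components] / np.sqrt(fields.shape[0] - 1)
    return mean, vt[:n_components], std

def sample_field(mean, components, std, rng):
    """Draw one new flattened deformation field from the fitted subspace."""
    coeffs = rng.normal(0.0, std)  # one coefficient per component
    return mean + coeffs @ components

# Toy training set: 20 hypothetical subjects, 30 features each.
rng = np.random.default_rng(0)
training_fields = rng.normal(size=(20, 30))
mean, components, std = fit_subspace(training_fields, n_components=5)
new_field = sample_field(mean, components, std, rng)
```

In a real pipeline the sampled field would be reshaped to a 3D displacement field and applied to the template image; that warping step is outside this sketch.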

pith-pipeline@v0.9.1-grok · 5807 in / 1305 out tokens · 30664 ms · 2026-06-30T09:06:35.273011+00:00 · methodology


[1] [1]

Uncertainty in the Translation of Preclinical Experiments to Clinical Trials. Why do Most Phase III Clinical Trials Fail?

P. Lowenstein and M. Castro, “Uncertainty in the Translation of Preclinical Experiments to Clinical Trials. Why do Most Phase III Clinical Trials Fail?”Current Gene Therapy, vol. 9, no. 5, pp. 368–374, Oct. 2009. [Online]. Available: http://www.eurekaselect.com/openurl/c ontent.php?genre=article&issn=1566-5232&volume=9&issue=5&spage =368

2009

[2] [2]

Distinct visual biases affect humans and artificial intelligence in medical imaging diagnoses,

G. A. McLeod, E. A. M. Stanley, T. Rosenal, and N. D. Forkert, “Distinct visual biases affect humans and artificial intelligence in medical imaging diagnoses,”npj Digital Medicine, vol. 9, no. 1, p. 62, Dec. 2025. [Online]. Available: https://www.nature.com/articles/s41746 -025-02226-5

2025

[3] [3]

High-performance medicine: the convergence of human and artificial intelligence,

E. J. Topol, “High-performance medicine: the convergence of human and artificial intelligence,”Nature Medicine, vol. 25, no. 1, pp. 44–56, Jan. 2019. [Online]. Available: https://www.nature.com/articles/s41591 -018-0300-7

2019

[4] [4]

Artificial intelligence in medicine: current trends and future possibilities,

V . H. Buch, I. Ahmed, and M. Maruthappu, “Artificial intelligence in medicine: current trends and future possibilities,”British Journal of General Practice, vol. 68, no. 668, pp. 143–144, Mar. 2018. [Online]. Available: https://bjgp.org/lookup/doi/10.3399/bjgp18X695213

work page doi:10.3399/bjgp18x695213 2018

[5] [5]

Causal Machine Learning for Healthcare and Precision Medicine,

P. Sanchez, J. P. V oisey, T. Xia, H. I. Watson, A. Q. ONeil, and S. A. Tsaftaris, “Causal Machine Learning for Healthcare and Precision Medicine,” 2022, version Number: 2. [Online]. Available: https://arxiv.org/abs/2205.11402

work page arXiv 2022

[6] [6]

Pearl,Causality

J. Pearl,Causality. Cambridge University Press, 2009

2009

[7] [7]

Causality matters in medical imaging,

D. C. Castro, I. Walker, and B. Glocker, “Causality matters in medical imaging,”Nature Communications, vol. 11, no. 1, p. 3673, Jul. 2020. [Online]. Available: https://www.nature.com/articles/s41467-020-17478 -w

2020

[8] [8]

From identifiable causal representations to controllable counterfactual generation: A survey on causal generative modeling,

A. Komanduri, X. Wu, Y . Wu, and F. Chen, “From identifiable causal representations to controllable counterfactual generation: A survey on causal generative modeling,” 2024. [Online]. Available: https://arxiv.org/abs/2310.11011

work page arXiv 2024

[9] [9]

Causal Machine Learning: A Survey and Open Problems

J. Kaddour, A. Lynch, Q. Liu, M. J. Kusner, and R. Silva, “Causal Machine Learning: A Survey and Open Problems,” 2022, version Number: 3. [Online]. Available: https://arxiv.org/abs/2206.15475

work page internal anchor Pith review Pith/arXiv arXiv 2022

[10] [10]

Over 1 in 3 people affected by neurological conditions, the leading cause of illness and disability worldwide,

“Over 1 in 3 people affected by neurological conditions, the leading cause of illness and disability worldwide,” 2024. [Online]. Available: https://www.who.int/news/item/14-03-2024-over-1-in-3-people-affect ed-by-neurological-conditions--the-leading-cause-of-illness-and-disab ility-worldwide

2024

[11] [11]

DAGs with NO TEARS: Continuous Optimization for Structure Learning

X. Zheng, B. Aragam, P. Ravikumar, and E. P. Xing, “Dags with no tears: Continuous optimization for structure learning,” 2018. [Online]. Available: https://arxiv.org/abs/1803.01422

work page internal anchor Pith review Pith/arXiv arXiv 2018

[12] [12]

DAG-GNN: DAG Structure Learning with Graph Neural Networks

Y . Yu, J. Chen, T. Gao, and M. Yu, “DAG-GNN: DAG Structure Learning with Graph Neural Networks,” 2019, version Number: 1. [Online]. Available: https://arxiv.org/abs/1904.10098

work page internal anchor Pith review Pith/arXiv arXiv 2019

[13] [13]

Dagma: Learning dags via m-matrices and a log-determinant acyclicity characterization,

K. Bello, B. Aragam, and P. Ravikumar, “Dagma: Learning dags via m-matrices and a log-determinant acyclicity characterization,” 2023. [Online]. Available: https://arxiv.org/abs/2209.08037

[14]

A Survey on Causal Discovery: Theory and Practice,

A. Zanga, E. Ozkirimli, and F. Stella, “A Survey on Causal Discovery: Theory and Practice,” International Journal of Approximate Reasoning, vol. 151, pp. 101–129, Dec. 2022. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S0888613X22001402

2022

[15]

Database with cause-effect pairs,

Max Planck Institute for Intelligent Systems, “Database with cause-effect pairs,” https://webdav.tuebingen.mpg.de/cause-effect/, 2026, accessed: 2026-01-07

2026

[16]

CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning,

O. Ahmed, F. Träuble, A. Goyal, A. Neitz, Y. Bengio, B. Schölkopf, M. Wüthrich, and S. Bauer, “CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning,” 2020, version Number: 2. [Online]. Available: https://arxiv.org/abs/2010.04296

[17]

Causal Protein-Signaling Networks Derived from Multiparameter Single-Cell Data,

K. Sachs, O. Perez, D. Pe’er, D. A. Lauffenburger, and G. P. Nolan, “Causal Protein-Signaling Networks Derived from Multiparameter Single-Cell Data,” Science, vol. 308, no. 5721, pp. 523–529, Apr. 2005. [Online]. Available: https://www.science.org/doi/10.1126/science.1105809

[18]

Droplet Barcoding for Single-Cell Transcriptomics Applied to Embryonic Stem Cells,

A. Klein, L. Mazutis, I. Akartuna, N. Tallapragada, A. Veres, V. Li, L. Peshkin, D. Weitz, and M. Kirschner, “Droplet Barcoding for Single-Cell Transcriptomics Applied to Embryonic Stem Cells,” Cell, vol. 161, no. 5, pp. 1187–1201, May 2015. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S0092867415005000

2015

[19]

Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens,

A. Dixit, O. Parnas, B. Li, J. Chen, C. P. Fulco, L. Jerby-Arnon, N. D. Marjanovic, D. Dionne, T. Burks, R. Raychowdhury, B. Adamson, T. M. Norman, E. S. Lander, J. S. Weissman, N. Friedman, and A. Regev, “Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens,” Cell, vol. 167, no. 7, pp. 1853–1866.e17,...

2016

[20]

Synthetic data generation methods in healthcare: A review on open-source tools and methods,

V. C. Pezoulas, D. I. Zaridis, E. Mylona, C. Androutsos, K. Apostolidis, N. S. Tachos, and D. I. Fotiadis, “Synthetic data generation methods in healthcare: A review on open-source tools and methods,” Computational and Structural Biotechnology Journal, vol. 23, pp. 2892–2910, Dec

[21]

[Online]. Available: https://spj.science.org/doi/10.1016/j.csbj.2024.07.005

[22]

Causalvae: Structured causal disentanglement in variational autoencoder,

M. Yang, F. Liu, Z. Chen, X. Shen, J. Hao, and J. Wang, “Causalvae: Structured causal disentanglement in variational autoencoder,” 2023. [Online]. Available: https://arxiv.org/abs/2004.08697

[23]

CausalGAN: Learning Causal Implicit Generative Models with Adversarial Training

M. Kocaoglu, C. Snyder, A. G. Dimakis, and S. Vishwanath, “CausalGAN: Learning Causal Implicit Generative Models with Adversarial Training,” 2017, version Number: 2. [Online]. Available: https://arxiv.org/abs/1709.02023

[24]

MACAW: A Causal Generative Model for Medical Imaging,

V. Vigneshwaran, E. Ohara, M. Wilms, and N. Forkert, “MACAW: A Causal Generative Model for Medical Imaging,” 2024, version Number:

2024

[25]

[Online]. Available: https://arxiv.org/abs/2412.02900

[26]

A Flexible Framework for Simulating and Evaluating Biases in Deep Learning-Based Medical Image Analysis,

E. A. M. Stanley, M. Wilms, and N. D. Forkert, “A Flexible Framework for Simulating and Evaluating Biases in Deep Learning-Based Medical Image Analysis,” in Medical Image Computing and Computer Assisted Intervention – MICCAI 2023, H. Greenspan, A. Madabhushi, P. Mousavi, S. Salcudean, J. Duncan, T. Syeda-Mahmood, and R. Taylor, Eds. Cham: Springer Natur...

work page doi:10.1007/978-3-031-43895-0 2023

[27]

Towards objective and systematic evaluation of bias in artificial intelligence for medical imaging,

E. A. M. Stanley, R. Souza, A. J. Winder, V. Gulve, K. Amador, M. Wilms, and N. D. Forkert, “Towards objective and systematic evaluation of bias in artificial intelligence for medical imaging,” Journal of the American Medical Informatics Association, vol. 31, no. 11, pp. 2613–2621, Nov. 2024. [Online]. Available: https://academic.oup.com/jamia/article/3...

2024

[28]

Synthetic Ground Truth Counterfactuals for Comprehensive Evaluation of Causal Generative Models in Medical Imaging,

E. A. M. Stanley, V. Vigneshwaran, E. Y. Ohara, F. G. Vamosi, N. D. Forkert, and M. Wilms, “Synthetic Ground Truth Counterfactuals for Comprehensive Evaluation of Causal Generative Models in Medical Imaging,” in Medical Image Computing and Computer Assisted Intervention – MICCAI 2025, J. C. Gee, D. C. Alexander, J. Hong, J. E. Iglesias, C. H. Sudre, A. V...

work page doi:10.1007/978-3-032-04984-1 2025

[29]

The SRI24 multichannel atlas of normal adult human brain structure,

T. Rohlfing, N. M. Zahr, E. V. Sullivan, and A. Pfefferbaum, “The SRI24 multichannel atlas of normal adult human brain structure,” Human Brain Mapping, vol. 31, no. 5, pp. 798–819, May 2010. [Online]. Available: https://onlinelibrary.wiley.com/doi/10.1002/hbm.20906

[30]

IXI dataset – brain development,

Biomedical Image Analysis Group, “IXI dataset – brain development,” https://brain-development.org/ixi-dataset/, 2023, accessed: 2023-02-27

2023

[31]

SynthSeg: Segmentation of brain MRI scans of any contrast and resolution without retraining,

B. Billot, D. N. Greve, O. Puonti, A. Thielscher, K. Van Leemput, B. Fischl, A. V. Dalca, and J. E. Iglesias, “SynthSeg: Segmentation of brain MRI scans of any contrast and resolution without retraining,” Medical Image Analysis, vol. 86, p. 102789, May 2023. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S1361841523000506

2023

[32]

Back matter,

J. Nocedal and S. J. Wright, “Back matter,” inNumerical Optimization. New York, NY: Springer, 2006, pp. 598–664

2006

[33]

Causation, Prediction, and Search,

P. Spirtes, C. N. Glymour, and R. Scheines, Causation, Prediction, and Search. MIT Press, 2000

2000

[34]

Optimal structure identification with greedy search,

D. M. Chickering, “Optimal structure identification with greedy search,” Journal of Machine Learning Research, Mar. 2003

2003

[35]

A linear non-gaussian acyclic model for causal discovery,

S. Shimizu, P. O. Hoyer, A. Hyvärinen, and A. Kerminen, “A linear non-gaussian acyclic model for causal discovery,” Journal of Machine Learning Research, Dec. 2006

2006

[36]

Review of Causal Discovery Methods Based on Graphical Models,

C. Glymour, K. Zhang, and P. Spirtes, “Review of Causal Discovery Methods Based on Graphical Models,” Frontiers in Genetics, vol. 10, p. 524, Jun. 2019. [Online]. Available: https://www.frontiersin.org/article/10.3389/fgene.2019.00524/full

[37]

Learning Bayesian Networks is NP-Complete,

D. M. Chickering, “Learning Bayesian Networks is NP-Complete,” in Learning from Data, P. Bickel, P. Diggle, S. Fienberg, K. Krickeberg, I. Olkin, N. Wermuth, S. Zeger, D. Fisher, and H.-J. Lenz, Eds. New York, NY: Springer New York, 1996, vol. 112, pp. 121–130, series Title: Lecture Notes in Statistics. [Online]. Available: http://link.springer.com/10.10...

work page doi:10.1007/978-1-4612-2404-4 1996

[38]

Large-sample learning of bayesian networks is np-hard,

D. M. Chickering, D. Heckerman, and C. Meek, “Large-sample learning of bayesian networks is np-hard,” Journal of Machine Learning Research, vol. 5, Jan. 2004

2004

[39]

gCastle: A Python Toolbox for Causal Discovery,

K. Zhang, S. Zhu, M. Kalander, I. Ng, J. Ye, Z. Chen, and L. Pan, “gCastle: A Python Toolbox for Causal Discovery,” 2021, version Number: 1. [Online]. Available: https://arxiv.org/abs/2111.15155

[40]

pgmpy: A Python Toolkit for Bayesian Networks,

A. Ankan and J. Textor, “pgmpy: A Python Toolkit for Bayesian Networks,” 2023, version Number: 1. [Online]. Available: https://arxiv.org/abs/2304.08639

[41]

Large-scale unconstrained optimization,

J. Nocedal and S. J. Wright, “Large-scale unconstrained optimization,” in Numerical Optimization, J. Nocedal and S. J. Wright, Eds. New York, NY: Springer, 2006, pp. 164–192

2006

[42]

Quasi-newton methods,

J. Nocedal and S. J. Wright, “Quasi-newton methods,” in Numerical Optimization, J. Nocedal and S. J. Wright, Eds. New York, NY: Springer, 2006, pp. 135–163

2006

[43]

Evaluation of 3D Counterfactual Brain MRI Generation,

P. Sun, W. Peng, L. Y. Li, Y. Wang, and K. M. Pohl, “Evaluation of 3D Counterfactual Brain MRI Generation,” 2025, version Number: 2. [Online]. Available: https://arxiv.org/abs/2508.02880
