A Neuroimaging Simulation Framework for Developing and Evaluating Causal AI
Pith reviewed 2026-06-30 09:06 UTC · model grok-4.3
The pith
A simulation framework produces realistic 3D brain scans with precisely controlled causal effects to supply ground-truth data for causal AI methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The framework generates realistic synthetic 3D neuroimages that adhere to a user-specified causal structure. Causal relationships are encoded through precise volumetric changes of any region-of-interest without unwanted global artifacts, while anatomical variability is modeled by sampling from a subspace estimated from real data and deforming a template image. The authors present this as the first source of ground-truth datasets for benchmarking and developing causal AI methods in neuroimaging.
What carries the argument
Encoding causal relationships via precise volumetric changes of any region-of-interest without unwanted global artifacts, combined with subspace sampling from real data for subject variability.
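As a rough illustration of the subspace-sampling idea, the sketch below draws one simulated subject from a linear (PCA-style) model. The page does not say how the subspace is estimated or parameterized, so the linear form, the function name `sample_subject`, and all numeric values are assumptions made for illustration only.

```python
import random


def sample_subject(mean, components, stddevs, rng):
    """Draw one parameter vector from a linear subspace model.

    Assumed model (not confirmed by the source):
        sample = mean + sum_k c_k * components[k],  c_k ~ N(0, stddevs[k]^2)
    """
    # One random coefficient per basis vector of the subspace.
    coeffs = [rng.gauss(0.0, s) for s in stddevs]
    sample = list(mean)
    for c, basis in zip(coeffs, components):
        for i, b in enumerate(basis):
            sample[i] += c * b
    return sample, coeffs


# Hypothetical toy subspace: 4 deformation parameters, 2 basis vectors.
rng = random.Random(0)
mean = [0.0, 0.0, 0.0, 0.0]
components = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
stddevs = [2.0, 0.5]
subject, coeffs = sample_subject(mean, components, stddevs, rng)
```

In a real pipeline the sampled vector would parameterize a deformation applied to the template image; here it is only a list of numbers, and samples can never leave the span of the two basis vectors.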
If this is right
- Enables creation of unlimited ground-truth datasets with known causal structures for objective benchmarking of causal AI.
- Shows, in an initial evaluation, that current causal discovery methods have limited ability to suppress spurious connections when applied to these images.
- Supports development of new causal methods adapted to the statistical properties of medical images.
- Achieves relative volume errors of 0.3-2.66% in targeted regions while keeping mean absolute errors in non-target regions between 0.034 and 0.397 ml.
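The two error measures quoted above can be sketched as follows. The page does not give the paper's exact formulas, so these are the conventional definitions, stated as an assumption; the region names and volumes are made up for illustration and are not results from the paper.

```python
def relative_volume_error(achieved_ml, target_ml):
    """Percent deviation of the achieved ROI volume from the requested one."""
    return abs(achieved_ml - target_ml) / target_ml * 100.0


def non_target_mae(before_ml, after_ml):
    """Mean absolute volume change (ml) over regions that should not change."""
    diffs = [abs(after_ml[region] - before_ml[region]) for region in before_ml]
    return sum(diffs) / len(diffs)


# Hypothetical volumes in ml, for illustration only.
target_error = relative_volume_error(achieved_ml=4.1, target_ml=4.0)
before = {"thalamus": 7.0, "putamen": 5.0}
after = {"thalamus": 7.5, "putamen": 5.25}
leakage = non_target_mae(before, after)
```

A low relative error says the targeted region reached the requested size; a low non-target MAE says the deformation did not leak into the rest of the brain.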
Where Pith is reading between the lines
- The same volumetric-control approach could be adapted to test causal models of disease progression by varying the strength of the imposed effects.
- Synthetic datasets from this framework could serve as a common benchmark for comparing different causal AI architectures on identical causal ground truth.
- Integration with existing image-registration tools might allow the framework to incorporate real patient covariates as additional causal nodes.
Load-bearing premise
Precise volumetric changes to chosen brain regions without creating global artifacts produce images realistic enough to stand in for real causal structures in neuroimaging.
What would settle it
Apply existing causal discovery algorithms to the generated images and check whether they recover the known causal edges at rates substantially above those expected from random guessing while keeping false-positive rates low.
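One way to score such a test is to compare the directed edges a discovery method returns against the known graph. The sketch below computes true-positive and false-positive rates over all possible directed edges; the variable names (`age`, `sex`, `hippocampus`, `ventricle`) and both edge sets are hypothetical and do not come from the paper.

```python
def edge_recovery(true_edges, predicted_edges, nodes):
    """Score predicted directed edges against a known ground-truth graph."""
    possible = {(a, b) for a in nodes for b in nodes if a != b}
    true_set = set(true_edges)
    pred_set = set(predicted_edges)
    tp = len(true_set & pred_set)   # true edges that were recovered
    fp = len(pred_set - true_set)   # spurious edges
    fn = len(true_set - pred_set)   # missed edges
    tn = len(possible) - tp - fp - fn
    return {
        "tpr": tp / (tp + fn) if tp + fn else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Hypothetical ground truth and method output.
nodes = ["age", "sex", "hippocampus", "ventricle"]
truth = [("age", "hippocampus"), ("age", "ventricle")]
found = [("age", "hippocampus"), ("sex", "ventricle"), ("hippocampus", "ventricle")]
scores = edge_recovery(truth, found, nodes)
```

The false-positive rate is the quantity most relevant to the spurious-connection concern: it counts edges the method asserts that the simulation never imposed.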
Original abstract
Causally linking disease-related factors to image-derived biomarkers provides a powerful pathway to understanding disease mechanisms. Despite growing interest in applying causal artificial intelligence (AI) approaches for this task, these methods still need to be adapted for complex medical images, and especially, neuroimaging. However, the lack of ground-truth data presents a barrier to development. To bridge this gap, we developed and tested a method for generating synthetic neuroimages, which adhere to a user-specified causal structure describing the non-image to image variable relationships, permitting the creation of ground-truth neuroimaging datasets. In the simulated T1-weighted magnetic resonance images, anatomical variability is modeled by sampling from a subspace estimated from real data and deforming a template image to create unique simulated subjects. Causal relationships are encoded via precise volumetric changes of any region-of-interest without unwanted global artifacts. We achieved relative volume errors of 0.3-2.66% for the targeted regions-of-interest and demonstrate their statistically significant causal relationships, while maintaining mean absolute errors for non-target brain regions between 0.034-0.397ml. An initial evaluation of causal discovery methods exposes their limited ability to suppress spurious connections, highlighting the need for image-appropriate methods. Our framework is the first to enable the generation of realistic synthetic 3D neuroimages with explicit causal control that can serve as the missing ground-truth data necessary for the objective benchmarking and development of causal AI methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a simulation framework for generating synthetic 3D T1-weighted MRI neuroimages that follow user-specified causal structures. Anatomical variability is modeled by sampling from a real-data subspace and deforming a template; causality is encoded exclusively through precise volumetric deformations of user-specified ROIs. The authors report relative volume errors of 0.3-2.66% on target ROIs, non-target MAE of 0.034-0.397 ml, statistically significant causal relationships, and an initial demonstration that existing causal discovery methods produce spurious connections on the generated data. They position the framework as the first to supply realistic ground-truth 3D neuroimages with explicit causal control for benchmarking causal AI methods.
Significance. If the generated images prove representative of real causal neuroimaging structures beyond volume metrics, the framework would address a genuine bottleneck in causal AI development for medical imaging by supplying controllable ground-truth data. The reported quantitative volume-preservation metrics constitute a concrete, falsifiable strength. However, the central utility claim hinges on an untested assumption that volumetric ROI changes alone produce intensity patterns, textures, and higher-order features whose joint distributions match those arising from the same causal factors in real data.
major comments (1)
- [Abstract] The central claim that the framework produces 'realistic synthetic 3D neuroimages' suitable for 'objective benchmarking' of causal AI rests on the assumption that precise volumetric ROI changes without global artifacts suffice to reproduce the relevant causal image structure. The only quantitative support provided is relative volume error (0.3-2.66%) and non-target MAE (0.034-0.397 ml); no metrics on intensity histograms, texture features, or higher-order statistics are reported to test whether the simulated images match the joint distributions that would arise from the same causal factors in real data. This is load-bearing for the utility claim.
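A minimal sketch of the kind of intensity-histogram check the referee asks for: bin the voxel intensities of a real and a simulated image and compare the two histograms with the Jensen-Shannon divergence. The choice of this particular divergence, the bin count, and the toy intensity values are assumptions for illustration; the referee does not name a specific metric.

```python
import math


def histogram(values, bins, lo, hi):
    """Normalized histogram of intensities assumed to lie in [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        idx = max(0, min(idx, bins - 1))  # clamp edge values into range
        counts[idx] += 1
    return [c / len(values) for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical normalized voxel intensities.
real = [0.1, 0.2, 0.8, 0.9]
simulated = [0.15, 0.25, 0.7, 0.95]
p = histogram(real, bins=2, lo=0.0, hi=1.0)
q = histogram(simulated, bins=2, lo=0.0, hi=1.0)
distance = js_divergence(p, q)
```

A histogram comparison only addresses first-order intensity statistics; texture and higher-order features would need separate measures.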
Simulated Author's Rebuttal
We thank the referee for their constructive feedback. We address the major comment below.
- Referee: [Abstract] The central claim that the framework produces 'realistic synthetic 3D neuroimages' suitable for 'objective benchmarking' of causal AI rests on the assumption that precise volumetric ROI changes without global artifacts suffice to reproduce the relevant causal image structure. The only quantitative support provided is relative volume error (0.3-2.66%) and non-target MAE (0.034-0.397 ml); no metrics on intensity histograms, texture features, or higher-order statistics are reported to test whether the simulated images match the joint distributions that would arise from the same causal factors in real data. This is load-bearing for the utility claim.
Authors: The framework models anatomical variability by sampling from a real-data subspace and deforming a template, which is intended to reproduce intensity patterns and textures consistent with real distributions. The reported volume metrics demonstrate precise causal control without global artifacts. We agree that the absence of explicit metrics on intensity histograms, texture features, or higher-order statistics leaves the broader realism claim less fully supported than it could be. We will revise the abstract and add such analyses to the manuscript.
Revision: yes
Circularity Check
No circularity: generative simulation is self-contained construction
The paper presents a forward generative procedure: sample anatomical variability from a real-data subspace, deform a template, and encode user-specified causal relationships exclusively via targeted volumetric ROI changes. All reported quantities (relative volume errors 0.3-2.66%, non-target MAE 0.034-0.397 ml, statistically significant volume correlations) are direct measurements of the controlled deformations themselves rather than predictions or inferences that reduce to fitted parameters presupposing the target result. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing justification; the method is offered as an explicit construction for producing ground-truth data. The derivation chain therefore contains no reduction of outputs to inputs by construction.
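The forward direction of this procedure can be sketched as a tiny structural causal model: a non-image variable is sampled first, and a structural equation turns it into the volume scale requested for a region. The parent variable `age`, the linear equation, the effect size, and the template volume are all hypothetical; the page does not state which variables or functional forms the paper uses.

```python
import random


def simulate_subject(rng, effect_per_year=-0.004, noise_sd=0.01):
    """Sample one subject from a hypothetical two-node model: age -> ROI scale."""
    age = rng.uniform(40, 80)  # non-image parent variable (assumed)
    # Assumed structural equation: linear effect of age plus Gaussian noise.
    roi_scale = 1.0 + effect_per_year * (age - 60) + rng.gauss(0.0, noise_sd)
    return {"age": age, "roi_scale": roi_scale}


def target_volume(template_volume_ml, roi_scale):
    """Volume the deformation step would be asked to produce for the ROI."""
    return template_volume_ml * roi_scale


rng = random.Random(1)
subjects = [simulate_subject(rng) for _ in range(500)]
# Hypothetical template ROI volume of 4.0 ml.
targets = [target_volume(4.0, s["roi_scale"]) for s in subjects]
```

Because the requested volume is an input to the simulation, a measured volume that matches it verifies the deformation step rather than confirming a prediction, which is the point the circularity check makes.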
Axiom & Free-Parameter Ledger
free parameters (1)
- subspace estimated from real data
axioms (2)
- domain assumption: Anatomical variability can be modeled by sampling from a subspace estimated from real data and deforming a template image
- domain assumption: Causal relationships can be encoded via precise volumetric changes of any region-of-interest without unwanted global artifacts