3D Conditional Image Synthesis of Left Atrial LGE MRI from Composite Semantic Masks
Pith reviewed 2026-05-16 17:09 UTC · model grok-4.3
The pith
SPADE-LDM synthesis from composite masks raises the left atrial segmentation Dice score from 0.908 to 0.936
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors build a synthesis pipeline that converts composite semantic masks (expert anatomical labels plus unsupervised tissue clusters) into 3D LGE MRI volumes. Among the three conditional models tested (Pix2Pix GAN, SPADE-GAN, and SPADE-LDM), SPADE-LDM produces the most realistic and structurally faithful images (FID 4.063, versus 40.821 for Pix2Pix and 7.652 for SPADE-GAN). When these synthetic volumes are added to the training set, a 3D U-Net achieves a statistically significant Dice improvement from 0.908 to 0.936 on left atrial (LA) cavity segmentation.
What carries the argument
SPADE-LDM, the latent diffusion model conditioned on 3D composite semantic label maps that generates the synthetic LGE MRI volumes.
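The spatially-adaptive normalization that gives SPADE its name can be sketched in a few lines: activations are normalized per channel, then re-scaled and shifted by per-voxel maps derived from the semantic label map. A minimal NumPy sketch of that core operation, with random arrays standing in for the modulation maps that a small convolutional network would normally predict from the composite mask:

```python
import numpy as np

def spade_normalize(features, gamma_map, beta_map, eps=1e-5):
    """Core SPADE operation: normalize activations per channel, then
    modulate with spatially varying scale/shift maps predicted from
    the semantic label map (the prediction network is omitted here)."""
    # features: (C, D, H, W) activations for one 3D volume
    mean = features.mean(axis=(1, 2, 3), keepdims=True)
    var = features.var(axis=(1, 2, 3), keepdims=True)
    normalized = (features - mean) / np.sqrt(var + eps)
    # gamma_map/beta_map share the feature shape and carry the
    # label-map information into every spatial location
    return normalized * (1.0 + gamma_map) + beta_map

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8, 8, 8))            # toy 3D feature volume
gamma = rng.normal(scale=0.1, size=feats.shape)  # stand-ins for maps a small
beta = rng.normal(scale=0.1, size=feats.shape)   # conv net would predict
out = spade_normalize(feats, gamma, beta)
print(out.shape)  # (4, 8, 8, 8)
```

With zero modulation maps the operation reduces to plain instance normalization, which is the sense in which the label map "steers" the generator at every layer.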
If this is right
- SPADE-LDM substantially outperforms both GAN models on FID, indicating superior realism and structural fidelity.
- Augmenting scarce LGE training data with the generated volumes produces a statistically significant gain in LA cavity segmentation accuracy.
- The composite-mask conditioning lets the generator respect both expert annotations and unsupervised tissue patterns simultaneously.
- Label-conditioned 3D synthesis offers a direct route to mitigate data scarcity for models that quantify atrial fibrosis.
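The FID numbers cited above reduce to the Fréchet distance between Gaussian fits of real and synthetic feature distributions. A minimal NumPy sketch of that distance, with random vectors standing in for features from a pretrained encoder (the paper's actual feature extractor is not reproduced here):

```python
import numpy as np

def _sqrtm_psd(mat):
    """Matrix square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T

def frechet_distance(feats_real, feats_fake):
    """Frechet distance between Gaussian fits of two feature sets,
    each of shape (n_samples, n_features)."""
    mu1, mu2 = feats_real.mean(0), feats_fake.mean(0)
    c1 = np.cov(feats_real, rowvar=False)
    c2 = np.cov(feats_fake, rowvar=False)
    # Tr((C1 C2)^{1/2}) computed via the symmetric form
    # (C1^{1/2} C2 C1^{1/2})^{1/2}, which stays PSD.
    s1 = _sqrtm_psd(c1)
    covmean = _sqrtm_psd(s1 @ c2 @ s1)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(c1 + c2 - 2.0 * covmean))

rng = np.random.default_rng(0)
a = rng.normal(size=(200, 16))            # "real" features (illustrative)
b = rng.normal(loc=0.5, size=(200, 16))   # "synthetic" features, shifted
print(frechet_distance(a, a) < 1e-6, frechet_distance(a, b) > 0.0)
```

Identical distributions score near zero; a mean shift in the synthetic features drives the distance up, which is the direction of the SPADE-LDM versus GAN comparison.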
Where Pith is reading between the lines
- The same composite-mask pipeline could be tested on other cardiac chambers or MRI contrasts where annotated volumes remain limited.
- Performance might improve further by tuning the ratio of synthetic to real images or by adding explicit diversity constraints during generation.
- If the gains hold across multi-center datasets, the approach could lower the annotation burden required to build reliable clinical segmentation tools.
Load-bearing premise
The synthetic images must be free of artifacts and distribution shifts that would cause the downstream segmentation model to learn incorrect features instead of true anatomy.
What would settle it
Training the 3D U-Net on real data alone versus real plus synthetic data and finding no Dice improvement or a performance drop on an independent set of real clinical LGE scans would disprove the augmentation benefit.
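The Dice score underlying the 0.908 versus 0.936 comparison is the standard overlap ratio 2|A∩B| / (|A| + |B|). A self-contained NumPy sketch on toy 3D masks:

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice overlap between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

# Toy 3D masks: a ground-truth cube vs. a prediction shifted by one voxel
gt = np.zeros((16, 16, 16), dtype=bool)
gt[4:12, 4:12, 4:12] = True
pred = np.zeros_like(gt)
pred[5:13, 4:12, 4:12] = True
print(round(dice_score(pred, gt), 3))  # → 0.875
```

Evaluating this per test case on real held-out scans, with and without synthetic augmentation, is exactly the falsification experiment described above.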
Original abstract
Segmentation of the left atrial (LA) wall and endocardium from late gadolinium-enhanced (LGE) MRI is essential for quantifying atrial fibrosis in patients with atrial fibrillation. The development of accurate machine learning-based segmentation models remains challenging due to the limited availability of data and the complexity of anatomical structures. In this work, we investigate 3D conditional generative models as a potential solution for augmenting scarce LGE training data and improving LA segmentation performance. We develop a pipeline to synthesize high-fidelity 3D LGE MRI volumes from composite semantic label maps combining anatomical expert annotations with unsupervised tissue clusters, using three 3D conditional generators (Pix2Pix GAN, SPADE-GAN, and SPADE-LDM). The synthetic images are evaluated for realism and their impact on downstream LA segmentation. SPADE-LDM generates the most realistic and structurally accurate images, achieving an FID of 4.063 and surpassing the GAN models, which have FIDs of 40.821 and 7.652 for Pix2Pix and SPADE-GAN, respectively. When augmented with synthetic LGE images, the Dice score for LA cavity segmentation with a 3D U-Net model improved from 0.908 to 0.936, a statistically significant improvement (p < 0.05) over the baseline. These findings demonstrate the potential of label-conditioned 3D synthesis to enhance the segmentation of under-represented cardiac structures.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a pipeline to synthesize 3D LGE MRI volumes from composite semantic label maps (expert annotations plus unsupervised tissue clusters) using three conditional generators: Pix2Pix GAN, SPADE-GAN, and SPADE-LDM. SPADE-LDM produces the most realistic outputs (FID 4.063 vs. 40.821 and 7.652 for the GAN baselines) and, when used to augment training data, raises 3D U-Net Dice for LA cavity segmentation from 0.908 to 0.936 (p < 0.05).
Significance. If the Dice gain proves robust, the work offers a concrete route to data augmentation for scarce, high-value cardiac LGE MRI datasets. The composite-mask conditioning strategy is a pragmatic engineering contribution that could be adopted by other groups working on under-represented cardiac structures.
Major comments (2)
- [Results] The central claim that synthetic augmentation improves generalization rests on a single train/test partition of the internal LGE dataset (Results section). Because the composite masks are derived from the same expert annotations used for training, any distribution overlap between the synthesis training labels and the test set can produce an optimistic Dice gain that may not replicate on a different split; repeated random splits or k-fold evaluation is required to substantiate the p < 0.05 improvement.
- [Methods] The manuscript provides no quantitative controls for distribution shift between real and synthetic images (e.g., no MMD, no domain-adversarial validation, no hold-out scanner/site test). Without these, it remains unclear whether the observed Dice increase reflects genuine anatomical fidelity or merely memorization of the training distribution.
Minor comments (2)
- [Abstract] The abstract states the statistical significance but does not name the test (paired t-test, Wilcoxon, etc.) or report the exact number of samples used for the p-value calculation.
- [Results] Figure captions and the main text should explicitly state the number of real vs. synthetic volumes used in each training regime and whether the same random seed or data split was fixed across all compared models.
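On the unnamed significance test: one plausible candidate is a paired t-test on per-case Dice differences, which can be sketched in a few lines. The score values below are illustrative only, not the paper's data:

```python
import numpy as np

def paired_t(baseline, augmented):
    """Paired t statistic for per-case score differences."""
    d = np.asarray(augmented, dtype=float) - np.asarray(baseline, dtype=float)
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    return t, n - 1  # statistic and degrees of freedom

# Hypothetical per-case Dice scores, for illustration only
base = [0.90, 0.91, 0.89, 0.92, 0.90, 0.91]
augm = [0.93, 0.94, 0.92, 0.95, 0.93, 0.93]
t, dof = paired_t(base, augm)
print(dof, t > 0)
```

Reporting the statistic, degrees of freedom, and n alongside the p-value would address the comment directly; a Wilcoxon signed-rank test would be the usual non-parametric alternative if normality of the differences is doubtful.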
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to improve the robustness of our evaluation.
Point-by-point responses
Referee: The central claim that synthetic augmentation improves generalization rests on a single train/test partition of the internal LGE dataset (Results section). Because the composite masks are derived from the same expert annotations used for training, any distribution overlap between the synthesis training labels and the test set can produce an optimistic Dice gain that may not replicate on a different split; repeated random splits or k-fold evaluation is required to substantiate the p < 0.05 improvement.
Authors: We agree that a single split limits generalizability claims. In the revised manuscript we will add results from five independent random train/test splits, reporting mean Dice scores with standard deviations for the baseline and augmented models. This will strengthen the evidence for the reported improvement from 0.908 to 0.936.
Revision: yes
Referee: The manuscript provides no quantitative controls for distribution shift between real and synthetic images (e.g., no MMD, no domain-adversarial validation, no hold-out scanner/site test). Without these, it remains unclear whether the observed Dice increase reflects genuine anatomical fidelity or merely memorization of the training distribution.
Authors: We acknowledge the lack of explicit distribution-shift metrics. We will add Maximum Mean Discrepancy (MMD) calculations between real and synthetic image feature distributions (using a pre-trained 3D encoder) in the revised Methods and Results. This will provide quantitative support that the Dice gain arises from improved fidelity rather than memorization. However, no multi-scanner or multi-site data is available, which prevents a hold-out scanner/site test for distribution-shift validation.
Revision: partial
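The MMD the authors propose to add (Gretton et al.'s kernel two-sample test) has a compact unbiased estimator. A NumPy sketch with an RBF kernel, using random vectors in place of the encoder features the revised manuscript would use:

```python
import numpy as np

def rbf_mmd2(x, y, sigma=1.0):
    """Unbiased MMD^2 estimate with an RBF kernel between two
    samples x and y of shape (n, d) and (m, d)."""
    def k(a, b):
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2.0 * sigma ** 2))
    kxx, kyy, kxy = k(x, x), k(y, y), k(x, y)
    n, m = len(x), len(y)
    # Exclude diagonal terms for the unbiased within-sample averages
    term_x = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return term_x + term_y - 2.0 * kxy.mean()

rng = np.random.default_rng(0)
real = rng.normal(size=(100, 8))             # "real" features (illustrative)
fake_close = rng.normal(size=(100, 8))       # synthetic, same distribution
fake_far = rng.normal(loc=1.0, size=(100, 8))  # synthetic, shifted
print(rbf_mmd2(real, fake_far) > rbf_mmd2(real, fake_close))
```

A small MMD between real and synthetic features would support the fidelity claim; in practice the bandwidth sigma is often set by the median pairwise distance heuristic.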
Circularity Check
Empirical pipeline with external metrics exhibits no circularity
Full rationale
The paper presents a purely empirical pipeline: training three 3D conditional generators (Pix2Pix GAN, SPADE-GAN, SPADE-LDM) on composite semantic masks derived from expert annotations plus unsupervised clustering, then measuring realism via FID against real LGE volumes and downstream utility via Dice improvement on a 3D U-Net segmentation task. No mathematical derivations, equations, or self-citations are invoked that reduce any reported result to a fitted parameter or input by construction. All key numbers (FID 4.063 for SPADE-LDM; Dice rise from 0.908 to 0.936) are computed against held-out real data using standard external metrics, so the claims remain self-contained without circular reduction.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Conditional generative models can learn accurate mappings from semantic label maps to realistic image intensities.
Reference graph
Works this paper leans on
[1] L. Li, V. A. Zimmer, J. A. Schnabel, and X. Zhuang, "Medical image analysis on left atrial LGE MRI for atrial fibrillation studies: A review," Medical Image Analysis, vol. 77, p. 102360, Apr. 2022, doi: 10.1016/j.media.2022.102360.
[2] Z. Xiong et al., "A global benchmark of algorithms for segmenting the left atrium from late gadolinium-enhanced cardiac magnetic resonance imaging," Medical Image Analysis, vol. 67, p. 101832, Jan. 2021, doi: 10.1016/j.media.2020.101832.
[3] K. Jamart, Z. Xiong, G. D. Maso Talou, M. K. Stiles, and J. Zhao, "Mini review: Deep learning for atrial segmentation from late gadolinium-enhanced MRIs," Frontiers in Cardiovascular Medicine, vol. 7, p. 522088, May 2020, doi: 10.3389/fcvm.2020.00086.
[4] H. C. Shin et al., "Medical image synthesis for data augmentation and anonymization using generative adversarial networks," in Simulation and Synthesis in Medical Imaging (SASHIMI 2018), Lecture Notes in Computer Science, vol. 11037, Cham: Springer, 2018, pp. 1–11, doi: 10.1007/978-3-030-00536-8_1.
[5] D. R. P. R. M. Lustermans, S. Amirrajab, M. Veta, M. Breeuwer, and C. M. Scannell, "Optimized automated cardiac MR scar quantification with GAN-based data augmentation," Computer Methods and Programs in Biomedicine, vol. 226, p. 107116, Nov. 2022, doi: 10.1016/j.cmpb.2022.107116.
[6] M. Mirza and S. Osindero, "Conditional generative adversarial nets," arXiv preprint arXiv:1411.1784, Nov. 2014.
[7] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1125–1134.
[8] T. Park, M.-Y. Liu, T.-C. Wang, and J.-Y. Zhu, "Semantic image synthesis with spatially-adaptive normalization," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 2332–2341, doi: 10.1109/cvpr.2019.00244.
[9] J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," in Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 6840–6851.
[10] Z. Dorjsembe, H. K. Pao, S. Odonchimed, and F. Xiao, "Conditional diffusion models for semantic 3D brain MRI synthesis," IEEE Journal of Biomedical and Health Informatics, vol. 28, no. 7, pp. 4084–4093, Jul. 2024, doi: 10.1109/jbhi.2024.3385504.
[11] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-resolution image synthesis with latent diffusion models," in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 2022, pp. 10674–10685, doi: 10.1109/CVPR52688.2022.01042.
[12] W. Zhou et al., "Semantic image synthesis via diffusion models," arXiv preprint arXiv:2207.00050, Jun. 2022.
[13] T. Ruschke et al., "Guided synthesis of labeled brain MRI data using latent diffusion models for segmentation of enlarged ventricles," arXiv preprint arXiv:2411.01351, Nov. 2024.
[14] L. Zhu et al., "Make-a-volume: Leveraging latent diffusion models for cross-modality 3D brain MRI synthesis," in Medical Image Computing and Computer Assisted Intervention – MICCAI 2023, Lecture Notes in Computer Science, vol. 14229, Cham: Springer, Oct. 2023, pp. 592–601, doi: 10.1007/978-3-031-43999-5_56.
[15] S. Amirrajab et al., "XCAT-GAN for synthesizing 3D consistent labeled cardiac MR images on anatomically variable XCAT phantoms," in Medical Image Computing and Computer Assisted Intervention – MICCAI 2020, Lecture Notes in Computer Science, vol. 12264, Cham: Springer, Oct. 2020, pp. 128–137, doi: 10.1007/978-3-030-59719-1_13.
[16] S. Amirrajab, Y. Al Khalil, C. Lorenz, J. Weese, J. Pluim, and M. Breeuwer, "Label-informed cardiac magnetic resonance image synthesis through conditional generative adversarial networks," Computerized Medical Imaging and Graphics, vol. 101, p. 102123, Oct. 2022, doi: 10.1016/j.compmedimag.2022.102123.
[17] G. Baldini, M. Schmidt, C. Zäske, and L. L. Caldeira, "MRI scan synthesis methods based on clustering and Pix2Pix," in Artificial Intelligence in Medicine – AIME 2024, Lecture Notes in Computer Science, vol. 14845, Cham: Springer, Jul. 2024, pp. 109–125, doi: 10.1007/978-3-031-66535-6_13.
[18] Z. Zhang, L. Yang, and Y. Zheng, "Translating and segmenting multimodal medical volumes with cycle- and shape-consistency generative adversarial network," in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 2018, pp. 9242–9251, doi: 10.1109/cvpr.2018.00963.
[19] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, "GANs trained by a two time-scale update rule converge to a local Nash equilibrium," in Advances in Neural Information Processing Systems, vol. 30, Long Beach, CA, USA, 2017, pp. 6629–6640.
[20] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola, "A kernel two-sample test," Journal of Machine Learning Research, vol. 13, pp. 723–773, 2012.
[21] Z. Wang, E. P. Simoncelli, and A. C. Bovik, "Multi-scale structural similarity for image quality assessment," in The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, Pacific Grove, CA, USA, 2003, pp. 1398–1402, doi: 10.1109/acssc.2003.1292216.