pith. machine review for the scientific record.

arxiv: 2601.04588 · v2 · submitted 2026-01-08 · 💻 cs.CV

Recognition: no theorem link

3D Conditional Image Synthesis of Left Atrial LGE MRI from Composite Semantic Masks


Pith reviewed 2026-05-16 17:09 UTC · model grok-4.3

classification 💻 cs.CV
keywords LGE MRI · left atrium · image synthesis · data augmentation · SPADE-LDM · 3D segmentation · conditional generation

The pith

SPADE-LDM synthesis from composite masks raises left atrial segmentation Dice score from 0.908 to 0.936

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether 3D conditional generative models can produce realistic late gadolinium-enhanced MRI volumes of the left atrium from composite semantic label maps, thereby expanding scarce training sets for segmentation. It implements three generators—Pix2Pix GAN, SPADE-GAN, and SPADE-LDM—and measures both image realism via FID and downstream impact on a 3D U-Net segmenter. SPADE-LDM yields the lowest FID of 4.063 and, when its outputs augment the real data, lifts LA cavity Dice from 0.908 to 0.936 (p < 0.05). The work therefore presents label-conditioned 3D synthesis as a concrete remedy for limited annotated LGE scans needed to quantify atrial fibrosis.

Core claim

The authors build a synthesis pipeline that converts composite semantic masks—expert anatomical labels plus unsupervised tissue clusters—into 3D LGE MRI volumes. Among the three conditional models, SPADE-LDM produces the most realistic and structurally faithful images (FID 4.063 versus 40.821 and 7.652 for the GAN baselines). When these synthetic volumes are added to the training set, the 3D U-Net achieves a statistically significant Dice improvement from 0.908 to 0.936 on left atrial cavity segmentation.
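The Dice gain is the paper's bottom line; for readers outside segmentation, here is a minimal sketch of the metric on binary 3D masks. The paper's exact evaluation protocol, per-case averaging, and smoothing constant are assumptions, not details taken from the manuscript:

```python
import numpy as np

def dice(pred, gt, eps=1e-7):
    """Dice coefficient 2|A∩B| / (|A| + |B|) between binary 3D masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

# Toy volumes: identical masks score 1.0, disjoint masks score ~0.0
a = np.zeros((8, 8, 8), dtype=bool)
a[2:5, 2:5, 2:5] = True
b = np.zeros((8, 8, 8), dtype=bool)
b[5:7, 5:7, 5:7] = True
print(dice(a, a))  # 1.0
```

A jump from 0.908 to 0.936 on this scale means the overlap error (1 − Dice) shrinks by roughly a third.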

What carries the argument

SPADE-LDM, the latent diffusion model conditioned on 3D composite semantic label maps that generates the synthetic LGE MRI volumes.
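SPADE's core move is to predict the normalization parameters from the label map instead of learning fixed ones, so the semantic layout survives normalization. A minimal NumPy sketch of spatially-adaptive denormalization follows; the actual model predicts gamma and beta with small convolutions over the mask, so the per-label lookup tables below are simplifying stand-ins:

```python
import numpy as np

def spade_denorm(x, labels, gamma_table, beta_table, eps=1e-5):
    """Instance-normalize activations x of shape (C, D, H, W), then
    re-modulate each voxel with gamma/beta derived from its semantic label."""
    mu = x.mean(axis=(1, 2, 3), keepdims=True)
    sd = x.std(axis=(1, 2, 3), keepdims=True)
    x_norm = (x - mu) / (sd + eps)
    gamma = gamma_table[labels]   # (D, H, W): per-voxel scale from the mask
    beta = beta_table[labels]     # (D, H, W): per-voxel shift from the mask
    return x_norm * gamma + beta  # broadcast over channels

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 4, 4))          # 2 feature channels
labels = rng.integers(0, 3, size=(4, 4, 4))    # 3 semantic classes
out = spade_denorm(x, labels,
                   gamma_table=np.array([1.0, 2.0, 0.5]),
                   beta_table=np.array([0.0, 1.0, -1.0]))
```

Because gamma and beta vary voxel by voxel with the label map, the composite masks (expert labels plus tissue clusters) steer texture locally rather than only globally.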

If this is right

  • SPADE-LDM substantially outperforms both GAN models on FID, indicating superior realism and structural fidelity.
  • Augmenting scarce LGE training data with the generated volumes produces a statistically significant gain in LA cavity segmentation accuracy.
  • The composite-mask conditioning lets the generator respect both expert annotations and unsupervised tissue patterns simultaneously.
  • Label-conditioned 3D synthesis offers a direct route to mitigate data scarcity for models that quantify atrial fibrosis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same composite-mask pipeline could be tested on other cardiac chambers or MRI contrasts where annotated volumes remain limited.
  • Performance might improve further by tuning the ratio of synthetic to real images or by adding explicit diversity constraints during generation.
  • If the gains hold across multi-center datasets, the approach could lower the annotation burden required to build reliable clinical segmentation tools.

Load-bearing premise

The synthetic images must be free of artifacts and distribution shifts that would cause the downstream segmentation model to learn incorrect features instead of true anatomy.

What would settle it

If training the 3D U-Net on real plus synthetic data yielded no Dice improvement, or a performance drop, relative to real data alone on an independent set of real clinical LGE scans, the augmentation benefit would be disproved.
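The experiment described here is a paired comparison: the same test cases are segmented under two training regimes, which is presumably what the unnamed p < 0.05 test reflects. A sketch of the paired t-statistic on hypothetical per-case Dice values; none of the numbers below are the paper's data:

```python
import numpy as np

def paired_t(a, b):
    """Paired t-statistic and degrees of freedom for per-case differences."""
    d = np.asarray(a, float) - np.asarray(b, float)
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    return t, n - 1

# Hypothetical per-case Dice under the two regimes (illustrative only)
baseline  = [0.90, 0.91, 0.89, 0.92, 0.90, 0.91]
augmented = [0.93, 0.95, 0.91, 0.96, 0.92, 0.94]
t, df = paired_t(augmented, baseline)   # t ≈ 8.2, df = 5
```

A nonparametric alternative such as the Wilcoxon signed-rank test would be the safer choice if per-case differences are not plausibly normal.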

Figures

Figures reproduced from arXiv: 2601.04588 by Rebecca Thornhill, Sreeraman Rajan, Yusri Al-Sanaani.

Figure 1. Overview of the SPADE-LDM framework. (A) VAE training with SPADE-conditioned decoding for semantic reconstruction. (B) Latent diffusion guided by semantic maps to generate MRI images. (C) SDE/SDD (Semantic Diffusion Encoder/Decoder) ResBlocks in the diffusion model.

Figure 2. Silhouette Score and Davies–Bouldin Index for k-means, σ = 1.

Figure 3. Sample output from the composite label map generation pipeline. The composite label maps (C, D) are generated from the original MRI (A) and ground-truth masks (B). Though approximate, the auxiliary labels introduce contextual diversity and improve synthesis realism, as shown by refinements from k = 2 to k = 5 in (C, D).

Figure 4. Progression of visual quality across models. 3D Pix2Pix captures general LA morphology but produces over-smoothed textures and uniform backgrounds. SPADE-GAN improves the anatomical structure and local texture but occasionally introduces non-physiological speckle artifacts. SPADE-LDM preserves wall detail, contrast dynamics, and fine-grained intensity variation resembling acquisition noise.…

Figure 5. Sample conditional generation using different label inputs with the SPADE-GAN model. (D) Output conditioned on ground-truth masks (B). (E) Output conditioned on composite semantic map (C), showing closer resemblance to the real MRI (A). SPADE-GAN also performed well, achieving a relatively low FID (7.652) and MMD (4.433), with MS-SSIM (0.811) and PSNR (23.542 dB), surpassing Pix2Pix on all metrics. However…
original abstract

Segmentation of the left atrial (LA) wall and endocardium from late gadolinium-enhanced (LGE) MRI is essential for quantifying atrial fibrosis in patients with atrial fibrillation. The development of accurate machine learning-based segmentation models remains challenging due to the limited availability of data and the complexity of anatomical structures. In this work, we investigate 3D conditional generative models as a potential solution for augmenting scarce LGE training data and improving LA segmentation performance. We develop a pipeline to synthesize high-fidelity 3D LGE MRI volumes from composite semantic label maps combining anatomical expert annotations with unsupervised tissue clusters, using three 3D conditional generators (Pix2Pix GAN, SPADE-GAN, and SPADE-LDM). The synthetic images are evaluated for realism and their impact on downstream LA segmentation. SPADE-LDM generates the most realistic and structurally accurate images, achieving an FID of 4.063 and surpassing GAN models, which have FIDs of 40.821 and 7.652 for Pix2Pix and SPADE-GAN, respectively. When augmented with synthetic LGE images, the Dice score for LA cavity segmentation with a 3D U-Net model improved from 0.908 to 0.936, showing a statistically significant improvement (p < 0.05) over the baseline. These findings demonstrate the potential of label-conditioned 3D synthesis to enhance the segmentation of under-represented cardiac structures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops a pipeline to synthesize 3D LGE MRI volumes from composite semantic label maps (expert annotations plus unsupervised tissue clusters) using three conditional generators: Pix2Pix GAN, SPADE-GAN, and SPADE-LDM. SPADE-LDM produces the most realistic outputs (FID 4.063 vs. 40.821 and 7.652 for the GAN baselines) and, when used to augment training data, raises 3D U-Net Dice for LA cavity segmentation from 0.908 to 0.936 (p < 0.05).

Significance. If the Dice gain proves robust, the work offers a concrete route to data augmentation for scarce, high-value cardiac LGE MRI datasets. The composite-mask conditioning strategy is a pragmatic engineering contribution that could be adopted by other groups working on under-represented cardiac structures.

major comments (2)
  1. [Results] The central claim that synthetic augmentation improves generalization rests on a single train/test partition of the internal LGE dataset (Results section). Because the composite masks are derived from the same expert annotations used for training, any distribution overlap between the synthesis training labels and the test set can produce an optimistic Dice gain that may not replicate on a different split; repeated random splits or k-fold evaluation is required to substantiate the p < 0.05 improvement.
  2. [Methods] The manuscript provides no quantitative controls for distribution shift between real and synthetic images (e.g., no MMD, no domain-adversarial validation, no hold-out scanner/site test). Without these, it remains unclear whether the observed Dice increase reflects genuine anatomical fidelity or merely memorization of the training distribution.
minor comments (2)
  1. [Abstract] The abstract states the statistical significance but does not name the test (paired t-test, Wilcoxon, etc.) or report the exact number of samples used for the p-value calculation.
  2. [Results] Figure captions and the main text should explicitly state the number of real vs. synthetic volumes used in each training regime and whether the same random seed or data split was fixed across all compared models.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to improve the robustness of our evaluation.

point-by-point responses
  1. Referee: The central claim that synthetic augmentation improves generalization rests on a single train/test partition of the internal LGE dataset (Results section). Because the composite masks are derived from the same expert annotations used for training, any distribution overlap between the synthesis training labels and the test set can produce an optimistic Dice gain that may not replicate on a different split; repeated random splits or k-fold evaluation is required to substantiate the p < 0.05 improvement.

    Authors: We agree that a single split limits generalizability claims. In the revised manuscript we will add results from five independent random train/test splits, reporting mean Dice scores with standard deviations for the baseline and augmented models. This will strengthen the evidence for the reported improvement from 0.908 to 0.936. revision: yes

  2. Referee: The manuscript provides no quantitative controls for distribution shift between real and synthetic images (e.g., no MMD, no domain-adversarial validation, no hold-out scanner/site test). Without these, it remains unclear whether the observed Dice increase reflects genuine anatomical fidelity or merely memorization of the training distribution.

    Authors: We acknowledge the lack of explicit distribution-shift metrics. We will add Maximum Mean Discrepancy (MMD) calculations between real and synthetic image feature distributions (using a pre-trained 3D encoder) in the revised Methods and Results. This will provide quantitative support that the Dice gain arises from improved fidelity rather than memorization. revision: partial
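For reference, the MMD statistic the authors promise is the kernel two-sample test of Gretton et al.; a minimal biased-estimator sketch with an RBF kernel on already-extracted feature vectors. The pre-trained 3D encoder is outside this snippet, and the bandwidth is a free choice here (the median heuristic is a common default):

```python
import numpy as np

def mmd2_rbf(X, Y, sigma):
    """Biased squared MMD between samples X (n, d) and Y (m, d), RBF kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    # Equals ||mean-embedding(X) - mean-embedding(Y)||^2, hence >= 0
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

rng = np.random.default_rng(0)
X, Y = rng.standard_normal((64, 8)), rng.standard_normal((64, 8))
same = mmd2_rbf(X, Y, sigma=3.0)           # same distribution: near zero
shifted = mmd2_rbf(X, Y + 2.0, sigma=3.0)  # mean-shifted: clearly larger
```

A near-zero value for real-vs-synthetic features would support the rebuttal's fidelity claim; a large value would corroborate the referee's distribution-shift concern.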

standing simulated objections (unresolved)
  • No multi-scanner or multi-site data is available, preventing a hold-out scanner/site test for distribution shift validation.

Circularity Check

0 steps flagged

Empirical pipeline with external metrics exhibits no circularity

full rationale

The paper presents a purely empirical pipeline: training three 3D conditional generators (Pix2Pix GAN, SPADE-GAN, SPADE-LDM) on composite semantic masks derived from expert annotations plus unsupervised clustering, then measuring realism via FID against real LGE volumes and downstream utility via Dice improvement on a 3D U-Net segmentation task. No mathematical derivations, equations, or self-citations are invoked that reduce any reported result to a fitted parameter or input by construction. All key numbers (FID 4.063 for SPADE-LDM; Dice rise from 0.908 to 0.936) are computed against held-out real data using standard external metrics, so the claims remain self-contained without circular reduction.
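FID, the metric that ranks the three generators, is the Fréchet distance between Gaussians fitted to feature embeddings of real and generated images. A sketch assuming the features are already extracted; taking the trace of the matrix square root via eigenvalues of the covariance product is a common implementation shortcut, not necessarily the paper's exact procedure:

```python
import numpy as np

def fid(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets (n, d)."""
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    c_a = np.cov(feats_a, rowvar=False)
    c_b = np.cov(feats_b, rowvar=False)
    # Tr((Ca·Cb)^{1/2}) via eigenvalues of the (non-symmetric) product
    ev = np.linalg.eigvals(c_a @ c_b)
    tr_sqrt = np.sqrt(np.clip(ev.real, 0.0, None)).sum()
    return float(((mu_a - mu_b) ** 2).sum() + np.trace(c_a + c_b) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
feats = rng.standard_normal((256, 16))
near_zero = fid(feats, feats)   # identical feature sets give ≈ 0
```

Lower is better, which is why SPADE-LDM's 4.063 against 40.821 and 7.652 for the GAN baselines reads as a large realism gap.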

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Relies on the standard assumption that conditional generative models can learn label-to-image mappings, without introducing new free parameters or entities.

axioms (1)
  • domain assumption Conditional generative models can learn accurate mappings from semantic label maps to realistic image intensities.
    Implicit in the training of Pix2Pix, SPADE-GAN, and SPADE-LDM.

pith-pipeline@v0.9.0 · 5567 in / 1153 out tokens · 47073 ms · 2026-05-16T17:09:26.347631+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages · 1 internal anchor

  1. [1]

    Medical image analysis on left atrial LGE MRI for atrial fibrillation studies: A review,

    L. Li, V. A. Zimmer, J. A. Schnabel, and X. Zhuang, “Medical image analysis on left atrial LGE MRI for atrial fibrillation studies: A review,” Medical Image Analysis, vol. 77, p. 102360, Apr. 2022, doi: 10.1016/j.media.2022.102360

  2. [2]

    A global benchmark of algorithms for segmenting the left atrium from late gadolinium-enhanced cardiac magnetic resonance imaging,

    Z. Xiong et al., “A global benchmark of algorithms for segmenting the left atrium from late gadolinium-enhanced cardiac magnetic resonance imaging,” Medical Image Analysis, vol. 67, p. 101832, Jan. 2021, doi: 10.1016/j.media.2020.101832

  3. [3]

    Mini review: Deep learning for atrial segmentation from late gadolinium-enhanced MRIs,

    K. Jamart, Z. Xiong, G. D. Maso Talou, M. K. Stiles, and J. Zhao, “Mini review: Deep learning for atrial segmentation from late gadolinium-enhanced MRIs,” Frontiers in Cardiovascular Medicine, vol. 7, p. 522088, May 2020, doi: 10.3389/fcvm.2020.00086

  4. [4]

    Medical image synthesis for data augmentation and anonymization using generative adversarial networks,

    H. C. Shin et al., “Medical image synthesis for data augmentation and anonymization using generative adversarial networks,” in Simulation and Synthesis in Medical Imaging (SASHIMI 2018), A. Gooya, O. Goksel, I. Oguz, and N. Burgos, Eds., Lecture Notes in Computer Science, vol. 11037, Cham: Springer Verlag, 2018, pp. 1–11. doi: 10.1007/978-3-030-00536-8_1

  5. [5]

    Optimized automated cardiac MR scar quantification with GAN-based data augmentation,

    D. R. P. R. M. Lustermans, S. Amirrajab, M. Veta, M. Breeuwer, and C. M. Scannell, “Optimized automated cardiac MR scar quantification with GAN-based data augmentation,” Computer Methods and Programs in Biomedicine, vol. 226, p. 107116, Nov. 2022, doi: 10.1016/j.cmpb.2022.107116

  6. [6]

    Conditional Generative Adversarial Nets

    M. Mirza and S. Osindero, “Conditional generative adversarial nets,” arXiv preprint arXiv:1411.1784, Nov. 2014

  7. [7]

    Image-to-image translation with conditional adversarial networks,

    P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1125–1134

  8. [8]

    Semantic image synthesis with spatially-adaptive normalization,

    T. Park, M. Y. Liu, T. C. Wang, and J. Y. Zhu, “Semantic image synthesis with spatially-adaptive normalization,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 2332–2341. doi: 10.1109/cvpr.2019.00244

  9. [9]

    Denoising diffusion probabilistic models,

    J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” in Advances In Neural Information Processing Systems, vol. 33, 2020, pp. 6840–6851

  10. [10]

    Conditional diffusion models for semantic 3D brain MRI synthesis,

    Z. Dorjsembe, H. K. Pao, S. Odonchimed, and F. Xiao, “Conditional diffusion models for semantic 3D brain MRI synthesis,” IEEE Journal of Biomedical and Health Informatics, vol. 28, no. 7, pp. 4084–4093, Jul. 2024, doi: 10.1109/jbhi.2024.3385504

  11. [11]

    High-resolution image synthesis with latent diffusion models,

    R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 2022, pp. 10674–10685. doi: 10.1109/CVPR52688.2022.01042

  12. [12]

    Semantic image synthesis via diffusion models,

    W. Zhou et al., “Semantic image synthesis via diffusion models,” arXiv preprint arXiv:2207.00050, Jun. 2022

  13. [13]

    Guided Synthesis Of Labeled Brain MRI data using latent diffusion models for segmentation of enlarged ventricles,

    T. Ruschke et al., “Guided Synthesis Of Labeled Brain MRI data using latent diffusion models for segmentation of enlarged ventricles,” arXiv preprint arXiv:2411.01351, Nov. 2024

  14. [14]

    Make-a-volume: leveraging latent diffusion models for cross-modality 3D brain MRI synthesis,

    L. Zhu et al., “Make-a-volume: leveraging latent diffusion models for cross-modality 3D brain MRI synthesis,” in International Conference on Medical Image Computing and Computer-Assisted Intervention – MICCAI 2023, Lecture Notes in Computer Science, vol. 14229, Cham, Switzerland: Springer, Oct. 2023, pp. 592–601. doi: 10.1007/978-3-031-43999-5_56

  15. [15]

    XCAT-GAN for synthesizing 3D consistent labeled cardiac MR images on anatomically variable XCAT phantoms,

    S. Amirrajab et al., “XCAT-GAN for synthesizing 3D consistent labeled cardiac MR images on anatomically variable XCAT phantoms,” in Medical Image Computing and Computer Assisted Intervention – MICCAI 2020, Lecture Notes in Computer Science, vol. 12264, Cham, Switzerland: Springer, Oct. 2020, pp. 128–137. doi: 10.1007/978-3-030-59719-1_13

  16. [16]

    Label-informed cardiac magnetic resonance image synthesis through conditional generative adversarial networks,

    S. Amirrajab, Y. Al Khalil, C. Lorenz, J. Weese, J. Pluim, and M. Breeuwer, “Label-informed cardiac magnetic resonance image synthesis through conditional generative adversarial networks,” Computerized Medical Imaging and Graphics, vol. 101, p. 102123, Oct. 2022, doi: 10.1016/j.compmedimag.2022.102123

  17. [17]

    MRI scan synthesis methods based on clustering and Pix2Pix,

    G. Baldini, M. Schmidt, C. Zäske, and L. L. Caldeira, “MRI scan synthesis methods based on clustering and Pix2Pix,” in Artificial Intelligence in Medicine – AIME 2024, Lecture Notes in Computer Science, vol. 14845, Cham, Switzerland: Springer, Jul. 2024, pp. 109–125, doi: 10.1007/978-3-031-66535-6_13

  18. [18]

    Translating and segmenting multimodal medical volumes with cycle- and shape-consistency generative adversarial network,

    Z. Zhang, L. Yang, and Y. Zheng, “Translating and segmenting multimodal medical volumes with cycle- and shape-consistency generative adversarial network,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 2018, pp. 9242–9251. doi: 10.1109/cvpr.2018.00963

  19. [19]

    GANs trained by a two time-scale update rule converge to a local Nash equilibrium,

    M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “GANs trained by a two time-scale update rule converge to a local Nash equilibrium,” in Advances in Neural Information Processing Systems, vol. 30, Long Beach, CA, USA, 2017, pp. 6629–6640

  20. [20]

    A kernel two-sample test,

    A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola, “A kernel two-sample test,” The Journal of Machine Learning Research, vol. 13, pp. 723–773, 2012

  21. [21]

    Multi-scale structural similarity for image quality assessment,

    Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multi-scale structural similarity for image quality assessment,” in The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, Pacific Grove, CA, USA, 2003, pp. 1398–1402. doi: 10.1109/acssc.2003.1292216