pith. machine review for the scientific record.

arxiv: 2605.13686 · v1 · submitted 2026-05-13 · 💻 cs.CV · cs.AI

Recognition: unknown

Cross Modality Image Translation In Medical Imaging Using Generative Frameworks

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 20:14 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords image-to-image translation · GAN · medical imaging · 3D synthesis · oncology · CT to PET · diffusion models · visual Turing test

The pith

GANs outperform latent models in standardized 3D medical image translation across 11 oncology datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates a uniform testing setup for models that convert one medical scan type into another, such as CT to PET, using 3D volumes instead of 2D slices. It runs 77 experiments comparing three GAN-based methods against four latent generative approaches on scans from head/neck, lung, and pelvis regions. GANs produce higher-quality results overall, with SRGAN showing a statistically significant edge, while all models have trouble with tiny lesions and PET intensity values. A test with 17 physicians finds they cannot reliably distinguish the generated images from real ones.
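
For concreteness, the experiment count is just the cross product of dataset configurations and models; a minimal Python sketch, with placeholder dataset labels since the eleven task-anatomy configurations are only named in the paper's figures:

```python
# The 77 experiments are the cross product of 11 dataset configurations
# and 7 models. Dataset labels here are placeholders for the paper's
# task-anatomy configurations.
from itertools import product

gan_models = ["Pix2Pix", "CycleGAN", "SRGAN"]
latent_models = ["LDM", "LDM+ControlNet", "BBridge", "FlowM"]
datasets = [f"config_{i:02d}" for i in range(1, 12)]  # 11 configurations

experiments = list(product(datasets, gan_models + latent_models))
assert len(experiments) == 77  # 11 x 7, matching the paper's count
```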

Core claim

Under identical preprocessing, splitting, training, and evaluation conditions, generative adversarial networks consistently exceed the performance of latent generative models in cross-modality 3D image synthesis for oncology, with SRGAN achieving statistically significant superiority; lesion-level breakdowns indicate reliable shape preservation but weaker handling of small structures and absolute uptake intensities in CT-to-PET tasks, and a visual Turing test with 17 physicians yields near-chance classification accuracy (56.7%).
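
How near chance 56.7% really is depends on the number of classifications behind it, which this summary does not state. A hedged significance sketch with hypothetical trial counts (the n values below are illustrative placeholders, not the study's actual totals):

```python
# Two-sided binomial test of 56.7% accuracy against p = 0.5 guessing.
# Trial counts are hypothetical; the study's actual number of Part-1
# classifications is not given in this summary.
from scipy.stats import binomtest

for n in (100, 200, 400):              # assumed total classifications
    k = round(0.567 * n)               # correct answers at 56.7% accuracy
    p = binomtest(k, n, p=0.5).pvalue  # exact two-sided p-value
    print(f"n={n}: k={k}, p-value={p:.3f}")
```

Under these assumptions the same 56.7% is indistinguishable from guessing at small n but detectably above chance at larger n, so the near-chance reading ultimately rests on the study's actual trial count.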

What carries the argument

The standardized comparative evaluation framework that enforces uniform preprocessing, data splits, inference rules, and multi-level metrics including lesion analysis and visual Turing tests across 77 experiments on 11 datasets.
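
The framework's two headline image-quality metrics, PSNR and SSIM (Figures 3, 5, and 6), are standard enough to pin down in a few lines. A minimal volume-level scoring sketch follows; the paper's exact data ranges and SSIM window settings are not specified here, so the values below are assumptions:

```python
# Volume-level PSNR/SSIM on synthetic 3D data, illustrating the kind of
# scoring the benchmark applies to each predicted volume.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

rng = np.random.default_rng(0)
target = rng.random((64, 64, 64)).astype(np.float32)        # reference volume
pred = np.clip(target + 0.05 * rng.standard_normal(target.shape), 0.0, 1.0)

psnr = peak_signal_noise_ratio(target, pred, data_range=1.0)
ssim = structural_similarity(target, pred, data_range=1.0)  # 3D SSIM
print(f"PSNR = {psnr:.2f} dB, SSIM = {ssim:.3f}")
```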

If this is right

  • SRGAN becomes the default starting point for virtual scanning pipelines in head/neck, lung, and pelvis oncology.
  • All synthesis methods require targeted improvements for small-lesion fidelity and PET uptake accuracy.
  • Standardized 3D benchmarks replace isolated 2D task evaluations to enable fair model comparisons.
  • Clinical workflows can incorporate synthetic volumes once perceptual tests confirm indistinguishability from real acquisitions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Reducing the need for multiple physical scans could lower patient radiation dose and scan time in routine oncology follow-up.
  • The gap between quantitative metrics and physician preference points to a need for perceptual loss terms or clinician-in-the-loop training.
  • Hybrid architectures that combine adversarial training with diffusion-style stability may close the remaining performance differences on small structures.

Load-bearing premise

Uniform preprocessing, splitting, and inference rules applied to heterogeneous datasets and modalities do not inadvertently favor GAN architectures over latent models.

What would settle it

Retraining the latent models on the same eleven datasets with hyperparameters and augmentation choices tuned specifically for them, then re-running the full lesion-level and physician evaluation, would show whether they can match or exceed GAN scores.
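
A hedged sketch of what that re-run could look like: each latent model gets its own search space and tuning budget before the benchmark is repeated. The search spaces, parameter names, and the train_and_validate helper below are hypothetical placeholders, not anything the paper specifies:

```python
# Per-model hyperparameter search prior to re-benchmarking the latent models.
from itertools import product

SEARCH_SPACES = {  # hypothetical, model-specific search spaces
    "LDM":     {"diffusion_steps": [250, 1000], "noise_schedule": ["linear", "cosine"]},
    "BBridge": {"diffusion_steps": [500, 1000], "lr": [1e-4, 5e-5]},
    "FlowM":   {"ode_steps": [50, 100], "lr": [1e-4, 3e-4]},
}

def train_and_validate(model_name, cfg, dataset):
    """Placeholder: train `model_name` with `cfg` on the dataset's training
    split and return validation SSIM. A real re-run would plug the
    benchmark's training and evaluation pipeline in here."""
    return 0.0

def tune(model_name, dataset):
    """Exhaustive search over the model-specific space; returns the best
    (validation SSIM, config) pair found."""
    space = SEARCH_SPACES[model_name]
    best_score, best_cfg = float("-inf"), None
    for combo in product(*space.values()):
        cfg = dict(zip(space.keys(), combo))
        score = train_and_validate(model_name, cfg, dataset)
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_score, best_cfg
```

Only if the tuned latent models still trail SRGAN on the same lesion-level and physician evaluations would the ranking read as architectural rather than protocol-driven.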

Figures

Figures reproduced from arXiv: 2605.13686 by Alessia Capoccia, Ana Isabel Hernáiz Ferrer, Arturo Chiti, Bradley J. Erickson, Deborah Fazzini, Fabrizia Gelardi, Fatemeh Darvizeh, Filippo Ruffini, Francesco Di Feola, Francesco Gossetti, Giulia Romoli, Katrine Riklund, Liu Fang, Luca Boldrini, Marcello Di Pumpo, Michail E. Klontzas, Paola Feraco, Paolo Soda, Renato Cuocolo, Sara N. Strandberg, Seyedmehdi Payabvash, Tugba Akinci D'Antonoli, Valerio Guarrasi.

Figure 1. I2I translation tasks. Overview of paired I2I translation tasks selected for this study, grouped by anatomical region (lung, A; head/neck, B; and pelvis, C). Triangle vertices represent the three imaging modalities (CT, MRI, and PET). Inter-modality translations are represented by arrows between vertices, while intra-modality ones are indicated by self-loops. Arrow colors are assigned based on clinical rel…

Figure 2. The proposed benchmark experiments. Each of the 11 dataset configurations (left) is evaluated against all 7 generative models (centre) using 2 evaluation metrics (right), yielding 77 experimental combinations in total. …selected as widely adopted for I2I translation in the medical imaging literature. Pix2Pix and CycleGAN are GAN baselines in the vast majority of medical I2I studies [3], while SRGAN is repre…

Figure 3. Quantitative performance. Radar charts (PSNR on the right and SSIM on the left) comparing seven I2I synthesis models across eleven task-anatomy configurations. …on EnhancePET down to 0.57 on Synthrad25 MRI-to-CT (lung), whereas CycleGAN exhibits a narrower range across the same tasks (0.94 to 0.66). Latent generative models generally fall below their GAN counterparts, with the gap being most pronounced on s…

Figure 4. Error maps. Visual comparison across I2I translation tasks, for the two best-performing GAN-based (SRGAN and CycleGAN) and latent generative models (BBridge and FlowM). For each task, we display the target and input images (first column, first and second row respectively); the corresponding model predictions (first row); and the associated error maps with respect to the reference target (second row), compu…

Figure 5. Lesion analysis from BraTS23. PSNR and SSIM vs lesion size group for the MRI T2w-to-T2f task (BraTS dataset, median lesion diameter: 51.2 mm, IQR: 37.9–62.3 mm).

Figure 6. Lesion analysis from autoPET. PSNR and SSIM vs lesion size group for the CT-to-PET task (autoPET dataset, median lesion diameter: 19.3 mm, IQR: 15.2–30.3 mm). Figures 5 and 6 report PSNR and SSIM as a function of lesion size for the two datasets, respectively. In the BraTS23 dataset (…

Figure 7. Summary of results from Visual Turing test, Part 1. Summary of classification performance in Part 1. Each column reports the rate of correctly (blue) and incorrectly (red) classified images, separately for real and AI-generated cases (top and bottom row, respectively). Best: physician with the highest balanced accuracy (R3). Worst (Real) and Worst (AI-gen): physicians with the lowest ac…

Figure 8. Results from Visual Turing test, Part 2. Pairwise preference results for GAN models (left) and latent generative models (right), for each task and as an overall aggregate ("Average").

Figure 9. Results from Visual Turing test, Part 3. Three-way ranking results. Each panel reports the percentage of rank 1 (most realistic), rank 2, and rank 3 (least realistic) assignments across all triplets, aggregated over all tasks and physicians. …throughout the test. In contrast, in the T2w-to-T2f triplet, only 17.6% of readers were fooled, with the large majority correctly assignin…

Figure 10. Overview of the pre-processing pipeline. Each volume passes through eight sequential steps: body masking; voxel resampling; clipping; intensity normalization; spatial padding; foreground mask computation; mask intersection to obtain the common anatomical region of interest; and patch extraction. Cropping was performed exclusively along the axial axis: the inferior and superior extents of the lung mask wer…

Figure 11. Overview of the proposed benchmarking framework. The pipeline consists of four stages: (1) a configuration module, where the user specifies data, model, and training parameters and the dataset is split into training and test sets (75%–25%); (2) a data pipeline, which applies a sequence of preprocessing steps to produce paired source–target volumes; (3) a training pipeline, where GAN-based models operate i…

Figure 12. The Visual Turing test platform. Each volume was displayed through a multi-planar viewer rendered by a grid layout providing three orthogonal anatomical planes (axial, sagittal, and coronal) alongside a 3D surface reconstruction. In Part 1, a single volume was displayed and participants were asked to classify the image as either Real or AI-generated using two mutually exclusive buttons positioned below th…
Original abstract

Medical image-to-image (I2I) translation enables virtual scanning, i.e. the synthesis of a target imaging modality from a source one without additional acquisitions. Despite growing interest, most proposed methods operate on 2D slices, are evaluated on isolated tasks with different experimental set-ups and lack clinical validation. The primary contribution of this work is a reproducible, standardized comparative evaluation of 3D I2I translation methods in oncological imaging, designed to standardize preprocessing, splitting, inference, and multi-level evaluation across heterogeneous clinical tasks. Within this framework, we compare seven generative models, three Generative Adversarial Networks (GANs: Pix2Pix, CycleGAN, SRGAN) and four latent generative models (Latent Diffusion Model, Latent Diffusion Model+ControlNet, Brownian Bridge, Flow Matching), across eleven datasets spanning three anatomical regions (head/neck, lung, pelvis) and four translation directions (cone-beam CT to CT, MRI to CT, CT to PET, MRI T2-weighted to T2-FLAIR), for a total of 77 experiments under uniform training, inference, and evaluation conditions. The results show that GANs outperform latent generative models across all tasks, with SRGAN achieving statistically significant superiority. Our lesion-level analysis reveals that all models struggle with small lesions and that, in CT to PET synthesis, models reproduce lesion shape more reliably than absolute uptake-related intensity. We also performed a Visual Turing test administered to 17 physicians, including 15 radiologists, which shows near-chance classification accuracy (56.7%), confirming that synthetic volumes are largely indistinguishable from real acquisitions, while exposing a dissociation between quantitative metrics and clinical preference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript presents a standardized comparative evaluation of seven 3D generative models for cross-modality image-to-image translation in oncological imaging. It compares three GANs (Pix2Pix, CycleGAN, SRGAN) and four latent models (LDM, LDM+ControlNet, Brownian Bridge, Flow Matching) across eleven datasets spanning head/neck, lung, and pelvis regions and four translation directions, for a total of 77 experiments under uniform preprocessing, splitting, training, and inference protocols. Results indicate GANs outperform latent models with SRGAN achieving statistically significant superiority; lesion-level analysis shows struggles with small lesions and better shape than intensity reproduction in CT-to-PET; a visual Turing test with 17 physicians yields 56.7% accuracy, indicating synthetic volumes are largely indistinguishable from real acquisitions.

Significance. If the results hold, this work delivers a reproducible benchmark for 3D medical I2I translation by enforcing consistent experimental conditions across heterogeneous tasks and modalities. The scale (77 experiments), inclusion of statistical tests, lesion-specific breakdowns, and physician visual Turing test provide concrete empirical grounding and clinical relevance that could inform model selection and highlight persistent challenges such as small-lesion fidelity and PET uptake accuracy.

major comments (1)
  1. [Experimental Setup] The central claim that GANs (particularly SRGAN) outperform latent generative models rests on a single shared preprocessing, splitting, and training recipe applied uniformly to all models. While this protocol enables direct comparability, it may systematically favor GAN architectures, which often converge reliably under standard medical intensity normalization and short schedules, whereas latent diffusion and flow models frequently require longer training, modality-specific noise schedules, or augmentations. The manuscript should explicitly discuss whether per-model hyperparameter optimization was considered and, if not, justify why the uniform protocol is the appropriate basis for ranking intrinsic capabilities rather than protocol compatibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of our work. We address the single major comment below and will revise the manuscript accordingly to strengthen the discussion of our experimental design.

Point-by-point responses
  1. Referee: [Experimental Setup] The central claim that GANs (particularly SRGAN) outperform latent generative models rests on a single shared preprocessing, splitting, and training recipe applied uniformly to all models. While this protocol enables direct comparability, it may systematically favor GAN architectures, which often converge reliably under standard medical intensity normalization and short schedules, whereas latent diffusion and flow models frequently require longer training, modality-specific noise schedules, or augmentations. The manuscript should explicitly discuss whether per-model hyperparameter optimization was considered and, if not, justify why the uniform protocol is the appropriate basis for ranking intrinsic capabilities rather than protocol compatibility.

    Authors: We appreciate the referee's observation on this key design choice. The uniform protocol was intentionally selected as the core of our contribution: to deliver a reproducible benchmark that enables direct, apples-to-apples comparison of the seven models under identical preprocessing, splitting, training schedules, and inference conditions across 77 experiments. Per-model hyperparameter optimization was deliberately not performed, because doing so would have broken the standardization that allows us to attribute performance differences to the architectures themselves rather than to unequal tuning effort. This setup mirrors a realistic clinical or research scenario in which practitioners apply a single, practical recipe across heterogeneous models. We fully acknowledge that the reported rankings reflect performance under this shared protocol and may not represent the absolute best achievable results for each model with extensive, architecture-specific tuning (e.g., longer diffusion schedules or modality-specific augmentations). We will revise the manuscript to add an explicit paragraph in the Experimental Setup and a dedicated limitations subsection that states this caveat and justifies the uniform protocol as the appropriate basis for ranking relative capabilities under consistent, reproducible conditions.

    revision: yes

Circularity Check

0 steps flagged

No circularity: direct empirical comparison under fixed protocols

Full rationale

The paper conducts a standardized empirical evaluation of seven existing generative models (Pix2Pix, CycleGAN, SRGAN, LDM, LDM+ControlNet, Brownian Bridge, Flow Matching) across 77 experiments on eleven datasets. No derivations, equations, or predictions are claimed that reduce reported metrics to fitted parameters or self-defined quantities by construction. Performance numbers arise from direct inference on held-out splits using uniform preprocessing and evaluation rules; statistical significance is computed from these independent runs. Any self-citations refer only to the original model papers and do not load-bear the comparative claims. The work is self-contained against external benchmarks and exhibits no self-definitional, fitted-input, or uniqueness-imported circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The comparison rests on standard deep-learning training assumptions and the representativeness of the selected clinical datasets; no new entities or ad-hoc constants are introduced by the paper itself.

axioms (1)
  • Domain assumption: standard assumptions in supervised and unsupervised training of generative models hold under the uniform protocol.
    Invoked when claiming model superiority from training under identical conditions.

pith-pipeline@v0.9.0 · 5718 in / 1282 out tokens · 42024 ms · 2026-05-14T20:14:32.721527+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

94 extracted references · 94 canonical work pages · 2 internal anchors

  1. [1] WHO, WHO compendium of innovative health technologies for low-resource settings 2024, World Health Organization, 2024
  2. [2] E. Kjelle, et al., Cost of low-value imaging worldwide: a systematic review, Applied Health Economics and Health Policy 22 (2024) 485
  3. [3] S. Dayarathna, et al., Deep learning-based synthesis of MRI, CT and PET: Review and analysis, Computer Methods and Programs in Biomedicine 257 (2024) 108173
  4. [4] L. Doan, et al., Bridging modalities with AI: a review of AI advances in multimodal biomedical imaging, Communications Engineering 5 (2026) 30
  5. [5] M. Sherwani, S. Gopalakrishnan, A systematic literature review: deep learning techniques for synthetic medical image generation and their applications in radiotherapy, Frontiers in Radiology 4 (2024) 1385742
  6. [6] X. Fu, et al., A systematic review of generative artificial intelligence techniques for synthetic medical image datasets: Quality, models, public availability and applications, Computer Methods and Programs in Biomedicine (2026) 109331
  7. [7] A. Rofena, et al., Augmented intelligence for multimodal virtual biopsy in breast cancer using generative artificial intelligence, Journal of Biomedical Informatics (2025) 104971
  8. [8] S. Kazeminia, et al., GANs for medical image analysis, Artificial Intelligence in Medicine 109 (2020) 101938
  9. [10] G. Bredell, et al., Explicitly minimizing the blur error of variational autoencoders, in: The Eleventh International Conference on Learning Representations, 2023, pp. 1–16
  10. [11] P. Isola, et al., Image-to-image translation with conditional adversarial networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1125–1134
  11. [12] J. Zhu, et al., Unpaired image-to-image translation using cycle-consistent adversarial networks, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2223–2232
  12. [13] D. Nie, et al., Medical image synthesis with deep convolutional adversarial networks, IEEE Transactions on Biomedical Engineering 65 (2018) 2720–2730
  13. [14] J. Wolterink, et al., Deep MR to CT synthesis using unpaired data, in: International Workshop on Simulation and Synthesis in Medical Imaging, Springer, 2017, pp. 14–23
  14. [15] Y. Liu, et al., Magnetic resonance image synthesis from brain computed tomography images based on deep learning methods for magnetic resonance-guided radiotherapy, Quantitative Imaging in Medicine and Surgery 10 (2020) 1358
  15. [16] S. Dar, et al., Image synthesis in multi-contrast MRI with conditional generative adversarial networks, IEEE Transactions on Medical Imaging 38 (2019) 2375–2388
  16. [17] A. Chartsias, et al., Multimodal MR synthesis via modality-invariant latent representation, IEEE Transactions on Medical Imaging 37 (2018) 803–814
  17. [18] B. Yu, et al., Ea-GANs: Edge-aware generative adversarial networks for cross-modality MR image synthesis, IEEE Transactions on Medical Imaging 38 (2019) 1750–1762
  18. [19] S. Poonkodi, M. Kanchana, 3D-MedTranCSGAN: 3D medical image transformation using CSGAN, Computers in Biology and Medicine 153 (2023) 106541
  19. [20] V. Guarrasi, et al., Whole-body image-to-image translation for a virtual scanner in a healthcare digital twin, in: Proceedings of the IEEE 38th International Symposium on Computer-Based Medical Systems (CBMS), 2025, pp. 528–534
  20. [21] J. Ha, et al., Multi-resolution guided 3D GANs for medical image translation, in: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 4342–4351
  21. [22] J. Ho, A. Jain, P. Abbeel, Denoising diffusion probabilistic models, Advances in Neural Information Processing Systems 33 (2020) 6840–6851
  22. [23] P. Dhariwal, A. Nichol, Diffusion models beat GANs on image synthesis, in: Advances in Neural Information Processing Systems, volume 34, 2021, pp. 8780–8794
  23. [24] A. Kazerouni, et al., Diffusion models in medical imaging: A comprehensive survey, Medical Image Analysis 88 (2023) 102846
  24. [25] A. Moschetto, et al., Benchmarking GANs, diffusion models, and flow matching for T1w-to-T2w MRI translation, in: International Conference on Image Analysis and Processing, Springer, 2025, pp. 429–440
  25. [26] Q. Bertrand, A. Gagneux, M. Massias, R. Emonet, On the closed-form of flow matching: Generalization does not arise from target stochasticity, arXiv preprint arXiv:2506.03719 (2025)
  26. [27] M. Akbar, W. Wang, A. Eklund, Beware of diffusion models for synthesizing medical images – a comparison with GANs in terms of memorizing brain MRI and chest x-ray images, Machine Learning: Science and Technology 6 (2025) 015022
  27. [28] S. Pan, et al., Synthetic CT generation from MRI using 3D transformer-based denoising diffusion model, Medical Physics 51 (2024) 2538–2548
  28. [29] K. Choo, Y. Jun, M. Yun, S. Hwang, Slice-consistent 3D volumetric brain CT-to-MRI translation with 2D Brownian bridge diffusion model, in: Medical Image Computing and Computer Assisted Intervention – MICCAI 2024, Springer, 2024, pp. 657–667
  29. [30] X. Zhu, et al., Introducing 3D representation for dense volume-to-volume translation via score fusion, in: International Conference on Machine Learning, 2025, pp. 1–22
  30. [31] J. Kim, H. Park, Adaptive latent diffusion model for 3D medical image to image translation: Multi-modal magnetic resonance imaging study, in: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 7604–7613
  31. [32] P. Esser, R. Rombach, B. Ommer, Taming transformers for high-resolution image synthesis, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 12873–12883
  32. [33] A. Sargood, et al., CoCoLIT: ControlNet-conditioned latent image translation for MRI to amyloid PET synthesis, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, 2026, pp. 8778–8786
  33. [34] L. Zhang, A. Rao, M. Agrawala, Adding conditional control to text-to-image diffusion models, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2023, pp. 3836–3847
  34. [35] R. Graf, et al., Denoising diffusion-based MRI to CT image translation enables automated spinal segmentation, European Radiology Experimental 7 (2023) 70
  35. [36] A. Rajagopal, et al., Synthetic PET via domain translation of 3-D MRI, IEEE Transactions on Radiation and Plasma Medical Sciences 7 (2023) 333–343
  36. [37] M. Bahloul, et al., Advancements in synthetic CT generation from MRI: A review of techniques and trends in radiation therapy planning, Journal of Applied Clinical Medical Physics (2024)
  37. [38] A. Thummerer, et al., SynthRAD2025 grand challenge dataset: Generating synthetic CTs for radiotherapy from head to abdomen, Medical Physics 52 (2025) e17981
  38. [39] F. Bray, et al., Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries, CA: A Cancer Journal for Clinicians 74 (2024) 229–263
  39. [40] R. Siegel, et al., Cancer statistics, 2026, CA: A Cancer Journal for Clinicians 76 (2026)
  40. [41] S. Karimi, et al., Glioblastoma: Clinical presentation, multidisciplinary management, and long-term outcomes, Cancers 17 (2025)
  41. [42] A. Thummerer, et al., SynthRAD2023 grand challenge dataset: Generating synthetic CT for radiotherapy, in: Proceedings of the Medical Image Computing and Computer Assisted Intervention (MICCAI) Challenges, 2023, pp. 4664–4674
  42. [43] A. Kazerooni, et al., The ASNR-MICCAI brain tumor segmentation (BraTS) challenge 2023: Intracranial meningioma, in: Proceedings of MICCAI, 2023, pp. 1–11
  43. [44] S. Gatidis, et al., A whole-body FDG-PET/CT dataset with manually annotated tumor lesions, Scientific Data 9 (2022) 601
  44. [45] D. Ferrara, et al., Sharing a whole-/total-body [18F] FDG-PET/CT dataset with CT-derived segmentations: an enhance.pet initiative, Scientific Data (2026)
  45. [46] C. Saharia, et al., Palette: Image-to-image diffusion models, in: ACM SIGGRAPH 2022 Conference Proceedings, 2022, pp. 1–10
  46. [47] B. Li, K. Xue, B. Liu, Y. Lai, BBDM: Image-to-image translation with Brownian bridge diffusion models, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 1952–1961
  47. [48] Y. Lipman, R. Chen, H. Ben-Hamu, M. Nickel, M. Le, Flow matching for generative modeling, in: Proceedings of the International Conference on Learning Representations (ICLR), 2023, pp. 1–28
  48. [49] M. Valls, P. Bourdon, C. Fernandez, G. Herpe, D. Helbert, Prob-BBDM: A probabilistic Brownian bridge diffusion model for MRI sequence image-to-image translation, Computerized Medical Imaging and Graphics (2026) 102745
  49. [50] M. Yazdani, Y. Medghalchi, P. Ashrafian, I. Hacihaliloglu, D. Shahriari, Flow matching for medical image synthesis: Bridging the gap between speed and quality, in: Medical Image Computing and Computer Assisted Intervention – MICCAI, 2025, pp. 216–226
  50. [51] F. Isensee, P. Jaeger, S. Kohl, J. Petersen, K. Maier-Hein, nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation, Nature Methods 18 (2021) 203–211
  51. [52] F. Di Feola, L. Tronchin, P. Soda, A comparative study between paired and unpaired image quality assessment in low-dose CT denoising, in: 2023 IEEE 36th International Symposium on Computer-Based Medical Systems (CBMS), IEEE, 2023, pp. 471–476
  52. [53] M. Roberts, et al., Imaging evaluation of a proposed 3D generative model for MRI to CT translation in the lumbar spine, The Spine Journal (2023)
  53. [54] C. Tang, et al., Incorporating radiologist knowledge into MRI quality metrics for machine learning using rank-based ratings, Journal of Magnetic Resonance Imaging (2024)
  54. [55] V. Guarrasi, et al., Multimodal explainability via latent shift applied to COVID-19 stratification, Pattern Recognition 156 (2024) 110825
  55. [56] Y. Myong, et al., Evaluating diagnostic content of AI-generated chest radiography: A multi-center visual Turing test, PLoS ONE (2023)
  56. [57] M. Jang, et al., Image Turing test and its applications on synthetic chest radiographs by using the progressive growing generative adversarial network, Scientific Reports (2023)
  57. [58] A. Phelps, et al., Pairwise comparison versus Likert scale for biomedical image assessment, American Journal of Roentgenology (2015)
  58. [59] E. Hoeijmakers, et al., How subjective CT image quality assessment becomes surprisingly reliable: pairwise comparisons instead of Likert scale, European Radiology (2024)
  59. [60] L. Friedrich, et al., Deep learning for medical image-to-image translation: Methods, datasets, and evaluation, npj Digital Medicine 7 (2024) 114
  60. [61] A. Breger, et al., A study of why we need to reassess full reference image quality assessment with medical images, Journal of Imaging Informatics in Medicine 38 (2025) 3444–3469
  61. [62] M. Dohmen, M. Klemens, I. Baltruschat, T. Truong, M. Lenga, Similarity and quality metrics for MR image-to-image translation, Scientific Reports 15 (2025) 3853
  62. [63] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, B. Ommer, High-resolution image synthesis with latent diffusion models, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 10684–10695
  63. [64] E. Haacke, R. Brown, M. Thompson, R. Venkatesan, Magnetic Resonance Imaging: Physical Principles and Sequence Design, Wiley-Liss, 1999
  64. [65] J. Bushberg, J. Seibert, E. Leidholdt, J. Boone, The Essential Physics of Medical Imaging, 3 ed., Lippincott Williams & Wilkins, 2011
  65. [66] J. Barentsz, et al., ESUR prostate MRI guidelines, European Radiology 22 (2012) 746–757
  66. [67] R. Beets-Tan, et al., Magnetic resonance imaging for clinical management of rectal cancer, European Radiology 28 (2018) 1465–1475
  67. [68] P. Wen, et al., Updated response assessment criteria for high-grade gliomas, Journal of Clinical Oncology 28 (2010) 1963–1972
  68. [69] D. Louis, et al., The 2021 WHO classification of tumors of the central nervous system, Neuro-Oncology 23 (2021) 1231–1251
  69. [70] F. Fazekas, et al., MR signal abnormalities at 1.5 T in Alzheimer's dementia and normal aging, American Journal of Neuroradiology 14 (1993) 1237–1242
  70. [71] American Cancer Society, Cancer facts & figures 2026, 2026
  71. [72] R. Stupp, et al., Radiotherapy plus concomitant and adjuvant temozolomide for glioblastoma, New England Journal of Medicine 352 (2005) 987–996
  72. [73] A. A. Aizer, et al., Brain metastases: A society for neuro-oncology (SNO) consensus review on current management and future directions, Neuro-Oncology 24 (2022) 1613–1646
  73. [74] M. Singh, et al., Epidemiology of brain metastases, Neurosurgery Clinics of North America 31 (2020) 481–495
  74. [75] Z. S. Mayo, et al., Radiation necrosis or tumor progression? A review of the radiographic modalities used in the diagnosis of cerebral radiation necrosis, Journal of Neuro-Oncology 161 (2023)
  75. [76] M. Spadea, M. Maspero, P. Zaffino, J. Seco, Deep learning based synthetic-CT generation in radiotherapy and PET: A review, Medical Physics 48 (2021) 6537–6566
  76. [77] S. De Pietro, et al., The role of MRI in radiotherapy planning: a narrative review "from head to toe", Insights into Imaging 15 (2024) 255
  77. [78] M. Maspero, et al., Deep learning for CT synthesis in radiotherapy, Bioengineering 12 (2025) 1297
  78. [79] G. Cordier, et al., Generative adversarial networks to synthesize missing T1 and FLAIR MRI sequences for use in a multisequence brain tumor segmentation model, Radiology 299 (2021) E209–E219
  79. [80] National Lung Screening Trial Research Team, Reduced lung-cancer mortality with low-dose computed tomographic screening, New England Journal of Medicine 365 (2011) 395–409
  80. [81] H. de Koning, et al., Reduced lung-cancer mortality with volume CT screening in a randomized trial, New England Journal of Medicine 382 (2020) 503–513

Showing first 80 references.