pith. machine review for the scientific record.

arxiv: 2605.13686 · v1 · submitted 2026-05-13 · 💻 cs.CV · cs.AI

Recognition: unknown

Cross Modality Image Translation In Medical Imaging Using Generative Frameworks

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 20:14 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords image-to-image translation · GAN · medical imaging · 3D synthesis · oncology · CT to PET · diffusion models · visual Turing test

The pith

GANs outperform latent models in standardized 3D medical image translation across 11 oncology datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates a uniform testing setup for models that convert one medical scan type into another, such as CT to PET, using 3D volumes instead of 2D slices. It runs 77 experiments comparing three GAN-based methods against four latent generative approaches on scans from head/neck, lung, and pelvis regions. GANs produce higher-quality results overall, with SRGAN showing a statistically significant edge, while all models have trouble with tiny lesions and PET intensity values. A test with 17 physicians finds they cannot reliably distinguish the generated images from real ones.
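
For concreteness, the experiment count is just the cross product of dataset configurations and models; a minimal Python sketch, with placeholder dataset labels since the eleven task-anatomy configurations are only named in the paper's figures:

```python
# The 77 experiments are the cross product of 11 dataset configurations
# and 7 models. Dataset labels here are placeholders for the paper's
# task-anatomy configurations.
from itertools import product

gan_models = ["Pix2Pix", "CycleGAN", "SRGAN"]
latent_models = ["LDM", "LDM+ControlNet", "BBridge", "FlowM"]
datasets = [f"config_{i:02d}" for i in range(1, 12)]  # 11 configurations

experiments = list(product(datasets, gan_models + latent_models))
assert len(experiments) == 77  # 11 x 7, matching the paper's count
```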

Core claim

Under identical preprocessing, splitting, training, and evaluation conditions, generative adversarial networks consistently exceed the performance of latent generative models in cross-modality 3D image synthesis for oncology, with SRGAN achieving statistically significant superiority; lesion-level breakdowns indicate reliable shape preservation but weaker handling of small structures and absolute uptake intensities in CT-to-PET tasks, and a visual Turing test with 17 physicians yields near-chance classification accuracy (56.7%).
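
How near chance 56.7% really is depends on the number of classifications behind it, which this summary does not state. A hedged significance sketch with hypothetical trial counts (the n values below are illustrative placeholders, not the study's actual totals):

```python
# Two-sided binomial test of 56.7% accuracy against p = 0.5 guessing.
# Trial counts are hypothetical; the study's actual number of Part-1
# classifications is not given in this summary.
from scipy.stats import binomtest

for n in (100, 200, 400):              # assumed total classifications
    k = round(0.567 * n)               # correct answers at 56.7% accuracy
    p = binomtest(k, n, p=0.5).pvalue  # exact two-sided p-value
    print(f"n={n}: k={k}, p-value={p:.3f}")
```

Under these assumptions the same 56.7% is indistinguishable from guessing at small n but detectably above chance at larger n, so the near-chance reading ultimately rests on the study's actual trial count.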

What carries the argument

The standardized comparative evaluation framework that enforces uniform preprocessing, data splits, inference rules, and multi-level metrics including lesion analysis and visual Turing tests across 77 experiments on 11 datasets.
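
The framework's two headline image-quality metrics, PSNR and SSIM (Figures 3, 5, and 6), are standard enough to pin down in a few lines. A minimal volume-level scoring sketch follows; the paper's exact data ranges and SSIM window settings are not specified here, so the values below are assumptions:

```python
# Volume-level PSNR/SSIM on synthetic 3D data, illustrating the kind of
# scoring the benchmark applies to each predicted volume.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

rng = np.random.default_rng(0)
target = rng.random((64, 64, 64)).astype(np.float32)        # reference volume
pred = np.clip(target + 0.05 * rng.standard_normal(target.shape), 0.0, 1.0)

psnr = peak_signal_noise_ratio(target, pred, data_range=1.0)
ssim = structural_similarity(target, pred, data_range=1.0)  # 3D SSIM
print(f"PSNR = {psnr:.2f} dB, SSIM = {ssim:.3f}")
```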

If this is right

  • SRGAN becomes the default starting point for virtual scanning pipelines in head/neck, lung, and pelvis oncology.
  • All synthesis methods require targeted improvements for small-lesion fidelity and PET uptake accuracy.
  • Standardized 3D benchmarks replace isolated 2D task evaluations to enable fair model comparisons.
  • Clinical workflows can incorporate synthetic volumes once perceptual tests confirm indistinguishability from real acquisitions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Reducing the need for multiple physical scans could lower patient radiation dose and scan time in routine oncology follow-up.
  • The gap between quantitative metrics and physician preference points to a need for perceptual loss terms or clinician-in-the-loop training.
  • Hybrid architectures that combine adversarial training with diffusion-style stability may close the remaining performance differences on small structures.

Load-bearing premise

Uniform preprocessing, splitting, and inference rules applied to heterogeneous datasets and modalities do not inadvertently favor GAN architectures over latent models.

What would settle it

Retraining the latent models on the same eleven datasets with hyperparameters and augmentation choices tuned specifically for them, then re-running the full lesion-level and physician evaluation, would show whether they can match or exceed GAN scores.
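
A hedged sketch of what that re-run could look like: each latent model gets its own search space and tuning budget before the benchmark is repeated. The search spaces, parameter names, and the train_and_validate helper below are hypothetical placeholders, not anything the paper specifies:

```python
# Per-model hyperparameter search prior to re-benchmarking the latent models.
from itertools import product

SEARCH_SPACES = {  # hypothetical, model-specific search spaces
    "LDM":     {"diffusion_steps": [250, 1000], "noise_schedule": ["linear", "cosine"]},
    "BBridge": {"diffusion_steps": [500, 1000], "lr": [1e-4, 5e-5]},
    "FlowM":   {"ode_steps": [50, 100], "lr": [1e-4, 3e-4]},
}

def train_and_validate(model_name, cfg, dataset):
    """Placeholder: train `model_name` with `cfg` on the dataset's training
    split and return validation SSIM. A real re-run would plug the
    benchmark's training and evaluation pipeline in here."""
    return 0.0

def tune(model_name, dataset):
    """Exhaustive search over the model-specific space; returns the best
    (validation SSIM, config) pair found."""
    space = SEARCH_SPACES[model_name]
    best_score, best_cfg = float("-inf"), None
    for combo in product(*space.values()):
        cfg = dict(zip(space.keys(), combo))
        score = train_and_validate(model_name, cfg, dataset)
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_score, best_cfg
```

Only if the tuned latent models still trail SRGAN on the same lesion-level and physician evaluations would the ranking read as architectural rather than protocol-driven.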

Figures

Figures reproduced from arXiv: 2605.13686 by Alessia Capoccia, Ana Isabel Hernáiz Ferrer, Arturo Chiti, Bradley J. Erickson, Deborah Fazzini, Fabrizia Gelardi, Fatemeh Darvizeh, Filippo Ruffini, Francesco Di Feola, Francesco Gossetti, Giulia Romoli, Katrine Riklund, Liu Fang, Luca Boldrini, Marcello Di Pumpo, Michail E. Klontzas, Paola Feraco, Paolo Soda, Renato Cuocolo, Sara N. Strandberg, Seyedmehdi Payabvash, Tugba Akinci D'Antonoli, Valerio Guarrasi.

Figure 1. I2I translation tasks. Overview of paired I2I translation tasks selected for this study, grouped by anatomical region (lung, A; head/neck, B; and pelvis, C). Triangle vertices represent the three imaging modalities (CT, MRI, and PET). Inter-modality translations are represented by arrows between vertices, while intra-modality ones are indicated by self-loops. Arrow colors are assigned based on clinical rel…

Figure 2. The proposed benchmark experiments. Each of the 11 dataset configurations (left) is evaluated against all 7 generative models (centre) using 2 evaluation metrics (right), yielding 77 experimental combinations in total. …selected as widely adopted for I2I translation in the medical imaging literature. Pix2Pix and CycleGAN are GAN baselines in the vast majority of medical I2I studies [3], while SRGAN is repre…

Figure 3. Quantitative performance. Radar charts (PSNR on the right and SSIM on the left) comparing seven I2I synthesis models across eleven task-anatomy configurations. …on EnhancePET down to 0.57 on Synthrad25 MRI-to-CT (lung), whereas CycleGAN exhibits a narrower range across the same tasks (0.94 to 0.66). Latent generative models generally fall below their GAN counterparts, with the gap being most pronounced on s…

Figure 4. Error maps. Visual comparison across I2I translation tasks, for the two best-performing GAN-based (SRGAN and CycleGAN) and latent generative models (BBridge and FlowM). For each task, we display the target and input images (first column, first and second row respectively); the corresponding model predictions (first row); and the associated error maps with respect to the reference target (second row), compu…

Figure 5. Lesion analysis from BraTS23. PSNR and SSIM vs lesion size group for the MRI T2w-to-T2f task (BraTS dataset, median lesion diameter: 51.2 mm, IQR: 37.9–62.3 mm).

Figure 6. Lesion analysis from autoPET. PSNR and SSIM vs lesion size group for the CT-to-PET task (autoPET dataset, median lesion diameter: 19.3 mm, IQR: 15.2–30.3 mm). Figures 5 and 6 report PSNR and SSIM as a function of lesion size for the two datasets, respectively. In the BraTS23 dataset (…

Figure 7. Summary of results from Visual Turing test, Part 1. Summary of classification performance in Part 1. Each column reports the rate of correctly (blue) and incorrectly (red) classified images, separately for real and AI-generated cases (top and bottom row, respectively). Best: physician with the highest balanced accuracy (R3). Worst (Real) and Worst (AI-gen): physicians with the lowest ac…

Figure 8. Results from Visual Turing test, Part 2. Pairwise preference results for GAN models (left) and latent generative models (right), for each task and as an overall aggregate ("Average").

Figure 9. Results from Visual Turing test, Part 3. Three-way ranking results. Each panel reports the percentage of rank 1 (most realistic), rank 2, and rank 3 (least realistic) assignments across all triplets, aggregated over all tasks and physicians. …throughout the test. In contrast, in the T2w-to-T2f triplet, only 17.6% of readers were fooled, with the large majority correctly assignin…

Figure 10. Overview of the pre-processing pipeline. Each volume passes through eight sequential steps: body masking; voxel resampling; clipping; intensity normalization; spatial padding; foreground mask computation; mask intersection to obtain the common anatomical region of interest; and patch extraction. Cropping was performed exclusively along the axial axis: the inferior and superior extents of the lung mask wer…

Figure 11. Overview of the proposed benchmarking framework. The pipeline consists of four stages: (1) a configuration module, where the user specifies data, model, and training parameters and the dataset is split into training and test sets (75%–25%); (2) a data pipeline, which applies a sequence of preprocessing steps to produce paired source–target volumes; (3) a training pipeline, where GAN-based models operate i…

Figure 12. The Visual Turing test platform. Each volume was displayed through a multi-planar viewer rendered by a grid layout providing three orthogonal anatomical planes (axial, sagittal, and coronal) alongside a 3D surface reconstruction. In Part 1, a single volume was displayed and participants were asked to classify the image as either Real or AI-generated using two mutually exclusive buttons positioned below th…
Original abstract

Medical image-to-image (I2I) translation enables virtual scanning, i.e. the synthesis of a target imaging modality from a source one without additional acquisitions. Despite growing interest, most proposed methods operate on 2D slices, are evaluated on isolated tasks with different experimental set-ups and lack clinical validation. The primary contribution of this work is a reproducible, standardized comparative evaluation of 3D I2I translation methods in oncological imaging, designed to standardize preprocessing, splitting, inference, and multi-level evaluation across heterogeneous clinical tasks. Within this framework, we compare seven generative models, three Generative Adversarial Networks (GANs: Pix2Pix, CycleGAN, SRGAN) and four latent generative models (Latent Diffusion Model, Latent Diffusion Model+ControlNet, Brownian Bridge, Flow Matching), across eleven datasets spanning three anatomical regions (head/neck, lung, pelvis) and four translation directions (cone-beam CT to CT, MRI to CT, CT to PET, MRI T2-weighted to T2-FLAIR), for a total of 77 experiments under uniform training, inference, and evaluation conditions. The results show that GANs outperform latent generative models across all tasks, with SRGAN achieving statistically significant superiority. Our lesion-level analysis reveals that all models struggle with small lesions and that, in CT to PET synthesis, models reproduce lesion shape more reliably than absolute uptake-related intensity. We also performed a Visual Turing test administered to 17 physicians, including 15 radiologists, which shows near-chance classification accuracy (56.7%), confirming that synthetic volumes are largely indistinguishable from real acquisitions, while exposing a dissociation between quantitative metrics and clinical preference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript presents a standardized comparative evaluation of seven 3D generative models for cross-modality image-to-image translation in oncological imaging. It compares three GANs (Pix2Pix, CycleGAN, SRGAN) and four latent models (LDM, LDM+ControlNet, Brownian Bridge, Flow Matching) across eleven datasets spanning head/neck, lung, and pelvis regions and four translation directions, for a total of 77 experiments under uniform preprocessing, splitting, training, and inference protocols. Results indicate GANs outperform latent models with SRGAN achieving statistically significant superiority; lesion-level analysis shows struggles with small lesions and better shape than intensity reproduction in CT-to-PET; a visual Turing test with 17 physicians yields 56.7% accuracy, indicating synthetic volumes are largely indistinguishable from real acquisitions.

Significance. If the results hold, this work delivers a reproducible benchmark for 3D medical I2I translation by enforcing consistent experimental conditions across heterogeneous tasks and modalities. The scale (77 experiments), inclusion of statistical tests, lesion-specific breakdowns, and physician visual Turing test provide concrete empirical grounding and clinical relevance that could inform model selection and highlight persistent challenges such as small-lesion fidelity and PET uptake accuracy.

major comments (1)
  1. [Experimental Setup] The central claim that GANs (particularly SRGAN) outperform latent generative models rests on a single shared preprocessing, splitting, and training recipe applied uniformly to all models. While this protocol enables direct comparability, it may systematically favor GAN architectures, which often converge reliably under standard medical intensity normalization and short schedules, whereas latent diffusion and flow models frequently require longer training, modality-specific noise schedules, or augmentations. The manuscript should explicitly discuss whether per-model hyperparameter optimization was considered and, if not, justify why the uniform protocol is the appropriate basis for ranking intrinsic capabilities rather than protocol compatibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of our work. We address the single major comment below and will revise the manuscript accordingly to strengthen the discussion of our experimental design.

Point-by-point responses
  1. Referee: [Experimental Setup] The central claim that GANs (particularly SRGAN) outperform latent generative models rests on a single shared preprocessing, splitting, and training recipe applied uniformly to all models. While this protocol enables direct comparability, it may systematically favor GAN architectures, which often converge reliably under standard medical intensity normalization and short schedules, whereas latent diffusion and flow models frequently require longer training, modality-specific noise schedules, or augmentations. The manuscript should explicitly discuss whether per-model hyperparameter optimization was considered and, if not, justify why the uniform protocol is the appropriate basis for ranking intrinsic capabilities rather than protocol compatibility.

    Authors: We appreciate the referee's observation on this key design choice. The uniform protocol was intentionally selected as the core of our contribution: to deliver a reproducible benchmark that enables direct, apples-to-apples comparison of the seven models under identical preprocessing, splitting, training schedules, and inference conditions across 77 experiments. Per-model hyperparameter optimization was deliberately not performed, because doing so would have broken the standardization that allows us to attribute performance differences to the architectures themselves rather than to unequal tuning effort. This setup mirrors a realistic clinical or research scenario in which practitioners apply a single, practical recipe across heterogeneous models. We fully acknowledge that the reported rankings reflect performance under this shared protocol and may not represent the absolute best achievable results for each model with extensive, architecture-specific tuning (e.g., longer diffusion schedules or modality-specific augmentations). We will revise the manuscript to add an explicit paragraph in the Experimental Setup and a dedicated limitations subsection that states this caveat and justifies the uniform protocol as the appropriate basis for ranking relative capabilities under consistent, reproducible conditions.

    revision: yes

Circularity Check

0 steps flagged

No circularity: direct empirical comparison under fixed protocols

Full rationale

The paper conducts a standardized empirical evaluation of seven existing generative models (Pix2Pix, CycleGAN, SRGAN, LDM, LDM+ControlNet, Brownian Bridge, Flow Matching) across 77 experiments on eleven datasets. No derivations, equations, or predictions are claimed that reduce reported metrics to fitted parameters or self-defined quantities by construction. Performance numbers arise from direct inference on held-out splits using uniform preprocessing and evaluation rules; statistical significance is computed from these independent runs. Any self-citations refer only to the original model papers and do not load-bear the comparative claims. The work is self-contained against external benchmarks and exhibits no self-definitional, fitted-input, or uniqueness-imported circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The comparison rests on standard deep-learning training assumptions and the representativeness of the selected clinical datasets; no new entities or ad-hoc constants are introduced by the paper itself.

axioms (1)
  • Domain assumption: standard assumptions in supervised and unsupervised training of generative models hold under the uniform protocol.
    Invoked when claiming model superiority from training under identical conditions.

pith-pipeline@v0.9.0 · 5718 in / 1282 out tokens · 42024 ms · 2026-05-14T20:14:32.721527+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

94 extracted references · 94 canonical work pages · 2 internal anchors

  1. [1] WHO, WHO compendium of innovative health technologies for low-resource settings 2024, World Health Organization, 2024
  2. [2] E. Kjelle, et al., Cost of low-value imaging worldwide: a systematic review, Applied Health Economics and Health Policy 22 (2024) 485
  3. [3] S. Dayarathna, et al., Deep learning-based synthesis of MRI, CT and PET: Review and analysis, Computer Methods and Programs in Biomedicine 257 (2024) 108173
  4. [4] L. Doan, et al., Bridging modalities with AI: a review of AI advances in multimodal biomedical imaging, Communications Engineering 5 (2026) 30
  5. [5] M. Sherwani, S. Gopalakrishnan, A systematic literature review: deep learning techniques for synthetic medical image generation and their applications in radiotherapy, Frontiers in Radiology 4 (2024) 1385742
  6. [6] X. Fu, et al., A systematic review of generative artificial intelligence techniques for synthetic medical image datasets: Quality, models, public availability and applications, Computer Methods and Programs in Biomedicine (2026) 109331
  7. [7] A. Rofena, et al., Augmented intelligence for multimodal virtual biopsy in breast cancer using generative artificial intelligence, Journal of Biomedical Informatics (2025) 104971
  8. [8] S. Kazeminia, et al., GANs for medical image analysis, Artificial Intelligence in Medicine 109 (2020) 101938
  9. [10] G. Bredell, et al., Explicitly minimizing the blur error of variational autoencoders, in: The Eleventh International Conference on Learning Representations, 2023, pp. 1–16
  10. [11] P. Isola, et al., Image-to-image translation with conditional adversarial networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1125–1134
  11. [12] J. Zhu, et al., Unpaired image-to-image translation using cycle-consistent adversarial networks, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2223–2232
  12. [13] D. Nie, et al., Medical image synthesis with deep convolutional adversarial networks, IEEE Transactions on Biomedical Engineering 65 (2018) 2720–2730
  13. [14] J. Wolterink, et al., Deep MR to CT synthesis using unpaired data, in: International Workshop on Simulation and Synthesis in Medical Imaging, Springer, 2017, pp. 14–23
  14. [15] Y. Liu, et al., Magnetic resonance image synthesis from brain computed tomography images based on deep learning methods for magnetic resonance-guided radiotherapy, Quantitative Imaging in Medicine and Surgery 10 (2020) 1358
  15. [16] S. Dar, et al., Image synthesis in multi-contrast MRI with conditional generative adversarial networks, IEEE Transactions on Medical Imaging 38 (2019) 2375–2388
  16. [17] A. Chartsias, et al., Multimodal MR synthesis via modality-invariant latent representation, IEEE Transactions on Medical Imaging 37 (2018) 803–814
  17. [18] B. Yu, et al., Ea-GANs: Edge-aware generative adversarial networks for cross-modality MR image synthesis, IEEE Transactions on Medical Imaging 38 (2019) 1750–1762
  18. [19] S. Poonkodi, M. Kanchana, 3D-MedTranCSGAN: 3D medical image transformation using CSGAN, Computers in Biology and Medicine 153 (2023) 106541
  19. [20] V. Guarrasi, et al., Whole-body image-to-image translation for a virtual scanner in a healthcare digital twin, in: Proceedings of the IEEE 38th International Symposium on Computer-Based Medical Systems (CBMS), 2025, pp. 528–534
  20. [21] J. Ha, et al., Multi-resolution guided 3D GANs for medical image translation, in: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 4342–4351
  21. [22] J. Ho, A. Jain, P. Abbeel, Denoising diffusion probabilistic models, Advances in Neural Information Processing Systems 33 (2020) 6840–6851
  22. [23] P. Dhariwal, A. Nichol, Diffusion models beat GANs on image synthesis, in: Advances in Neural Information Processing Systems, volume 34, 2021, pp. 8780–8794
  23. [24] A. Kazerouni, et al., Diffusion models in medical imaging: A comprehensive survey, Medical Image Analysis 88 (2023) 102846
  24. [25] A. Moschetto, et al., Benchmarking GANs, diffusion models, and flow matching for T1w-to-T2w MRI translation, in: International Conference on Image Analysis and Processing, Springer, 2025, pp. 429–440
  25. [26] Q. Bertrand, A. Gagneux, M. Massias, R. Emonet, On the closed-form of flow matching: Generalization does not arise from target stochasticity, arXiv preprint arXiv:2506.03719 (2025)
  26. [27] M. Akbar, W. Wang, A. Eklund, Beware of diffusion models for synthesizing medical images – a comparison with GANs in terms of memorizing brain MRI and chest x-ray images, Machine Learning: Science and Technology 6 (2025) 015022
  27. [28] S. Pan, et al., Synthetic CT generation from MRI using 3D transformer-based denoising diffusion model, Medical Physics 51 (2024) 2538–2548
  28. [29] K. Choo, Y. Jun, M. Yun, S. Hwang, Slice-consistent 3D volumetric brain CT-to-MRI translation with 2D Brownian bridge diffusion model, in: Medical Image Computing and Computer Assisted Intervention – MICCAI 2024, Springer, 2024, pp. 657–667
  29. [30] X. Zhu, et al., Introducing 3D representation for dense volume-to-volume translation via score fusion, in: International Conference on Machine Learning, 2025, pp. 1–22
  30. [31] J. Kim, H. Park, Adaptive latent diffusion model for 3D medical image to image translation: Multi-modal magnetic resonance imaging study, in: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 7604–7613
  31. [32] P. Esser, R. Rombach, B. Ommer, Taming transformers for high-resolution image synthesis, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 12873–12883
  32. [33] A. Sargood, et al., CoCoLIT: ControlNet-conditioned latent image translation for MRI to amyloid PET synthesis, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, 2026, pp. 8778–8786
  33. [34] L. Zhang, A. Rao, M. Agrawala, Adding conditional control to text-to-image diffusion models, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2023, pp. 3836–3847
  34. [35] R. Graf, et al., Denoising diffusion-based MRI to CT image translation enables automated spinal segmentation, European Radiology Experimental 7 (2023) 70
  35. [36] A. Rajagopal, et al., Synthetic PET via domain translation of 3-D MRI, IEEE Transactions on Radiation and Plasma Medical Sciences 7 (2023) 333–343
  36. [37] M. Bahloul, et al., Advancements in synthetic CT generation from MRI: A review of techniques and trends in radiation therapy planning, Journal of Applied Clinical Medical Physics (2024)
  37. [38] A. Thummerer, et al., SynthRAD2025 grand challenge dataset: Generating synthetic CTs for radiotherapy from head to abdomen, Medical Physics 52 (2025) e17981
  38. [39] F. Bray, et al., Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries, CA: A Cancer Journal for Clinicians 74 (2024) 229–263
  39. [40] R. Siegel, et al., Cancer statistics, 2026, CA: A Cancer Journal for Clinicians 76 (2026)
  40. [41] S. Karimi, et al., Glioblastoma: Clinical presentation, multidisciplinary management, and long-term outcomes, Cancers 17 (2025)
  41. [42] A. Thummerer, et al., SynthRAD2023 grand challenge dataset: Generating synthetic CT for radiotherapy, in: Proceedings of the Medical Image Computing and Computer Assisted Intervention (MICCAI) Challenges, 2023, pp. 4664–4674
  42. [43] A. Kazerooni, et al., The ASNR-MICCAI brain tumor segmentation (BraTS) challenge 2023: Intracranial meningioma, in: Proceedings of MICCAI, 2023, pp. 1–11
  43. [44] S. Gatidis, et al., A whole-body FDG-PET/CT dataset with manually annotated tumor lesions, Scientific Data 9 (2022) 601
  44. [45] D. Ferrara, et al., Sharing a whole-/total-body [18F] FDG-PET/CT dataset with CT-derived segmentations: an enhance.pet initiative, Scientific Data (2026)
  45. [46] C. Saharia, et al., Palette: Image-to-image diffusion models, in: ACM SIGGRAPH 2022 Conference Proceedings, 2022, pp. 1–10
  46. [47] B. Li, K. Xue, B. Liu, Y. Lai, BBDM: Image-to-image translation with Brownian bridge diffusion models, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 1952–1961
  47. [48] Y. Lipman, R. Chen, H. Ben-Hamu, M. Nickel, M. Le, Flow matching for generative modeling, in: Proceedings of the International Conference on Learning Representations (ICLR), 2023, pp. 1–28
  48. [49] M. Valls, P. Bourdon, C. Fernandez, G. Herpe, D. Helbert, Prob-BBDM: A probabilistic Brownian bridge diffusion model for MRI sequence image-to-image translation, Computerized Medical Imaging and Graphics (2026) 102745
  49. [50] M. Yazdani, Y. Medghalchi, P. Ashrafian, I. Hacihaliloglu, D. Shahriari, Flow matching for medical image synthesis: Bridging the gap between speed and quality, in: Medical Image Computing and Computer Assisted Intervention – MICCAI, 2025, pp. 216–226
  50. [51] F. Isensee, P. Jaeger, S. Kohl, J. Petersen, K. Maier-Hein, nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation, Nature Methods 18 (2021) 203–211
  51. [52] F. Di Feola, L. Tronchin, P. Soda, A comparative study between paired and unpaired image quality assessment in low-dose CT denoising, in: 2023 IEEE 36th International Symposium on Computer-Based Medical Systems (CBMS), IEEE, 2023, pp. 471–476
  52. [53] M. Roberts, et al., Imaging evaluation of a proposed 3D generative model for MRI to CT translation in the lumbar spine, The Spine Journal (2023)
  53. [54] C. Tang, et al., Incorporating radiologist knowledge into MRI quality metrics for machine learning using rank-based ratings, Journal of Magnetic Resonance Imaging (2024)
  54. [55] V. Guarrasi, et al., Multimodal explainability via latent shift applied to COVID-19 stratification, Pattern Recognition 156 (2024) 110825
  55. [56] Y. Myong, et al., Evaluating diagnostic content of AI-generated chest radiography: A multi-center visual Turing test, PLoS ONE (2023)
  56. [57] M. Jang, et al., Image Turing test and its applications on synthetic chest radiographs by using the progressive growing generative adversarial network, Scientific Reports (2023)
  57. [58] A. Phelps, et al., Pairwise comparison versus Likert scale for biomedical image assessment, American Journal of Roentgenology (2015)
  58. [59] E. Hoeijmakers, et al., How subjective CT image quality assessment becomes surprisingly reliable: pairwise comparisons instead of Likert scale, European Radiology (2024)
  59. [60] L. Friedrich, et al., Deep learning for medical image-to-image translation: Methods, datasets, and evaluation, npj Digital Medicine 7 (2024) 114
  60. [61] A. Breger, et al., A study of why we need to reassess full reference image quality assessment with medical images, Journal of Imaging Informatics in Medicine 38 (2025) 3444–3469
  61. [62] M. Dohmen, M. Klemens, I. Baltruschat, T. Truong, M. Lenga, Similarity and quality metrics for MR image-to-image translation, Scientific Reports 15 (2025) 3853
  62. [63] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, B. Ommer, High-resolution image synthesis with latent diffusion models, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 10684–10695
  63. [64] E. Haacke, R. Brown, M. Thompson, R. Venkatesan, Magnetic Resonance Imaging: Physical Principles and Sequence Design, Wiley-Liss, 1999
  64. [65] J. Bushberg, J. Seibert, E. Leidholdt, J. Boone, The Essential Physics of Medical Imaging, 3 ed., Lippincott Williams & Wilkins, 2011
  65. [66] J. Barentsz, et al., ESUR prostate MRI guidelines, European Radiology 22 (2012) 746–757
  66. [67] R. Beets-Tan, et al., Magnetic resonance imaging for clinical management of rectal cancer, European Radiology 28 (2018) 1465–1475
  67. [68] P. Wen, et al., Updated response assessment criteria for high-grade gliomas, Journal of Clinical Oncology 28 (2010) 1963–1972
  68. [69] D. Louis, et al., The 2021 WHO classification of tumors of the central nervous system, Neuro-Oncology 23 (2021) 1231–1251
  69. [70] F. Fazekas, et al., MR signal abnormalities at 1.5 T in Alzheimer's dementia and normal aging, American Journal of Neuroradiology 14 (1993) 1237–1242
  70. [71] American Cancer Society, Cancer facts & figures 2026, 2026
  71. [72] R. Stupp, et al., Radiotherapy plus concomitant and adjuvant temozolomide for glioblastoma, New England Journal of Medicine 352 (2005) 987–996
  72. [73] A. A. Aizer, et al., Brain metastases: A society for neuro-oncology (SNO) consensus review on current management and future directions, Neuro-Oncology 24 (2022) 1613–1646
  73. [74] M. Singh, et al., Epidemiology of brain metastases, Neurosurgery Clinics of North America 31 (2020) 481–495
  74. [75] Z. S. Mayo, et al., Radiation necrosis or tumor progression? A review of the radiographic modalities used in the diagnosis of cerebral radiation necrosis, Journal of Neuro-Oncology 161 (2023)
  75. [76] M. Spadea, M. Maspero, P. Zaffino, J. Seco, Deep learning based synthetic-CT generation in radiotherapy and PET: A review, Medical Physics 48 (2021) 6537–6566
  76. [77] S. De Pietro, et al., The role of MRI in radiotherapy planning: a narrative review "from head to toe", Insights into Imaging 15 (2024) 255
  77. [78] M. Maspero, et al., Deep learning for CT synthesis in radiotherapy, Bioengineering 12 (2025) 1297
  78. [79] G. Cordier, et al., Generative adversarial networks to synthesize missing T1 and FLAIR MRI sequences for use in a multisequence brain tumor segmentation model, Radiology 299 (2021) E209–E219
  79. [80] National Lung Screening Trial Research Team, Reduced lung-cancer mortality with low-dose computed tomographic screening, New England Journal of Medicine 365 (2011) 395–409
  80. [81] H. de Koning, et al., Reduced lung-cancer mortality with volume CT screening in a randomized trial, New England Journal of Medicine 382 (2020) 503–513

Showing first 80 references.