Fully Guided Neural Schr\"odinger bridge for Brain MR image synthesis

Hanyeol Yang; Hyeon Chang Kim; Insung Chung; Jong-Min Lee; Mi Kyung Kim; Min-Ho Shin; Sang Baek Koh; Sunggyu Kim; Yongseon Yoo; Yu-Mi Kim

arxiv: 2501.14171 · v3 · submitted 2025-01-24 · 📡 eess.IV · cs.CV

Fully Guided Neural Schr\"odinger bridge for Brain MR image synthesis

Hanyeol Yang , Sunggyu Kim , Mi Kyung Kim , Yongseon Yoo , Yu-Mi Kim , Min-Ho Shin , Insung Chung , Sang Baek Koh

show 2 more authors

Hyeon Chang Kim Jong-Min Lee

This is my paper

Pith reviewed 2026-05-23 05:40 UTC · model grok-4.3

classification 📡 eess.IV cs.CV

keywords MRI modality synthesisSchrödinger bridgelimited paired datalesion preservationbrain MRIimage generationmulti-modal imaging

0 comments

The pith

A neural Schrödinger bridge generates missing brain MRI modalities from extremely limited paired data while preserving lesions via supplied priors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Fully Guided Schrödinger Bridge (FGSB) to synthesize missing MRI modalities without needing large paired training sets. It does this through a generation stage that refines images starting from source scans plus noise, and a training stage that learns consistent transformation paths by modeling intermediate states. A reader would care because acquiring all MRI modalities is often impractical due to time and cost, yet having them improves diagnosis. When lesion masks or annotations are available, the method keeps those critical features intact rather than smoothing them away. Tests on multiple datasets show it maintains performance across varying image resolutions and scanner environments.

Core claim

FGSB overcomes the trade-off between paired methods that require impractical amounts of aligned data and unpaired methods that lose anatomical details like lesions by using a two-stage process: iterative refinement of synthetic images from paired sources and Gaussian noise in generation, and learning optimal pathways by modeling intermediate states in training, enabling high-fidelity outputs even with scarce pairs and lesion preservation via priors.

What carries the argument

Fully Guided Schrödinger Bridge (FGSB), a framework that models transformation pathways between modalities by iteratively refining images from paired data and noise while learning from intermediate states to ensure fidelity.

If this is right

High-fidelity synthesis becomes feasible with extremely limited paired data.
Lesion-specific priors from annotations or segmentation masks enhance preservation of clinically relevant features.
Reliable performance holds across diverse imaging resolutions and data acquisition environments.
The method bridges accuracy of paired approaches with the scalability of unpaired ones.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The two-stage refinement process could be adapted to other medical image modalities where paired examples are scarce.
Performance with lesion priors suggests similar guidance mechanisms might improve fidelity in related image translation tasks.
If the intermediate-state modeling proves robust, it may lower the data threshold needed for clinical deployment of synthesis tools.

Load-bearing premise

The assumption that iteratively refining images from paired source data plus Gaussian noise and learning transformation pathways via intermediate states will produce high-fidelity, lesion-preserving outputs even when paired training data is extremely limited.

What would settle it

Train FGSB on a dataset with only a handful of paired scans from one scanner type then measure whether synthesis quality and lesion overlap on a test set from a different resolution or acquisition environment falls below standard unpaired baselines.

Figures

Figures reproduced from arXiv: 2501.14171 by Hanyeol Yang, Hyeon Chang Kim, Insung Chung, Jong-Min Lee, Mi Kyung Kim, Min-Ho Shin, Sang Baek Koh, Sunggyu Kim, Yongseon Yoo, Yu-Mi Kim.

**Figure 2.** Figure 2: An overview of the proposed framework for medical image translation, which consists of two [PITH_FULL_IMAGE:figures/full_fig_p009_2.png] view at source ↗

**Figure 3.** Figure 3: Illustration of the Self-supervised discriminator, which consists of downsampled target image, [PITH_FULL_IMAGE:figures/full_fig_p015_3.png] view at source ↗

**Figure 4.** Figure 4: Qualitative comparison between the proposed FGSB and other methods. Our approach was trained [PITH_FULL_IMAGE:figures/full_fig_p018_4.png] view at source ↗

**Figure 5.** Figure 5: Qualitative comparison between the proposed FGSB and other methods. Our approach was trained [PITH_FULL_IMAGE:figures/full_fig_p019_5.png] view at source ↗

**Figure 6.** Figure 6: Ablation study for our variants model. The left graph shows results from training/ [PITH_FULL_IMAGE:figures/full_fig_p021_6.png] view at source ↗

**Figure 7.** Figure 7: Qualitative comparison between the proposed FGSB and other methods. Our approach was trained [PITH_FULL_IMAGE:figures/full_fig_p022_7.png] view at source ↗

read the original abstract

Multi-modal brain MRI provides essential complementary information for clinical diagnosis. However, acquiring all modalities in practice is often constrained by time and cost. To address this, various methods have been proposed to generate missing modalities from available ones. Existing approaches can be broadly categorized into two types: paired and unpaired methods. While paired methods achieve high synthesis accuracy, obtaining large-scale paired datasets is typically impractical. In contrast, unpaired methods, though more scalable, often fail to preserve critical anatomical features, such as lesions. In this paper, we propose Fully Guided Schr\"odinger Bridge (FGSB), a novel framework designed to overcome these limitations by enabling high-fidelity generation with extremely limited paired data. When lesion-specific information, such as expert annotations or segmentation masks, is available, FGSB preserves clinically relevant lesions during missing modality synthesis. Our model comprises two stages: (1) a generation stage that iteratively refines synthetic images using paired source images and Gaussian noise, and (2) a training stage that learns optimal transformation pathways by modeling intermediate states to ensure consistent, high-fidelity synthesis. Experimental results across multiple datasets demonstrate that FGSB achieves reliable synthesis performance across diverse imaging resolutions and data acquisition environments. In addition, incorporating lesion-specific priors further enhances the preservation of clinically relevant features.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes Fully Guided Schrödinger Bridge (FGSB), a two-stage framework for synthesizing missing brain MR modalities from available ones. The generation stage iteratively refines synthetic images starting from paired source images plus Gaussian noise; the training stage learns optimal transformation pathways by modeling intermediate states. When lesion annotations or segmentation masks are available, these are incorporated as priors to preserve clinically relevant features. The authors claim that this enables high-fidelity, lesion-preserving synthesis even with extremely limited paired data and demonstrate reliable performance across multiple datasets with varying resolutions and acquisition environments.

Significance. If the central claim holds, the work would be significant for clinical multi-modal MRI applications, where acquiring large paired datasets is often impractical due to time and cost constraints. A method that reliably bridges the gap between paired accuracy and unpaired scalability while preserving lesions could improve diagnostic workflows. The use of a fully guided neural Schrödinger bridge with explicit intermediate-state modeling offers a principled way to handle limited supervision, and the optional lesion-prior integration directly addresses a known failure mode of existing unpaired approaches.

major comments (2)

[Abstract] Abstract: The central claim that FGSB achieves 'high-fidelity generation with extremely limited paired data' and 'reliable synthesis performance' is not accompanied by any reported number of paired samples, ablation curves versus pair cardinality, baseline comparisons, or quantitative metrics (e.g., PSNR, SSIM, or lesion Dice scores). Without these, it is impossible to verify whether the two-stage process (generation from paired source + noise and training via intermediate states) actually succeeds when paired data is scarce enough to be 'extremely limited,' or whether the drift/diffusion estimation reduces to the noise prior.
[Abstract] Abstract (and presumably §4 or §5): The assertion that 'incorporating lesion-specific priors further enhances the preservation of clinically relevant features' lacks any quantitative lesion-specific evaluation or comparison against the same model without the priors. This is load-bearing for the clinical utility claim, as visual preservation alone does not establish that the priors measurably improve fidelity on lesions versus background anatomy.

minor comments (1)

[Abstract] The abstract refers to 'multiple datasets' and 'diverse imaging resolutions' but does not name the datasets or resolutions; this should be stated explicitly for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will incorporate revisions to strengthen the quantitative support for our claims in the abstract and main text.

read point-by-point responses

Referee: [Abstract] Abstract: The central claim that FGSB achieves 'high-fidelity generation with extremely limited paired data' and 'reliable synthesis performance' is not accompanied by any reported number of paired samples, ablation curves versus pair cardinality, baseline comparisons, or quantitative metrics (e.g., PSNR, SSIM, or lesion Dice scores). Without these, it is impossible to verify whether the two-stage process (generation from paired source + noise and training via intermediate states) actually succeeds when paired data is scarce enough to be 'extremely limited,' or whether the drift/diffusion estimation reduces to the noise prior.

Authors: We agree that the abstract would be strengthened by explicit quantitative details. The full manuscript reports experiments across multiple datasets with varying resolutions, but the abstract itself does not cite specific pair counts or metrics. In revision we will update the abstract to state the number of paired samples used (typically 10-50 pairs depending on the dataset), report average PSNR/SSIM values, and add a sentence referencing the ablation studies in §4 that plot performance versus pair cardinality. These ablations demonstrate consistent gains over baselines and confirm that the learned drift does not collapse to the noise prior, as the two-stage intermediate-state modeling yields measurable improvements in fidelity. revision: yes
Referee: [Abstract] Abstract (and presumably §4 or §5): The assertion that 'incorporating lesion-specific priors further enhances the preservation of clinically relevant features' lacks any quantitative lesion-specific evaluation or comparison against the same model without the priors. This is load-bearing for the clinical utility claim, as visual preservation alone does not establish that the priors measurably improve fidelity on lesions versus background anatomy.

Authors: We concur that quantitative lesion-specific metrics are necessary to substantiate the clinical claim. The current manuscript provides qualitative examples of lesion preservation when priors are available but does not include lesion Dice scores or an explicit ablation comparing the model with versus without priors. In the revised manuscript we will add these evaluations in §5, reporting lesion Dice coefficients on held-out annotations and the corresponding improvement when priors are incorporated, thereby providing the missing quantitative evidence. revision: yes

Circularity Check

0 steps flagged

No significant circularity; new framework with independent experimental validation.

full rationale

The paper introduces FGSB as a two-stage neural Schrödinger bridge (generation via iterative refinement from paired source + noise; training via intermediate-state pathway learning) for limited-pair MRI synthesis. No quoted equations or steps reduce a claimed prediction to a fitted input by construction, nor does any load-bearing premise rest on self-citation chains. Claims of reliable performance rest on cross-dataset experiments rather than definitional equivalence. This matches the provided reader's assessment of minimal circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on the domain assumption that Schrödinger bridge dynamics can be learned from limited paired image data to produce consistent transformations, plus the practical assumption that lesion annotations are available and reliable when needed.

axioms (1)

domain assumption Schrödinger bridge models can learn optimal transformation pathways between source and target image distributions by modeling intermediate states
Invoked in the description of the training stage that ensures consistent high-fidelity synthesis.

pith-pipeline@v0.9.0 · 5794 in / 1270 out tokens · 32737 ms · 2026-05-23T05:40:17.505205+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 1 internal anchor

[1]

Zhou, Feature fusion and latent feature learning guided brain tumor segmen- tation and missing modality recovery network, Pattern Recognition 141 (2023) 109665

T. Zhou, Feature fusion and latent feature learning guided brain tumor segmen- tation and missing modality recovery network, Pattern Recognition 141 (2023) 109665. doi:https://doi.org/10.1016/j.patcog.2023.109665

work page doi:10.1016/j.patcog.2023.109665 2023
[2]

Q. Zhu, S. Zhu, B. Du, Y . Wang, Cross-domain distribution adversarial diffusion model for synthesizing contrast-enhanced abdomen ct imaging, Pattern Recogni- tion 166 (2025) 111695. doi:https://doi.org/10.1016/j.patcog.2025.111695

work page doi:10.1016/j.patcog.2025.111695 2025
[3]

Y . Luo, D. Nie, B. Zhan, Z. Li, X. Wu, J. Zhou, Y . Wang, D. Shen, Edge-preserving mri image synthesis via adversarial network with iterative multi-scale fusion, Neurocomputing 452 (2021) 63–77. doi:https://doi.org/10.1016/j.neucom.2021.04.060

work page doi:10.1016/j.neucom.2021.04.060 2021
[4]

B. Cao, H. Cao, J. Liu, P. Zhu, C. Zhang, Q. Hu, Autoencoder-based collaborative attention gan for multi-modal image synthesis, IEEE Transactions on Multimedia 26 (2024) 995–1010. doi:10.1109/TMM.2023.3274990

work page doi:10.1109/tmm.2023.3274990 2024
[5]

Jiang, Y

L. Jiang, Y . Mao, X. Wang, X. Chen, C. Li, Cola-di ff: Conditional latent dif- fusion model for multi-modal mri synthesis, in: H. Greenspan, A. Madabhushi, P. Mousavi, S. Salcudean, J. Duncan, T. Syeda-Mahmood, R. Taylor (Eds.), Med- ical Image Computing and Computer Assisted Intervention – MICCAI 2023, Springer Nature Switzerland, Cham, 2023, pp. 398–408

work page 2023
[6]

Dalmaz, M

O. Dalmaz, M. Yurt, T. C ¸ ukur, Resvit: Residual vision transformers for multi- modal medical image synthesis, IEEE Transactions on Medical Imaging 41 (10) (2022) 2598–2614. doi:10.1109/TMI.2022.3167808

work page doi:10.1109/tmi.2022.3167808 2022
[7]

Zhang, X

X. Zhang, X. He, J. Guo, N. Ettehadi, N. Aw, D. Semanek, J. Posner, A. Laine, Y . Wang, Ptnet3d: A 3d high-resolution longitudinal infant brain mri synthesizer based on transformers, IEEE Transactions on Medical Imaging 41 (10) (2022) 2925–2940. doi:10.1109/TMI.2022.3174827. 22

work page doi:10.1109/tmi.2022.3174827 2022
[8]

L. Kong, C. Lian, D. Huang, Z. Li, Y . Hu, Q. Zhou, Breaking the dilemma of med- ical image-to-image translation, in: Proceedings of the 35th International Confer- ence on Neural Information Processing Systems, NIPS ’21, Curran Associates Inc., Red Hook, NY , USA, 2021

work page 2021
[9]

S. U. Dar, M. Yurt, L. Karacan, A. Erdem, E. Erdem, T. C ¸ ukur, Image synthesis in multi-contrast mri with conditional generative adversarial net- works, IEEE Transactions on Medical Imaging 38 (10) (2019) 2375–2388. doi:10.1109/TMI.2019.2901750

work page doi:10.1109/tmi.2019.2901750 2019
[10]

J.-Y . Zhu, T. Park, P. Isola, A. A. Efros, Unpaired image-to-image trans- lation using cycle-consistent adversarial networks, in: 2017 IEEE Inter- national Conference on Computer Vision (ICCV), 2017, pp. 2242–2251. doi:10.1109/ICCV .2017.244

work page doi:10.1109/iccv 2017
[11]

Y . Li, T. Zhou, K. He, Y . Zhou, D. Shen, Multi-scale transformer net- work with edge-aware pre-training for cross-modality mr image synthe- sis, IEEE Transactions on Medical Imaging 42 (11) (2023) 3395–3407. doi:10.1109/TMI.2023.3288001

work page doi:10.1109/tmi.2023.3288001 2023
[12]

K. He, X. Chen, S. Xie, Y . Li, P. Doll ´ar, R. Girshick, Masked autoen- coders are scalable vision learners, in: 2022 IEEE /CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 15979–15988. doi:10.1109/CVPR52688.2022.01553

work page doi:10.1109/cvpr52688.2022.01553 2022
[13]

Dosovitskiy, L

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby, An image is worth 16x16 words: Transformers for image recognition at scale, in: International Conference on Learning Representations (ICLR), 2021

work page 2021
[14]

Kunanbayev, V

K. Kunanbayev, V . Shen, D.-S. Kim, Training vit with limited data for alzheimer’s disease classification: An empirical study, in: M. G. Linguraru, Q. Dou, A. Fera- gen, S. Giannarou, B. Glocker, K. Lekadir, J. A. Schnabel (Eds.), Medical Image Computing and Computer Assisted Intervention – MICCAI 2024, Springer Na- ture Switzerland, Cham, 2024, pp. 334–343. 23

work page 2024
[15]

Y . Choi, S. Lee, Ct synthesis using cyclegan with swin transformer for magnetic resonance imaging guided radiotherapy, in: Medical Imaging 2024: Physics of Medical Imaging, V ol. 12925, SPIE, 2024, pp. 825–829

work page 2024
[16]

M. Yurt, S. U. Dar, A. Erdem, E. Erdem, K. K. Oguz, T. C ¸ ukur, mustgan: multi- stream generative adversarial networks for mr image synthesis, Medical Image Analysis 70 (2021) 101944. doi:https://doi.org/10.1016/j.media.2020.101944

work page doi:10.1016/j.media.2020.101944 2021
[17]

V . M. H. Phan, Z. Liao, J. W. Verjans, M.-S. To, Structure-preserving synthe- sis: Maskgan for unpaired mr-ct translation, in: H. Greenspan, A. Madabhushi, P. Mousavi, S. Salcudean, J. Duncan, T. Syeda-Mahmood, R. Taylor (Eds.), Med- ical Image Computing and Computer Assisted Intervention – MICCAI 2023, Springer Nature Switzerland, Cham, 2023, pp. 56–65

work page 2023
[18]

C. Gong, Y . Huang, M. Luo, S. Cao, X. Gong, S. Ding, X. Yuan, W. Zheng, Y . Zhang, Channel-wise attention enhanced and structural similarity constrained cyclegan for e ffective synthetic ct generation from head and neck mri images, Radiation Oncology 19 (1) (2024) 37. doi:10.1186/s13014-024-02429-2

work page doi:10.1186/s13014-024-02429-2 2024
[19]

Dhariwal, A

P. Dhariwal, A. Nichol, Di ffusion models beat gans on image synthesis, in: Pro- ceedings of the 35th International Conference on Neural Information Processing Systems, NIPS ’21, Curran Associates Inc., Red Hook, NY , USA, 2021

work page 2021
[20]

¨Ozbey, O

M. ¨Ozbey, O. Dalmaz, S. U. H. Dar, H. A. Bedel, c. ¨Ozturk, A. G ¨ung¨or, T. C ¸ ukur, Unsupervised medical image translation with adversarial di ffusion models, IEEE Transactions on Medical Imaging 42 (12) (2023) 3524–3539. doi:10.1109/TMI.2023.3290149

work page doi:10.1109/tmi.2023.3290149 2023
[21]

J. Ho, A. Jain, P. Abbeel, Denoising di ffusion probabilistic models, in: H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, H. Lin (Eds.), Advances in Neural Information Processing Systems, V ol. 33, Curran Associates, Inc., 2020, pp. 6840–6851

work page 2020
[22]

B. Liu, Y . Zhu, K. Song, A. Elgammal, Towards faster and stabilized gan train- 24 ing for high-fidelity few-shot image synthesis, in: International Conference on Learning Representations (ICLR), 2021

work page 2021
[23]

Gulrajani, F

I. Gulrajani, F. Ahmed, M. Arjovsky, V . Dumoulin, A. Courville, Improved train- ing of wasserstein gans, in: Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, Curran Associates Inc., Red Hook, NY , USA, 2017, p. 5769–5779

work page 2017
[24]

S. Mo, M. Cho, J. Shin, Freeze the discriminator: a simple baseline for fine-tuning gans, in: CVPR AI for Content Creation Workshop, 2020

work page 2020
[25]

S. Zhao, Z. Liu, J. Lin, J.-Y . Zhu, S. Han, Di fferentiable augmentation for data- efficient gan training, in: H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, H. Lin (Eds.), Advances in Neural Information Processing Systems, V ol. 33, Cur- ran Associates, Inc., 2020, pp. 7559–7570

work page 2020
[26]

T. Chen, X. Zhai, M. Ritter, M. Lucic, N. Houlsby, Self-supervised gans via auxiliary rotation loss, in: 2019 IEEE /CVF Conference on Com- puter Vision and Pattern Recognition (CVPR), 2019, pp. 12146–12155. doi:10.1109/CVPR.2019.01243

work page doi:10.1109/cvpr.2019.01243 2019
[27]

Tran, V .-H

N.-T. Tran, V .-H. Tran, B.-N. Nguyen, L. Yang, N.-M. M. Cheung, Self- supervised gan: Analysis and improvement with multi-class minimax game, in: H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch ´e-Buc, E. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems, V ol. 32, Curran As- sociates, Inc., 2019

work page 2019
[28]

B. Li, K. Xue, B. Liu, Y .-K. Lai, Bbdm: Image-to-image translation with brownian bridge di ffusion models, in: 2023 IEEE /CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 1952–1961. doi:10.1109/CVPR52729.2023.00194

work page doi:10.1109/cvpr52729.2023.00194 2023
[29]

X. Su, J. Song, C. Meng, S. Ermon, Dual di ffusion implicit bridges for image- to-image translation, in: International Conference on Learning Representations, 2023. 25

work page 2023
[30]

G.-H. Liu, A. Vahdat, D.-A. Huang, E. Theodorou, W. Nie, A. Anandkumar, I2SB: Image-to-image schr ¨odinger bridge, in: A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, J. Scarlett (Eds.), Proceedings of the 40th International Conference on Machine Learning, V ol. 202 of Proceedings of Machine Learning Research, PMLR, 2023, pp. 22042–22062

work page 2023
[31]

B. Kim, G. Kwon, K. Kim, J. C. Ye, Unpaired image-to-image translation via neural schr¨odinger bridge, in: International Conference on Learning Representa- tions (ICLR), 2024. doi:10.48550/arXiv.2305.15086

work page doi:10.48550/arxiv.2305.15086 2024
[32]

A. Tong, K. Fatras, N. Malkin, G. Huguet, Y . Zhang, J. Rector-Brooks, G. Wolf, Y . Bengio, Improving and generalizing flow-based generative models with mini- batch optimal transport, Transactions on Machine Learning Research (2024). doi:10.48550/arXiv.2302.00482

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2302.00482 2024
[33]

T. Park, A. A. Efros, R. Zhang, J.-Y . Zhu, Contrastive learning for unpaired image-to-image translation, in: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part IX, Springer- Verlag, Berlin, Heidelberg, 2020, p. 319–345. doi:10.1007 /978-3-030-58545- 7 19

work page 2020
[34]

, author Zhu, J.Y

P. Isola, J.-Y . Zhu, T. Zhou, A. A. Efros, Image-to-image translation with conditional adversarial networks, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 5967–5976. doi:10.1109/CVPR.2017.632

work page doi:10.1109/cvpr.2017.632 2017
[35]

M. I. Belghazi, A. Baratin, S. Rajeshwar, S. Ozair, Y . Bengio, A. Courville, D. Hjelm, Mutual information neural estimation, in: J. Dy, A. Krause (Eds.), Proceedings of the 35th International Conference on Machine Learning, V ol. 80 of Proceedings of Machine Learning Research, PMLR, 2018, pp. 531–540

work page 2018
[36]

S. Mo, M. Cho, J. Shin, Instagan: Instance-aware image-to-image translation, in: International Conference on Learning Representations (ICLR), 2019. URL https://openreview.net/forum?id=ryxwJhC9YX 26

work page 2019
[37]

Bakas, M

S. Bakas, M. Reyes, A. Jakab, S. Bauer, M. Rempfler, A. Crimi, R. T. Shino- hara, C. Berger, S. M. Ha, M. Rozycki, et al., Identifying the best machine learn- ing algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the brats challenge (2018). doi:10.17863/CAM.38755

work page doi:10.17863/cam.38755 2018
[38]

H. J. Kuijf, J. M. Biesbroek, J. De Bresser, R. Heinen, S. Andermatt, M. Bento, M. Berseth, M. Belyaev, M. J. Cardoso, A. Casamitjana, D. L. Collins, M. Dadar, A. Georgiou, M. Ghafoorian, D. Jin, A. Khademi, J. Knight, H. Li, X. Llad ´o, M. Luna, Q. Mahmood, R. McKinley, A. Mehrtash, S. Ourselin, B.-Y . Park, H. Park, S. H. Park, S. Pezold, E. Puybareau, ...

work page doi:10.1109/tmi.2019.2905770 2019
[39]

C. H. Sudre, K. Van Wijnen, F. Dubost, H. Adams, D. Atkinson, F. Barkhof, M. A. Birhanu, E. E. Bron, R. Camarasa, N. Chaturvedi, Y . Chen, Z. Chen, S. Chen, Q. Dou, T. Evans, I. Ezhov, H. Gao, M. Girones Sanguesa, J. D. Gispert, B. Gomez Anson, A. D. Hughes, M. A. Ikram, S. Ingala, H. R. Jaeger, F. Kofler, H. J. Kuijf, D. Kutnar, M. Lee, B. Li, L. Lorenzi...

work page doi:10.1016/j.media.2023.103029 2021
[40]

G. Park, J. Hong, B. A. Du ffy, J.-M. Lee, H. Kim, White matter hyperintensities segmentation using the ensemble u-net with multi- scale highlighting foregrounds, NeuroImage 237 (2021) 118140. doi:https://doi.org/10.1016/j.neuroimage.2021.118140

work page doi:10.1016/j.neuroimage.2021.118140 2021
[41]

Bhalerao, G

G. Bhalerao, G. Gillis, M. Dembele, S. Suri, K. Ebmeier, J. Klein, M. Hu, 27 C. Mackay, L. Griffanti, Automated quality control of t1-weighted brain mri scans for clinical research: methods comparison and design of a quality prediction clas- sifier, medRxiv (2024). doi:10.1101/2024.04.12.24305603

work page doi:10.1101/2024.04.12.24305603 2024
[42]

FAIR, Facebook AI Research, fvcore: Core library for computer vision research projects at fair, https://github.com/facebookresearch/fvcore (2019). 28

work page 2019

[1] [1]

Zhou, Feature fusion and latent feature learning guided brain tumor segmen- tation and missing modality recovery network, Pattern Recognition 141 (2023) 109665

T. Zhou, Feature fusion and latent feature learning guided brain tumor segmen- tation and missing modality recovery network, Pattern Recognition 141 (2023) 109665. doi:https://doi.org/10.1016/j.patcog.2023.109665

work page doi:10.1016/j.patcog.2023.109665 2023

[2] [2]

Q. Zhu, S. Zhu, B. Du, Y . Wang, Cross-domain distribution adversarial diffusion model for synthesizing contrast-enhanced abdomen ct imaging, Pattern Recogni- tion 166 (2025) 111695. doi:https://doi.org/10.1016/j.patcog.2025.111695

work page doi:10.1016/j.patcog.2025.111695 2025

[3] [3]

Y . Luo, D. Nie, B. Zhan, Z. Li, X. Wu, J. Zhou, Y . Wang, D. Shen, Edge-preserving mri image synthesis via adversarial network with iterative multi-scale fusion, Neurocomputing 452 (2021) 63–77. doi:https://doi.org/10.1016/j.neucom.2021.04.060

work page doi:10.1016/j.neucom.2021.04.060 2021

[4] [4]

B. Cao, H. Cao, J. Liu, P. Zhu, C. Zhang, Q. Hu, Autoencoder-based collaborative attention gan for multi-modal image synthesis, IEEE Transactions on Multimedia 26 (2024) 995–1010. doi:10.1109/TMM.2023.3274990

work page doi:10.1109/tmm.2023.3274990 2024

[5] [5]

Jiang, Y

L. Jiang, Y . Mao, X. Wang, X. Chen, C. Li, Cola-di ff: Conditional latent dif- fusion model for multi-modal mri synthesis, in: H. Greenspan, A. Madabhushi, P. Mousavi, S. Salcudean, J. Duncan, T. Syeda-Mahmood, R. Taylor (Eds.), Med- ical Image Computing and Computer Assisted Intervention – MICCAI 2023, Springer Nature Switzerland, Cham, 2023, pp. 398–408

work page 2023

[6] [6]

Dalmaz, M

O. Dalmaz, M. Yurt, T. C ¸ ukur, Resvit: Residual vision transformers for multi- modal medical image synthesis, IEEE Transactions on Medical Imaging 41 (10) (2022) 2598–2614. doi:10.1109/TMI.2022.3167808

work page doi:10.1109/tmi.2022.3167808 2022

[7] [7]

Zhang, X

X. Zhang, X. He, J. Guo, N. Ettehadi, N. Aw, D. Semanek, J. Posner, A. Laine, Y . Wang, Ptnet3d: A 3d high-resolution longitudinal infant brain mri synthesizer based on transformers, IEEE Transactions on Medical Imaging 41 (10) (2022) 2925–2940. doi:10.1109/TMI.2022.3174827. 22

work page doi:10.1109/tmi.2022.3174827 2022

[8] [8]

L. Kong, C. Lian, D. Huang, Z. Li, Y . Hu, Q. Zhou, Breaking the dilemma of med- ical image-to-image translation, in: Proceedings of the 35th International Confer- ence on Neural Information Processing Systems, NIPS ’21, Curran Associates Inc., Red Hook, NY , USA, 2021

work page 2021

[9] [9]

S. U. Dar, M. Yurt, L. Karacan, A. Erdem, E. Erdem, T. C ¸ ukur, Image synthesis in multi-contrast mri with conditional generative adversarial net- works, IEEE Transactions on Medical Imaging 38 (10) (2019) 2375–2388. doi:10.1109/TMI.2019.2901750

work page doi:10.1109/tmi.2019.2901750 2019

[10] [10]

J.-Y . Zhu, T. Park, P. Isola, A. A. Efros, Unpaired image-to-image trans- lation using cycle-consistent adversarial networks, in: 2017 IEEE Inter- national Conference on Computer Vision (ICCV), 2017, pp. 2242–2251. doi:10.1109/ICCV .2017.244

work page doi:10.1109/iccv 2017

[11] [11]

Y . Li, T. Zhou, K. He, Y . Zhou, D. Shen, Multi-scale transformer net- work with edge-aware pre-training for cross-modality mr image synthe- sis, IEEE Transactions on Medical Imaging 42 (11) (2023) 3395–3407. doi:10.1109/TMI.2023.3288001

work page doi:10.1109/tmi.2023.3288001 2023

[12] [12]

K. He, X. Chen, S. Xie, Y . Li, P. Doll ´ar, R. Girshick, Masked autoen- coders are scalable vision learners, in: 2022 IEEE /CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 15979–15988. doi:10.1109/CVPR52688.2022.01553

work page doi:10.1109/cvpr52688.2022.01553 2022

[13] [13]

Dosovitskiy, L

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby, An image is worth 16x16 words: Transformers for image recognition at scale, in: International Conference on Learning Representations (ICLR), 2021

work page 2021

[14] [14]

Kunanbayev, V

K. Kunanbayev, V . Shen, D.-S. Kim, Training vit with limited data for alzheimer’s disease classification: An empirical study, in: M. G. Linguraru, Q. Dou, A. Fera- gen, S. Giannarou, B. Glocker, K. Lekadir, J. A. Schnabel (Eds.), Medical Image Computing and Computer Assisted Intervention – MICCAI 2024, Springer Na- ture Switzerland, Cham, 2024, pp. 334–343. 23

work page 2024

[15] [15]

Y . Choi, S. Lee, Ct synthesis using cyclegan with swin transformer for magnetic resonance imaging guided radiotherapy, in: Medical Imaging 2024: Physics of Medical Imaging, V ol. 12925, SPIE, 2024, pp. 825–829

work page 2024

[16] [16]

M. Yurt, S. U. Dar, A. Erdem, E. Erdem, K. K. Oguz, T. C ¸ ukur, mustgan: multi- stream generative adversarial networks for mr image synthesis, Medical Image Analysis 70 (2021) 101944. doi:https://doi.org/10.1016/j.media.2020.101944

work page doi:10.1016/j.media.2020.101944 2021

[17] [17]

V . M. H. Phan, Z. Liao, J. W. Verjans, M.-S. To, Structure-preserving synthe- sis: Maskgan for unpaired mr-ct translation, in: H. Greenspan, A. Madabhushi, P. Mousavi, S. Salcudean, J. Duncan, T. Syeda-Mahmood, R. Taylor (Eds.), Med- ical Image Computing and Computer Assisted Intervention – MICCAI 2023, Springer Nature Switzerland, Cham, 2023, pp. 56–65

work page 2023

[18] [18]

C. Gong, Y . Huang, M. Luo, S. Cao, X. Gong, S. Ding, X. Yuan, W. Zheng, Y . Zhang, Channel-wise attention enhanced and structural similarity constrained cyclegan for e ffective synthetic ct generation from head and neck mri images, Radiation Oncology 19 (1) (2024) 37. doi:10.1186/s13014-024-02429-2

work page doi:10.1186/s13014-024-02429-2 2024

[19] [19]

Dhariwal, A

P. Dhariwal, A. Nichol, Di ffusion models beat gans on image synthesis, in: Pro- ceedings of the 35th International Conference on Neural Information Processing Systems, NIPS ’21, Curran Associates Inc., Red Hook, NY , USA, 2021

work page 2021

[20] [20]

¨Ozbey, O

M. ¨Ozbey, O. Dalmaz, S. U. H. Dar, H. A. Bedel, c. ¨Ozturk, A. G ¨ung¨or, T. C ¸ ukur, Unsupervised medical image translation with adversarial di ffusion models, IEEE Transactions on Medical Imaging 42 (12) (2023) 3524–3539. doi:10.1109/TMI.2023.3290149

work page doi:10.1109/tmi.2023.3290149 2023

[21] [21]

J. Ho, A. Jain, P. Abbeel, Denoising di ffusion probabilistic models, in: H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, H. Lin (Eds.), Advances in Neural Information Processing Systems, V ol. 33, Curran Associates, Inc., 2020, pp. 6840–6851

work page 2020

[22] [22]

B. Liu, Y . Zhu, K. Song, A. Elgammal, Towards faster and stabilized gan train- 24 ing for high-fidelity few-shot image synthesis, in: International Conference on Learning Representations (ICLR), 2021

work page 2021

[23] [23]

Gulrajani, F

I. Gulrajani, F. Ahmed, M. Arjovsky, V . Dumoulin, A. Courville, Improved train- ing of wasserstein gans, in: Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, Curran Associates Inc., Red Hook, NY , USA, 2017, p. 5769–5779

work page 2017

[24] [24]

S. Mo, M. Cho, J. Shin, Freeze the discriminator: a simple baseline for fine-tuning gans, in: CVPR AI for Content Creation Workshop, 2020

work page 2020

[25] [25]

S. Zhao, Z. Liu, J. Lin, J.-Y . Zhu, S. Han, Di fferentiable augmentation for data- efficient gan training, in: H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, H. Lin (Eds.), Advances in Neural Information Processing Systems, V ol. 33, Cur- ran Associates, Inc., 2020, pp. 7559–7570

work page 2020

[26] [26]

T. Chen, X. Zhai, M. Ritter, M. Lucic, N. Houlsby, Self-supervised gans via auxiliary rotation loss, in: 2019 IEEE /CVF Conference on Com- puter Vision and Pattern Recognition (CVPR), 2019, pp. 12146–12155. doi:10.1109/CVPR.2019.01243

work page doi:10.1109/cvpr.2019.01243 2019

[27] [27]

Tran, V .-H

N.-T. Tran, V .-H. Tran, B.-N. Nguyen, L. Yang, N.-M. M. Cheung, Self- supervised gan: Analysis and improvement with multi-class minimax game, in: H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch ´e-Buc, E. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems, V ol. 32, Curran As- sociates, Inc., 2019

work page 2019

[28] [28]

B. Li, K. Xue, B. Liu, Y .-K. Lai, Bbdm: Image-to-image translation with brownian bridge di ffusion models, in: 2023 IEEE /CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 1952–1961. doi:10.1109/CVPR52729.2023.00194

work page doi:10.1109/cvpr52729.2023.00194 2023

[29] [29]

X. Su, J. Song, C. Meng, S. Ermon, Dual di ffusion implicit bridges for image- to-image translation, in: International Conference on Learning Representations, 2023. 25

work page 2023

[30] [30]

G.-H. Liu, A. Vahdat, D.-A. Huang, E. Theodorou, W. Nie, A. Anandkumar, I2SB: Image-to-image schr ¨odinger bridge, in: A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, J. Scarlett (Eds.), Proceedings of the 40th International Conference on Machine Learning, V ol. 202 of Proceedings of Machine Learning Research, PMLR, 2023, pp. 22042–22062

work page 2023

[31] [31]

B. Kim, G. Kwon, K. Kim, J. C. Ye, Unpaired image-to-image translation via neural schr¨odinger bridge, in: International Conference on Learning Representa- tions (ICLR), 2024. doi:10.48550/arXiv.2305.15086

work page doi:10.48550/arxiv.2305.15086 2024

[32] [32]

A. Tong, K. Fatras, N. Malkin, G. Huguet, Y . Zhang, J. Rector-Brooks, G. Wolf, Y . Bengio, Improving and generalizing flow-based generative models with mini- batch optimal transport, Transactions on Machine Learning Research (2024). doi:10.48550/arXiv.2302.00482

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2302.00482 2024

[33] [33]

T. Park, A. A. Efros, R. Zhang, J.-Y . Zhu, Contrastive learning for unpaired image-to-image translation, in: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part IX, Springer- Verlag, Berlin, Heidelberg, 2020, p. 319–345. doi:10.1007 /978-3-030-58545- 7 19

work page 2020

[34] [34]

, author Zhu, J.Y

P. Isola, J.-Y . Zhu, T. Zhou, A. A. Efros, Image-to-image translation with conditional adversarial networks, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 5967–5976. doi:10.1109/CVPR.2017.632

work page doi:10.1109/cvpr.2017.632 2017

[35] [35]

M. I. Belghazi, A. Baratin, S. Rajeshwar, S. Ozair, Y . Bengio, A. Courville, D. Hjelm, Mutual information neural estimation, in: J. Dy, A. Krause (Eds.), Proceedings of the 35th International Conference on Machine Learning, V ol. 80 of Proceedings of Machine Learning Research, PMLR, 2018, pp. 531–540

work page 2018

[36] [36]

S. Mo, M. Cho, J. Shin, Instagan: Instance-aware image-to-image translation, in: International Conference on Learning Representations (ICLR), 2019. URL https://openreview.net/forum?id=ryxwJhC9YX 26

work page 2019

[37] [37]

Bakas, M

S. Bakas, M. Reyes, A. Jakab, S. Bauer, M. Rempfler, A. Crimi, R. T. Shino- hara, C. Berger, S. M. Ha, M. Rozycki, et al., Identifying the best machine learn- ing algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the brats challenge (2018). doi:10.17863/CAM.38755

work page doi:10.17863/cam.38755 2018

[38] [38]

H. J. Kuijf, J. M. Biesbroek, J. De Bresser, R. Heinen, S. Andermatt, M. Bento, M. Berseth, M. Belyaev, M. J. Cardoso, A. Casamitjana, D. L. Collins, M. Dadar, A. Georgiou, M. Ghafoorian, D. Jin, A. Khademi, J. Knight, H. Li, X. Llad ´o, M. Luna, Q. Mahmood, R. McKinley, A. Mehrtash, S. Ourselin, B.-Y . Park, H. Park, S. H. Park, S. Pezold, E. Puybareau, ...

work page doi:10.1109/tmi.2019.2905770 2019

[39] [39]

C. H. Sudre, K. Van Wijnen, F. Dubost, H. Adams, D. Atkinson, F. Barkhof, M. A. Birhanu, E. E. Bron, R. Camarasa, N. Chaturvedi, Y . Chen, Z. Chen, S. Chen, Q. Dou, T. Evans, I. Ezhov, H. Gao, M. Girones Sanguesa, J. D. Gispert, B. Gomez Anson, A. D. Hughes, M. A. Ikram, S. Ingala, H. R. Jaeger, F. Kofler, H. J. Kuijf, D. Kutnar, M. Lee, B. Li, L. Lorenzi...

work page doi:10.1016/j.media.2023.103029 2021

[40] [40]

G. Park, J. Hong, B. A. Du ffy, J.-M. Lee, H. Kim, White matter hyperintensities segmentation using the ensemble u-net with multi- scale highlighting foregrounds, NeuroImage 237 (2021) 118140. doi:https://doi.org/10.1016/j.neuroimage.2021.118140

work page doi:10.1016/j.neuroimage.2021.118140 2021

[41] [41]

Bhalerao, G

G. Bhalerao, G. Gillis, M. Dembele, S. Suri, K. Ebmeier, J. Klein, M. Hu, 27 C. Mackay, L. Griffanti, Automated quality control of t1-weighted brain mri scans for clinical research: methods comparison and design of a quality prediction clas- sifier, medRxiv (2024). doi:10.1101/2024.04.12.24305603

work page doi:10.1101/2024.04.12.24305603 2024

[42] [42]

FAIR, Facebook AI Research, fvcore: Core library for computer vision research projects at fair, https://github.com/facebookresearch/fvcore (2019). 28

work page 2019