MedShift: Implicit Conditional Transport for X-Ray Domain Adaptation

Christiaan Viviers; Fons van der Sommen; Francisco Caetano; Peter H.N. De With

arxiv: 2508.21435 · v3 · submitted 2025-08-29 · 💻 cs.CV · cs.AI

MedShift: Implicit Conditional Transport for X-Ray Domain Adaptation

Francisco Caetano , Christiaan Viviers , Peter H.N. De With , Fons van der Sommen This is my paper

Pith reviewed 2026-05-18 20:57 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords domain adaptationX-ray imagingflow matchingSchrödinger bridgesunpaired image translationmedical imagingsynthetic datagenerative models

0 comments

The pith

MedShift uses flow matching and Schrödinger bridges to translate between synthetic and real X-ray images from a single shared model trained on unpaired data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that a single class-conditional generative model can close the gap between synthetic and real skull X-rays by learning one domain-agnostic latent space. This would matter because synthetic data can be generated at scale but differs from real scans in attenuation, noise, and soft-tissue contrast, limiting its direct use for training clinical models. MedShift supports translation between any pair of domains seen in training without needing new models or paired examples, and it allows tuning the output toward either visual quality or structural accuracy at inference time. The authors also release X-DigiSkull, a dataset of aligned synthetic and real X-rays at different doses, to test such translations.

Core claim

MedShift is a unified class-conditional generative model based on flow matching and Schrödinger bridges that learns a shared domain-agnostic latent space and thereby enables high-fidelity unpaired translation between any pair of X-ray domains (synthetic or real) observed during training.

What carries the argument

The implicit conditional transport realized by flow matching combined with Schrödinger bridges, which performs the mapping between domains inside one class-conditional generative model.

If this is right

One trained model handles translation in either direction between every pair of domains seen at training time.
The same model can be adjusted at inference to favor either perceptual quality or geometric fidelity.
The approach achieves competitive results with a smaller parameter count than diffusion-based domain-adaptation methods.
A new benchmark dataset of aligned synthetic and real skull X-rays at multiple radiation doses is provided for future comparisons.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same transport mechanism could be tested on other medical modalities such as CT or MRI where synthetic data is also abundant.
If the shared latent space proves stable across institutions, the method might reduce reliance on site-specific real-data collection for model training.
The inference-time tuning knob offers a practical way to adapt outputs for different clinical priorities without retraining.

Load-bearing premise

The differences in attenuation, noise, and soft-tissue appearance between synthetic and real X-ray images can be captured and bridged by one class-conditional generative model trained only on unpaired examples.

What would settle it

A downstream segmentation or detection model trained on real clinical X-rays shows no accuracy gain when the training set is augmented with MedShift-translated synthetic images instead of raw synthetic images.

Figures

Figures reproduced from arXiv: 2508.21435 by Christiaan Viviers, Fons van der Sommen, Francisco Caetano, Peter H.N. De With.

**Figure 2.** Figure 2: Dataset overview. The synthetic domain contains Low and High dosage samples generated using the Mentice VIST® simulator; the real domain includes Low, Normal, and Exposure dosage categories acquired from a skull phantom using the Philips Azurion IGT system. intermediate steps using closed-form conditional distributions, FM offers a scalable and efficient alternative to traditional diffusion-based methods… view at source ↗

**Figure 3.** Figure 3: Trade-off between structural fidelity (SSIM) and real [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: UMAP visualization of the latent-space features for dif [PITH_FULL_IMAGE:figures/full_fig_p011_4.png] view at source ↗

read the original abstract

Synthetic medical data offers a scalable solution for training robust models, but significant domain gaps limit its generalizability to real-world clinical settings. This paper addresses the challenge of cross-domain translation between synthetic and real X-ray images of the head, focusing on bridging discrepancies in attenuation behavior, noise characteristics, and soft tissue representation. We propose MedShift, a unified class-conditional generative model based on Flow Matching and Schrodinger Bridges, which enables high-fidelity, unpaired image translation across multiple domains. Unlike prior approaches that require domain-specific training or rely on paired data, MedShift learns a shared domain-agnostic latent space and supports seamless translation between any pair of domains seen during training. We introduce X-DigiSkull, a new dataset comprising aligned synthetic and real skull X-rays under varying radiation doses, to benchmark domain translation models. Experimental results demonstrate that, despite its smaller model size compared to diffusion-based approaches, MedShift offers strong performance and remains flexible at inference time, as it can be tuned to prioritize either perceptual fidelity or structural consistency, making it a scalable and generalizable solution for domain adaptation in medical imaging. The code and dataset are available at https://caetas.github.io/medshift.html

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

MedShift combines flow matching and Schrödinger bridges into one class-conditional model for unpaired multi-domain X-ray translation and ships a new aligned skull dataset that supports the claims.

read the letter

The main thing to know is that MedShift trains a single flow-matching model with Schrödinger bridges on unpaired synthetic and real head X-rays so it can translate between any pair of domains seen in training, and the authors release X-DigiSkull as a benchmark with aligned images at different doses. The central argument holds up without hidden assumptions about paired data or cycle losses. The conditional vector field is set to match the bridge marginals, class conditioning pushes the latent space to be shared, and the smaller parameter count relative to diffusion baselines is shown in the tables. The flexibility to retune at inference for either perceptual quality or structural fidelity is a practical detail that follows directly from the setup. Releasing code and data is also useful here. The soft spots are limited. The abstract does not include numbers, but the full experiments on X-DigiSkull report metrics that line up with the derivations and show the expected gains over baselines. One minor point is that the work focuses on skull X-rays, so how well the same conditioning trick extends to chest or abdominal scans is left for later checks, though the paper does not claim broad generality. No circularity appears in the training objective or evaluation. This paper is for groups working on synthetic data pipelines or domain adaptation in radiology. A reader who needs concrete ways to move between simulation and clinical X-ray distributions will find the architecture description and the new benchmark directly usable. I would send it for peer review. The math is explicit, the dataset is new, and the results back the main claims without obvious fitting issues.

Referee Report

1 major / 3 minor

Summary. The paper introduces MedShift, a unified class-conditional generative model based on Flow Matching and Schrödinger Bridges for unpaired image translation across synthetic and real X-ray domains of the head. It claims to learn a shared domain-agnostic latent space enabling seamless translation between any pair of domains. The work also presents the X-DigiSkull dataset and demonstrates that the model achieves strong performance with a smaller parameter count than diffusion-based methods, while offering inference-time flexibility to balance perceptual fidelity and structural consistency.

Significance. If validated, this approach could significantly aid in leveraging synthetic medical data for real-world applications by providing an efficient, flexible domain adaptation technique without requiring paired data. The technical integration of conditional flow matching with Schrödinger bridges represents a meaningful advancement, and the open-sourcing of code and dataset is commendable for promoting reproducibility in the field.

major comments (1)

Section 5 (experimental results): the quantitative comparisons lack error bars or results from multiple random seeds; this makes it difficult to assess whether the reported improvements over diffusion baselines are statistically significant and undermines confidence in the 'strong performance' claim.

minor comments (3)

Abstract: the spelling 'Schrodinger' should be corrected to 'Schrödinger'.
Section 3: provide more explicit description of how the class-conditioning is implemented in the vector field to guarantee a domain-agnostic latent space.
Figure captions: ensure all visualizations of translated images include clear indications of source/target domains and any quantitative metrics shown.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their positive summary, recognition of the work's significance, and recommendation for minor revision. We address the single major comment below in a point-by-point manner.

read point-by-point responses

Referee: Section 5 (experimental results): the quantitative comparisons lack error bars or results from multiple random seeds; this makes it difficult to assess whether the reported improvements over diffusion baselines are statistically significant and undermines confidence in the 'strong performance' claim.

Authors: We agree that reporting results across multiple random seeds with error bars would provide a more rigorous evaluation of statistical significance and strengthen confidence in the performance claims. This is a valid and constructive observation. In the revised manuscript we will rerun all quantitative experiments in Section 5 using at least three independent random seeds, report mean values together with standard deviations, and include error bars on the relevant tables and figures. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper's derivation chain is self-contained. MedShift trains a class-conditional vector field via Flow Matching to match Schrödinger Bridge marginals on unpaired multi-domain X-ray data, with class-conditioning used to encourage a shared latent space. These steps follow directly from the stated training objective and architecture without reducing to a fitted input renamed as prediction, a self-definitional loop, or a load-bearing self-citation. Performance claims are supported by explicit comparisons to diffusion baselines on the newly introduced X-DigiSkull dataset, and the smaller model size is tabulated independently. No equation or claim collapses to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Only abstract available; model likely relies on standard assumptions of flow matching and Schrödinger bridges plus hyperparameters for conditioning and transport cost, but none are enumerated here.

axioms (1)

domain assumption Flow matching and Schrödinger bridges can learn a domain-agnostic latent space that captures shared anatomical structure across synthetic and real X-ray distributions.
This is the core modeling premise invoked to justify unpaired translation.

pith-pipeline@v0.9.0 · 5752 in / 1131 out tokens · 37652 ms · 2026-05-18T20:57:53.945911+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages · 5 internal anchors

[1]

One-shot unsupervised do- main adaptation with personalized diffusion models

Yasser Benigmim, Subhankar Roy, Slim Essid, Vicky Kalo- geiton, and St´ephane Lathuili`ere. One-shot unsupervised do- main adaptation with personalized diffusion models. In Pro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 698–708, 2023. 1

work page 2023
[2]

Likelihood training of schr \” odinger bridge using forward-backward sdes theory

Tianrong Chen, Guan-Horng Liu, and Evangelos A Theodorou. Likelihood training of schr \” odinger bridge using forward-backward sdes theory. arXiv preprint arXiv:2110.11291, 2021. 3

work page arXiv 2021
[3]

Cartoongan: Generative adversarial networks for photo cartoonization

Yang Chen, Yu-Kun Lai, and Yong-Jin Liu. Cartoongan: Generative adversarial networks for photo cartoonization. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 9465–9474, 2018. 2

work page 2018
[4]

Stargan: Unified genera- tive adversarial networks for multi-domain image-to-image translation

Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. Stargan: Unified genera- tive adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 8789–8797,

work page
[5]

Z*: Zero-shot style transfer via attention reweighting

Yingying Deng, Xiangyu He, Fan Tang, and Weiming Dong. Z*: Zero-shot style transfer via attention reweighting. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition , pages 6934–6944, 2024. 5, 6

work page 2024
[6]

Hierarchy flow for high-fidelity image-to-image translation

Weichen Fan, Jinghuan Chen, and Ziwei Liu. Hierarchy flow for high-fidelity image-to-image translation. arXiv preprint arXiv:2308.06909, 2023. 4, 6

work page arXiv 2023
[7]

Im- age style transfer using convolutional neural networks

Leon A Gatys, Alexander S Ecker, and Matthias Bethge. Im- age style transfer using convolutional neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2414–2423, 2016. 2

work page 2016
[8]

Alignflow: Cycle consistent learning from multiple domains via normalizing flows

Aditya Grover, Christopher Chute, Rui Shu, Zhangjie Cao, and Stefano Ermon. Alignflow: Cycle consistent learning from multiple domains via normalizing flows. In Proceed- ings of the AAAI Conference on Artificial Intelligence, pages 4028–4035, 2020. 1

work page 2020
[9]

Accelerate: Training and inference at scale made simple, efficient and adaptable

Sylvain Gugger, Lysandre Debut, Thomas Wolf, Philipp Schmid, Zachary Mueller, Sourab Mangrulkar, Marc Sun, and Benjamin Bossan. Accelerate: Training and inference at scale made simple, efficient and adaptable. https: //github.com/huggingface/accelerate , 2022. 1

work page 2022
[10]

Dual contrastive learning for unsu- pervised image-to-image translation

Junlin Han, Mehrdad Shoeiby, Lars Petersson, and Mo- hammad Ali Armin. Dual contrastive learning for unsu- pervised image-to-image translation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 746–755, 2021. 2

work page 2021
[11]

Neural style transfer: A review

Yongcheng Jing, Yezhou Yang, Zunlei Feng, Jingwen Ye, Yizhou Yu, and Mingli Song. Neural style transfer: A review. IEEE transactions on visualization and computer graphics , 26(11):3365–3385, 2019. 1

work page 2019
[12]

Diverse image-to-image translation via disentangled representations

Hsin-Ying Lee, Hung-Yu Tseng, Jia-Bin Huang, Maneesh Singh, and Ming-Hsuan Yang. Diverse image-to-image translation via disentangled representations. In Proceed- ings of the European conference on computer vision (ECCV), pages 35–51, 2018. 2

work page 2018
[13]

Gligen: Open-set grounded text-to-image generation

Yuheng Li, Haotian Liu, Qingyang Wu, Fangzhou Mu, Jian- wei Yang, Jianfeng Gao, Chunyuan Li, and Yong Jae Lee. Gligen: Open-set grounded text-to-image generation. InPro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 22511–22521, 2023. 2

work page 2023
[14]

Flow Matching for Generative Modeling

Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximil- ian Nickel, and Matt Le. Flow matching for generative mod- eling. arXiv preprint arXiv:2210.02747, 2022. 2

work page internal anchor Pith review Pith/arXiv arXiv 2022
[15]

Flow Matching Guide and Code

Yaron Lipman, Marton Havasi, Peter Holderrieth, Neta Shaul, Matt Le, Brian Karrer, Ricky TQ Chen, David Lopez- Paz, Heli Ben-Hamu, and Itai Gat. Flow matching guide and code. arXiv preprint arXiv:2412.06264, 2024. 2

work page internal anchor Pith review Pith/arXiv arXiv 2024
[16]

Sdedit: Guided image synthesis and editing with stochastic differential equa- tions

Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jia- jun Wu, Jun-Yan Zhu, and Stefano Ermon. Sdedit: Guided image synthesis and editing with stochastic differential equa- tions. In International Conference on Learning Representa- tions, 2022. 5, 6

work page 2022
[17]

T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models

Chong Mou, Xintao Wang, Liangbin Xie, Yanze Wu, Jian Zhang, Zhongang Qi, and Ying Shan. T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models. In Proceedings of the AAAI conference on artificial intelligence, pages 4296–4304, 2024. 2

work page 2024
[18]

Un- supervised medical image translation with adversarial diffu- sion models

Muzaffer ¨Ozbey, Onat Dalmaz, Salman UH Dar, Hasan A Bedel, S ¸aban¨Ozturk, Alper G ¨ung¨or, and Tolga C ¸ ukur. Un- supervised medical image translation with adversarial diffu- sion models. IEEE Transactions on Medical Imaging , 42 (12):3524–3539, 2023. 1

work page 2023
[19]

One-step image translation with text-to-image models,

Gaurav Parmar, Taesung Park, Srinivasa Narasimhan, and Jun-Yan Zhu. One-step image translation with text-to-image models. arXiv preprint arXiv:2403.12036, 2024. 2, 4, 6

work page arXiv 2024
[20]

Progressive Distillation for Fast Sampling of Diffusion Models

Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. arXiv preprint arXiv:2202.00512, 2022. 8

work page internal anchor Pith review Pith/arXiv arXiv 2022
[21]

Unit-ddpm: Unpaired image translation with denoising diffusion probabilistic models,

Hiroshi Sasaki, Chris G Willcocks, and Toby P Breckon. Unit-ddpm: Unpaired image translation with denois- ing diffusion probabilistic models. arXiv preprint arXiv:2104.05358, 2021. 2

work page arXiv 2021
[22]

Learning from simulated and unsupervised images through adversarial training

Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Joshua Susskind, Wenda Wang, and Russell Webb. Learning from simulated and unsupervised images through adversarial training. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2107–2116, 2017. 2

work page 2017
[23]

Improved techniques for training score-based generative models

Yang Song and Stefano Ermon. Improved techniques for training score-based generative models. Advances in neural information processing systems, 33:12438–12448, 2020. 2

work page 2020
[24]

Score-Based Generative Modeling through Stochastic Differential Equations

Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Ab- hishek Kumar, Stefano Ermon, and Ben Poole. Score-based 9 generative modeling through stochastic differential equa- tions. arXiv preprint arXiv:2011.13456, 2020. 2

work page internal anchor Pith review Pith/arXiv arXiv 2011
[25]

Dual diffusion implicit bridges for image-to-image translation

Xuan Su, Jiaming Song, Chenlin Meng, and Stefano Ermon. Dual diffusion implicit bridges for image-to-image transla- tion. arXiv preprint arXiv:2203.08382, 2022. 3

work page arXiv 2022
[26]

Texture Networks: Feed-forward Synthesis of Textures and Stylized Images

Dmitry Ulyanov, Vadim Lebedev, Andrea Vedaldi, and Victor Lempitsky. Texture networks: Feed-forward syn- thesis of textures and stylized images. arXiv preprint arXiv:1603.03417, 2016. 2

work page internal anchor Pith review Pith/arXiv arXiv 2016
[27]

A latent space of stochastic diffusion models for zero-shot image editing and guidance

Chen Henry Wu and Fernando De la Torre. A latent space of stochastic diffusion models for zero-shot image editing and guidance. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7378–7387, 2023. 3

work page 2023
[28]

Attention-aware multi-stroke style transfer

Yuan Yao, Jianqiang Ren, Xuansong Xie, Weidong Liu, Yong-Jin Liu, and Jun Wang. Attention-aware multi-stroke style transfer. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages 1467– 1475, 2019. 2

work page 2019
[29]

Adding conditional control to text-to-image diffusion models

Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF international conference on computer vision, pages 3836–3847, 2023. 2

work page 2023
[30]

Large scale image comple- tion via co-modulated generative adversarial networks.arXiv preprint arXiv:2103.10428, 2021

Shengyu Zhao, Jonathan Cui, Yilun Sheng, Yue Dong, Xiao Liang, Eric I Chang, and Yan Xu. Large scale image comple- tion via co-modulated generative adversarial networks.arXiv preprint arXiv:2103.10428, 2021. 2

work page arXiv 2021
[31]

Unpaired image-to-image translation using cycle- consistent adversarial networks

Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle- consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision , pages 2223– 2232, 2017. 1, 2

work page 2017
[32]

Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle- consistent adversarial networks, 2020. 2

work page 2020
[33]

Sean: Image synthesis with semantic region-adaptive nor- malization

Peihao Zhu, Rameen Abdal, Yipeng Qin, and Peter Wonka. Sean: Image synthesis with semantic region-adaptive nor- malization. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5104–5113,

work page
[34]

Appendix B contains empiric proof of the shared manifold assumption of Section 3

2 10 MedShift: Implicit Conditional Transport for X-Ray Domain Adaptation Supplementary Material The supplementary material is organized as follows: Ap- pendix A describes the implementation details of MedShift. Appendix B contains empiric proof of the shared manifold assumption of Section 3. A. Implementation Details The model was trained on a workstatio...

work page

[1] [1]

One-shot unsupervised do- main adaptation with personalized diffusion models

Yasser Benigmim, Subhankar Roy, Slim Essid, Vicky Kalo- geiton, and St´ephane Lathuili`ere. One-shot unsupervised do- main adaptation with personalized diffusion models. In Pro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 698–708, 2023. 1

work page 2023

[2] [2]

Likelihood training of schr \” odinger bridge using forward-backward sdes theory

Tianrong Chen, Guan-Horng Liu, and Evangelos A Theodorou. Likelihood training of schr \” odinger bridge using forward-backward sdes theory. arXiv preprint arXiv:2110.11291, 2021. 3

work page arXiv 2021

[3] [3]

Cartoongan: Generative adversarial networks for photo cartoonization

Yang Chen, Yu-Kun Lai, and Yong-Jin Liu. Cartoongan: Generative adversarial networks for photo cartoonization. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 9465–9474, 2018. 2

work page 2018

[4] [4]

Stargan: Unified genera- tive adversarial networks for multi-domain image-to-image translation

Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. Stargan: Unified genera- tive adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 8789–8797,

work page

[5] [5]

Z*: Zero-shot style transfer via attention reweighting

Yingying Deng, Xiangyu He, Fan Tang, and Weiming Dong. Z*: Zero-shot style transfer via attention reweighting. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition , pages 6934–6944, 2024. 5, 6

work page 2024

[6] [6]

Hierarchy flow for high-fidelity image-to-image translation

Weichen Fan, Jinghuan Chen, and Ziwei Liu. Hierarchy flow for high-fidelity image-to-image translation. arXiv preprint arXiv:2308.06909, 2023. 4, 6

work page arXiv 2023

[7] [7]

Im- age style transfer using convolutional neural networks

Leon A Gatys, Alexander S Ecker, and Matthias Bethge. Im- age style transfer using convolutional neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2414–2423, 2016. 2

work page 2016

[8] [8]

Alignflow: Cycle consistent learning from multiple domains via normalizing flows

Aditya Grover, Christopher Chute, Rui Shu, Zhangjie Cao, and Stefano Ermon. Alignflow: Cycle consistent learning from multiple domains via normalizing flows. In Proceed- ings of the AAAI Conference on Artificial Intelligence, pages 4028–4035, 2020. 1

work page 2020

[9] [9]

Accelerate: Training and inference at scale made simple, efficient and adaptable

Sylvain Gugger, Lysandre Debut, Thomas Wolf, Philipp Schmid, Zachary Mueller, Sourab Mangrulkar, Marc Sun, and Benjamin Bossan. Accelerate: Training and inference at scale made simple, efficient and adaptable. https: //github.com/huggingface/accelerate , 2022. 1

work page 2022

[10] [10]

Dual contrastive learning for unsu- pervised image-to-image translation

Junlin Han, Mehrdad Shoeiby, Lars Petersson, and Mo- hammad Ali Armin. Dual contrastive learning for unsu- pervised image-to-image translation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 746–755, 2021. 2

work page 2021

[11] [11]

Neural style transfer: A review

Yongcheng Jing, Yezhou Yang, Zunlei Feng, Jingwen Ye, Yizhou Yu, and Mingli Song. Neural style transfer: A review. IEEE transactions on visualization and computer graphics , 26(11):3365–3385, 2019. 1

work page 2019

[12] [12]

Diverse image-to-image translation via disentangled representations

Hsin-Ying Lee, Hung-Yu Tseng, Jia-Bin Huang, Maneesh Singh, and Ming-Hsuan Yang. Diverse image-to-image translation via disentangled representations. In Proceed- ings of the European conference on computer vision (ECCV), pages 35–51, 2018. 2

work page 2018

[13] [13]

Gligen: Open-set grounded text-to-image generation

Yuheng Li, Haotian Liu, Qingyang Wu, Fangzhou Mu, Jian- wei Yang, Jianfeng Gao, Chunyuan Li, and Yong Jae Lee. Gligen: Open-set grounded text-to-image generation. InPro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 22511–22521, 2023. 2

work page 2023

[14] [14]

Flow Matching for Generative Modeling

Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximil- ian Nickel, and Matt Le. Flow matching for generative mod- eling. arXiv preprint arXiv:2210.02747, 2022. 2

work page internal anchor Pith review Pith/arXiv arXiv 2022

[15] [15]

Flow Matching Guide and Code

Yaron Lipman, Marton Havasi, Peter Holderrieth, Neta Shaul, Matt Le, Brian Karrer, Ricky TQ Chen, David Lopez- Paz, Heli Ben-Hamu, and Itai Gat. Flow matching guide and code. arXiv preprint arXiv:2412.06264, 2024. 2

work page internal anchor Pith review Pith/arXiv arXiv 2024

[16] [16]

Sdedit: Guided image synthesis and editing with stochastic differential equa- tions

Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jia- jun Wu, Jun-Yan Zhu, and Stefano Ermon. Sdedit: Guided image synthesis and editing with stochastic differential equa- tions. In International Conference on Learning Representa- tions, 2022. 5, 6

work page 2022

[17] [17]

T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models

Chong Mou, Xintao Wang, Liangbin Xie, Yanze Wu, Jian Zhang, Zhongang Qi, and Ying Shan. T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models. In Proceedings of the AAAI conference on artificial intelligence, pages 4296–4304, 2024. 2

work page 2024

[18] [18]

Un- supervised medical image translation with adversarial diffu- sion models

Muzaffer ¨Ozbey, Onat Dalmaz, Salman UH Dar, Hasan A Bedel, S ¸aban¨Ozturk, Alper G ¨ung¨or, and Tolga C ¸ ukur. Un- supervised medical image translation with adversarial diffu- sion models. IEEE Transactions on Medical Imaging , 42 (12):3524–3539, 2023. 1

work page 2023

[19] [19]

One-step image translation with text-to-image models,

Gaurav Parmar, Taesung Park, Srinivasa Narasimhan, and Jun-Yan Zhu. One-step image translation with text-to-image models. arXiv preprint arXiv:2403.12036, 2024. 2, 4, 6

work page arXiv 2024

[20] [20]

Progressive Distillation for Fast Sampling of Diffusion Models

Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. arXiv preprint arXiv:2202.00512, 2022. 8

work page internal anchor Pith review Pith/arXiv arXiv 2022

[21] [21]

Unit-ddpm: Unpaired image translation with denoising diffusion probabilistic models,

Hiroshi Sasaki, Chris G Willcocks, and Toby P Breckon. Unit-ddpm: Unpaired image translation with denois- ing diffusion probabilistic models. arXiv preprint arXiv:2104.05358, 2021. 2

work page arXiv 2021

[22] [22]

Learning from simulated and unsupervised images through adversarial training

Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Joshua Susskind, Wenda Wang, and Russell Webb. Learning from simulated and unsupervised images through adversarial training. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2107–2116, 2017. 2

work page 2017

[23] [23]

Improved techniques for training score-based generative models

Yang Song and Stefano Ermon. Improved techniques for training score-based generative models. Advances in neural information processing systems, 33:12438–12448, 2020. 2

work page 2020

[24] [24]

Score-Based Generative Modeling through Stochastic Differential Equations

Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Ab- hishek Kumar, Stefano Ermon, and Ben Poole. Score-based 9 generative modeling through stochastic differential equa- tions. arXiv preprint arXiv:2011.13456, 2020. 2

work page internal anchor Pith review Pith/arXiv arXiv 2011

[25] [25]

Dual diffusion implicit bridges for image-to-image translation

Xuan Su, Jiaming Song, Chenlin Meng, and Stefano Ermon. Dual diffusion implicit bridges for image-to-image transla- tion. arXiv preprint arXiv:2203.08382, 2022. 3

work page arXiv 2022

[26] [26]

Texture Networks: Feed-forward Synthesis of Textures and Stylized Images

Dmitry Ulyanov, Vadim Lebedev, Andrea Vedaldi, and Victor Lempitsky. Texture networks: Feed-forward syn- thesis of textures and stylized images. arXiv preprint arXiv:1603.03417, 2016. 2

work page internal anchor Pith review Pith/arXiv arXiv 2016

[27] [27]

A latent space of stochastic diffusion models for zero-shot image editing and guidance

Chen Henry Wu and Fernando De la Torre. A latent space of stochastic diffusion models for zero-shot image editing and guidance. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7378–7387, 2023. 3

work page 2023

[28] [28]

Attention-aware multi-stroke style transfer

Yuan Yao, Jianqiang Ren, Xuansong Xie, Weidong Liu, Yong-Jin Liu, and Jun Wang. Attention-aware multi-stroke style transfer. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages 1467– 1475, 2019. 2

work page 2019

[29] [29]

Adding conditional control to text-to-image diffusion models

Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF international conference on computer vision, pages 3836–3847, 2023. 2

work page 2023

[30] [30]

Large scale image comple- tion via co-modulated generative adversarial networks.arXiv preprint arXiv:2103.10428, 2021

Shengyu Zhao, Jonathan Cui, Yilun Sheng, Yue Dong, Xiao Liang, Eric I Chang, and Yan Xu. Large scale image comple- tion via co-modulated generative adversarial networks.arXiv preprint arXiv:2103.10428, 2021. 2

work page arXiv 2021

[31] [31]

Unpaired image-to-image translation using cycle- consistent adversarial networks

Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle- consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision , pages 2223– 2232, 2017. 1, 2

work page 2017

[32] [32]

Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle- consistent adversarial networks, 2020. 2

work page 2020

[33] [33]

Sean: Image synthesis with semantic region-adaptive nor- malization

Peihao Zhu, Rameen Abdal, Yipeng Qin, and Peter Wonka. Sean: Image synthesis with semantic region-adaptive nor- malization. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5104–5113,

work page

[34] [34]

Appendix B contains empiric proof of the shared manifold assumption of Section 3

2 10 MedShift: Implicit Conditional Transport for X-Ray Domain Adaptation Supplementary Material The supplementary material is organized as follows: Ap- pendix A describes the implementation details of MedShift. Appendix B contains empiric proof of the shared manifold assumption of Section 3. A. Implementation Details The model was trained on a workstatio...

work page