Flow Matching for Conditional MRI-CT and CBCT-CT Image Synthesis

Arnela Hadzic; Martin Urschler; Simon Johannes Joham

arxiv: 2510.04823 · v2 · submitted 2025-10-06 · 💻 cs.CV

Flow Matching for Conditional MRI-CT and CBCT-CT Image Synthesis

Arnela Hadzic , Simon Johannes Joham , Martin Urschler This is my paper

Pith reviewed 2026-05-18 10:10 UTC · model grok-4.3

classification 💻 cs.CV

keywords flow matchingsynthetic CTMRI to CTCBCT to CTimage synthesisradiotherapy3D generative modelconditional image generation

0 comments

The pith

Flow matching generates synthetic CT from MRI or CBCT by integrating a learned velocity field conditioned on encoder features.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper applies a fully 3D flow matching framework to synthesize CT images from MRI and CBCT scans for radiotherapy planning. A volume of Gaussian noise is transformed into a synthetic CT by integrating a velocity field that is conditioned on features extracted from the input scan via a lightweight 3D encoder. Separate models were trained and tested for MRI-to-sCT and CBCT-to-sCT tasks on the SynthRAD2025 benchmark across abdomen, head-and-neck, and thorax regions. The approach reconstructs overall anatomical layouts with good fidelity but shows limited preservation of small structures, mainly because training resolution was kept low to fit memory and runtime limits.

Core claim

We adopt a fully 3D Flow Matching (FM) framework, motivated by recent work demonstrating FM's efficiency in producing high-quality images. In our approach, a Gaussian noise volume is transformed into an sCT image by integrating a learned FM velocity field, conditioned on features extracted from the input MRI or CBCT using a lightweight 3D encoder. We evaluated the method on the SynthRAD2025 Challenge benchmark, training separate models for MRI to sCT and CBCT to sCT across three anatomical regions: abdomen, head and neck, and thorax. The results indicate that the method accurately reconstructs global anatomical structures; however, preservation of fine details was limited, primarily due to a

What carries the argument

The learned flow matching velocity field that transforms Gaussian noise into synthetic CT, conditioned on features from a lightweight 3D encoder applied to the input MRI or CBCT volume.

If this is right

Enables MRI-only radiotherapy workflows by supplying synthetic CT without extra X-ray exposure.
Supports CBCT-based adaptive radiotherapy with maintained global anatomical accuracy.
Separate per-region models allow targeted performance for abdomen, head-and-neck, and thorax.
Low training resolution is identified as the main bottleneck for local structural fidelity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Patch-based or latent-space flow matching could raise effective resolution without exceeding current memory limits.
The same conditioning pattern may transfer to other cross-modality medical synthesis problems.
Clinical integration would require testing whether global accuracy alone suffices for dose-calculation accuracy.
Higher-resolution variants would allow direct comparison of detail preservation against the current baseline.

Load-bearing premise

Features extracted by the lightweight 3D encoder from the input MRI or CBCT volume are sufficient to condition the flow matching velocity field for accurate synthesis across anatomical regions.

What would settle it

A controlled experiment that raises training resolution or switches to patch-based training while keeping the same encoder conditioning, then measures whether fine-detail metrics such as small-vessel or trabecular-bone fidelity improve or stay flat.

Figures

Figures reproduced from arXiv: 2510.04823 by Arnela Hadzic, Martin Urschler, Simon Johannes Joham.

**Figure 1.** Figure 1: Examples of synthetic CT images generated by our MRI → sCT (left column) and CBCT → sCT models (right column) for different anatomical regions. The results are reported using the metrics described in Section 3.2, with both the mean and standard deviation (mean ± std) for each metric. Qualitative results are shown in [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗

read the original abstract

Generating synthetic CT (sCT) from MRI or CBCT plays a crucial role in enabling MRI-only and CBCT-based adaptive radiotherapy, improving treatment precision while reducing patient radiation exposure. To address this task, we adopt a fully 3D Flow Matching (FM) framework, motivated by recent work demonstrating FM's efficiency in producing high-quality images. In our approach, a Gaussian noise volume is transformed into an sCT image by integrating a learned FM velocity field, conditioned on features extracted from the input MRI or CBCT using a lightweight 3D encoder. We evaluated the method on the SynthRAD2025 Challenge benchmark, training separate models for MRI to sCT and CBCT to sCT across three anatomical regions: abdomen, head and neck, and thorax. Validation and testing were performed through the challenge submission system. The results indicate that the method accurately reconstructs global anatomical structures; however, preservation of fine details was limited, primarily due to the relatively low training resolution imposed by memory and runtime constraints. Future work will explore patch-based training and latent-space flow models to improve resolution and local structural fidelity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Straightforward 3D flow matching application to MRI/CBCT-to-sCT on SynthRAD2025 with global structure claims but no numbers or ablations to back them up.

read the letter

The paper applies fully 3D flow matching to conditional synthesis of CT from MRI and from CBCT, training separate models for abdomen, head/neck, and thorax on the SynthRAD2025 benchmark. They condition the velocity field with features from a lightweight 3D encoder and note that global anatomy looks reasonable while fine detail suffers from the low resolution forced by memory limits. That is the main contribution: a direct domain extension of recent flow matching work to this multi-site radiotherapy task rather than a new theoretical framework. The setup follows standard supervised conditional generation and they flag the practical constraints honestly. The approach is reasonable for the goal of supporting MRI-only or CBCT-adaptive workflows. The soft spots are clear from the abstract. There are no quantitative metrics, no error bars, no baseline comparisons, and no ablation results on the encoder, the conditioning strength, or the resolution choice. The claim that global structures are accurately reconstructed therefore rests on qualitative description alone. The stress-test point about the lightweight encoder potentially dropping local cues is worth checking in the full text; if the paper shows no receptive-field analysis or feature visualization, that assumption stays untested and could explain some of the detail loss beyond just resolution. The citation pattern looks standard for the area. This is for people already working on synthetic CT generation or flow-based medical image models who want to see one more data point on the SynthRAD2025 benchmark. A reader looking for new methods or strong evidence would get limited value until the numbers appear. It deserves peer review because the task matters and the method is grounded enough to be worth referee time, provided the authors add the missing metrics and at least basic ablations.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces a fully 3D Flow Matching (FM) framework for conditional synthesis of CT images from MRI or CBCT. A lightweight 3D encoder extracts features from the input to condition the FM velocity field, which transforms Gaussian noise into the synthetic CT. Separate models are trained for MRI-to-sCT and CBCT-to-sCT across abdomen, head/neck, and thorax regions, evaluated on the SynthRAD2025 Challenge benchmark. The authors report accurate reconstruction of global anatomical structures with limitations in fine details attributed to low training resolution due to computational constraints.

Significance. If supported by quantitative evidence, the approach could contribute to MRI-only and CBCT-based adaptive radiotherapy by offering an efficient generative model for image synthesis that reduces radiation exposure. The motivation from recent FM work for high-quality image generation is a reasonable starting point, and the use of a challenge benchmark provides a standardized evaluation setting.

major comments (2)

[Abstract] Abstract: The claim that the method 'accurately reconstructs global anatomical structures' is presented without any quantitative metrics (e.g., MAE, SSIM, or Dice scores), error bars, ablation studies, or baseline comparisons on the SynthRAD2025 benchmark. This absence directly undermines verification of the central accuracy claim and leaves the soundness assessment at the reported low level.
[Abstract] Abstract: The attribution of fine-detail loss exclusively to 'relatively low training resolution imposed by memory and runtime constraints' does not address whether features from the lightweight 3D encoder are sufficient to condition the velocity field across anatomical regions. No receptive-field analysis, ablation on encoder depth, or comparison of conditioning strength is provided; this assumption is load-bearing for the distinction between resolution and information-loss issues.

minor comments (1)

[Abstract] Abstract: The statement that 'validation and testing were performed through the challenge submission system' should be expanded with the specific challenge metrics and any available leaderboard context to allow readers to interpret the reported qualitative success.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We agree that the abstract requires quantitative support for its claims and a more balanced discussion of potential contributing factors to the observed limitations. We respond to each major comment below and will incorporate revisions in the next version of the manuscript.

read point-by-point responses

Referee: [Abstract] Abstract: The claim that the method 'accurately reconstructs global anatomical structures' is presented without any quantitative metrics (e.g., MAE, SSIM, or Dice scores), error bars, ablation studies, or baseline comparisons on the SynthRAD2025 benchmark. This absence directly undermines verification of the central accuracy claim and leaves the soundness assessment at the reported low level.

Authors: We agree that the abstract should provide quantitative backing for the central claim. The full manuscript reports MAE, SSIM, PSNR, and Dice scores (with standard deviations) on the SynthRAD2025 benchmark in Section 4 and the associated tables for each anatomical region and modality. We will revise the abstract to include representative metrics (e.g., mean MAE and SSIM across regions) drawn from the challenge submissions to directly support the statement about global anatomical reconstruction. revision: yes
Referee: [Abstract] Abstract: The attribution of fine-detail loss exclusively to 'relatively low training resolution imposed by memory and runtime constraints' does not address whether features from the lightweight 3D encoder are sufficient to condition the velocity field across anatomical regions. No receptive-field analysis, ablation on encoder depth, or comparison of conditioning strength is provided; this assumption is load-bearing for the distinction between resolution and information-loss issues.

Authors: We acknowledge that the current text attributes the limitation primarily to resolution constraints arising from 3D memory limits. We did not conduct receptive-field analysis, encoder-depth ablations, or conditioning-strength comparisons in this work. We will revise the abstract and discussion to note that both low resolution and possible limitations in the lightweight encoder's feature extraction may contribute to fine-detail loss. This will be presented as an open question for future work rather than an exclusive attribution. revision: partial

Circularity Check

0 steps flagged

No significant circularity; standard supervised conditional generative modeling

full rationale

The paper presents a standard application of flow matching for conditional image synthesis, where a velocity field is learned via supervised training to map noise to sCT conditioned on features from a lightweight 3D encoder, with evaluation on the external SynthRAD2025 benchmark. No load-bearing step reduces by construction to its own inputs: the method description does not define the velocity field or conditioning features in terms of the target synthesis outputs, nor does it rename fitted parameters as independent predictions. Claims about global anatomical reconstruction and fine-detail limitations are tied directly to empirical results and training constraints rather than self-referential derivations or self-citation chains. The approach remains self-contained against external benchmarks without invoking uniqueness theorems or ansatzes from overlapping prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review provides minimal visibility into modeling assumptions; the central claim rests on standard neural network training and the unstated premise that flow matching velocity integration yields anatomically plausible CT volumes when conditioned appropriately.

axioms (1)

domain assumption Flow matching can learn a velocity field that transports Gaussian noise to the target data distribution when conditioned on input features.
Invoked by the choice of FM framework for conditional image synthesis.

pith-pipeline@v0.9.0 · 5727 in / 1295 out tokens · 30081 ms · 2026-05-18T10:10:43.596362+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

a Gaussian noise volume is transformed into an sCT image by integrating a learned FM velocity field, conditioned on features extracted from the input MRI or CBCT using a lightweight 3D encoder
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We adopt a fully 3D Flow Matching (FM) framework... across abdomen, head and neck, and thorax

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages

[1]

International Journal of Computer Vision pp

Bogensperger, L., Narnhofer, D., Falk, A., Schindler, K., Pock, T.: FlowSDF: Flow matching for medical image segmentation using distance transforms. International Journal of Computer Vision pp. 1–13 (2025)

work page 2025
[2]

IEEE Journal of Biomedical and Health Informatics28(7), 4084–4093 (2024)

Dorjsembe, Z., Pao, H.K., Odonchimed, S., Xiao, F.: Conditional diffusion models for semantic 3D brain MRI synthesis. IEEE Journal of Biomedical and Health Informatics28(7), 4084–4093 (2024)

work page 2024
[3]

IEEE Journal of Biomedical and Health Informatics (2025)

Hadzic, A., Bogensperger, L., Berghold, A., Urschler, M.: Flow Matching-Based Data Synthesis for Robust Anatomical Landmark Localization. IEEE Journal of Biomedical and Health Informatics (2025)

work page 2025
[4]

In: International Workshop on Simulation and Synthesis in Medical Imaging (SASHIMI)

Hadzic, A., Bogensperger, L., Joham, S.J., Urschler, M.: Synthetic Augmentation for Anatomical Landmark Localization Using DDPMs. In: International Workshop on Simulation and Synthesis in Medical Imaging (SASHIMI). pp. 1–12. Springer (2024)

work page 2024
[5]

Medical Image Analysis97, 103276 (2024)

Huijben, E.M., Terpstra, M.L., Pai, S., Thummerer, A., Koopmans, P., Afonso, M., Van Eijnatten, M., Gurney-Champion, O., Chen, Z., Zhang, Y., et al.: Gener- ating synthetic computed tomography for radiotherapy: SynthRAD2023 challenge report. Medical Image Analysis97, 103276 (2024)

work page 2024
[6]

Nature Methods18(2), 203–211 (2021)

Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H.: nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nature Methods18(2), 203–211 (2021)

work page 2021
[7]

In: International Conference on Learning Representations (ICLR) (2023)

Lipman, Y., Chen, R.T., Ben-Hamu, H., Nickel, M., Le, M.: Flow matching for generative modeling. In: International Conference on Learning Representations (ICLR) (2023)

work page 2023
[8]

In: International Conference on Learning Representations (ICLR) (2023)

Liu, X., Gong, C., Liu, Q.: Flow straight and fast: Learning to generate and transfer data with rectified flow. In: International Conference on Learning Representations (ICLR) (2023)

work page 2023
[9]

Scientific Reports13(1), 12098 (2023) Flow Matching for Conditional MRI-CT and CBCT-CT Image Synthesis 7

Müller-Franzes, G., Niehues, J.M., Khader, F., Arasteh, S.T., Haarburger, C., Kuhl, C., Wang, T., Han, T., Nolte, T., Nebelung, S., et al.: A multimodal compar- ison of latent denoising diffusion probabilistic models and generative adversarial networks for medical image synthesis. Scientific Reports13(1), 12098 (2023) Flow Matching for Conditional MRI-CT ...

work page 2023
[10]

In: Proceedings of the OAGM&ARW Joint Workshop 2017: Vision, Automation and Robotics

Neff, T., Payer, C., Štern, D., Urschler, M.: Generative adversarial network based synthesis for supervised medical image segmentation. In: Proceedings of the OAGM&ARW Joint Workshop 2017: Vision, Automation and Robotics. pp. 140–145 (2017)

work page 2017
[11]

In: Interna- tional Conference on Medical Image Computing and Computer-Assisted Interven- tion (MICCAI)

Nie, D., Trullo, R., Lian, J., Petitjean, C., Ruan, S., Wang, Q., Shen, D.: Medical image synthesis with context-aware generative adversarial networks. In: Interna- tional Conference on Medical Image Computing and Computer-Assisted Interven- tion (MICCAI). pp. 417–425. Springer (2017)

work page 2017
[12]

Medical Physics52(7), e17981 (2025)

Thummerer, A., van der Bijl, E., Galapon, A.J., Kamp, F., Savenije, M., Muijs, C., Aluwini, S., Steenbakkers, R.J., Beuel, S., Intven, M.P., et al.: SynthRAD2025 Grand Challenge dataset: Generating synthetic CTs for radiotherapy from head to abdomen. Medical Physics52(7), e17981 (2025)

work page 2025
[13]

Radiology: Artificial Intelligence 5(5), e230024 (2023)

Wasserthal, J., Breit, H.C., Meyer, M.T., Pradella, M., Hinck, D., Sauter, A.W., Heye, T., Boll, D.T., Cyriac, J., Yang, S., et al.: TotalSegmentator: robust segmen- tation of 104 anatomic structures in CT images. Radiology: Artificial Intelligence 5(5), e230024 (2023)

work page 2023
[14]

Computers in Biology and Medicine178, 108668 (2024)

Zhang, D., Han, Q., Xiong, Y., Du, H.: Mutli-modal straight flow matching for accelerated MR imaging. Computers in Biology and Medicine178, 108668 (2024)

work page 2024
[15]

Advances in Neural Information Processing Systems (NeurIPS)37, 57389–57417 (2024)

Zhang, Y., Yu, P., Zhu, Y., Chang, Y., Gao, F., Wu, Y.N., Leong, O.: Flow priors for linear inverse problems via iterative corrupted trajectory matching. Advances in Neural Information Processing Systems (NeurIPS)37, 57389–57417 (2024)

work page 2024

[1] [1]

International Journal of Computer Vision pp

Bogensperger, L., Narnhofer, D., Falk, A., Schindler, K., Pock, T.: FlowSDF: Flow matching for medical image segmentation using distance transforms. International Journal of Computer Vision pp. 1–13 (2025)

work page 2025

[2] [2]

IEEE Journal of Biomedical and Health Informatics28(7), 4084–4093 (2024)

Dorjsembe, Z., Pao, H.K., Odonchimed, S., Xiao, F.: Conditional diffusion models for semantic 3D brain MRI synthesis. IEEE Journal of Biomedical and Health Informatics28(7), 4084–4093 (2024)

work page 2024

[3] [3]

IEEE Journal of Biomedical and Health Informatics (2025)

Hadzic, A., Bogensperger, L., Berghold, A., Urschler, M.: Flow Matching-Based Data Synthesis for Robust Anatomical Landmark Localization. IEEE Journal of Biomedical and Health Informatics (2025)

work page 2025

[4] [4]

In: International Workshop on Simulation and Synthesis in Medical Imaging (SASHIMI)

Hadzic, A., Bogensperger, L., Joham, S.J., Urschler, M.: Synthetic Augmentation for Anatomical Landmark Localization Using DDPMs. In: International Workshop on Simulation and Synthesis in Medical Imaging (SASHIMI). pp. 1–12. Springer (2024)

work page 2024

[5] [5]

Medical Image Analysis97, 103276 (2024)

Huijben, E.M., Terpstra, M.L., Pai, S., Thummerer, A., Koopmans, P., Afonso, M., Van Eijnatten, M., Gurney-Champion, O., Chen, Z., Zhang, Y., et al.: Gener- ating synthetic computed tomography for radiotherapy: SynthRAD2023 challenge report. Medical Image Analysis97, 103276 (2024)

work page 2024

[6] [6]

Nature Methods18(2), 203–211 (2021)

Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H.: nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nature Methods18(2), 203–211 (2021)

work page 2021

[7] [7]

In: International Conference on Learning Representations (ICLR) (2023)

Lipman, Y., Chen, R.T., Ben-Hamu, H., Nickel, M., Le, M.: Flow matching for generative modeling. In: International Conference on Learning Representations (ICLR) (2023)

work page 2023

[8] [8]

In: International Conference on Learning Representations (ICLR) (2023)

Liu, X., Gong, C., Liu, Q.: Flow straight and fast: Learning to generate and transfer data with rectified flow. In: International Conference on Learning Representations (ICLR) (2023)

work page 2023

[9] [9]

Scientific Reports13(1), 12098 (2023) Flow Matching for Conditional MRI-CT and CBCT-CT Image Synthesis 7

Müller-Franzes, G., Niehues, J.M., Khader, F., Arasteh, S.T., Haarburger, C., Kuhl, C., Wang, T., Han, T., Nolte, T., Nebelung, S., et al.: A multimodal compar- ison of latent denoising diffusion probabilistic models and generative adversarial networks for medical image synthesis. Scientific Reports13(1), 12098 (2023) Flow Matching for Conditional MRI-CT ...

work page 2023

[10] [10]

In: Proceedings of the OAGM&ARW Joint Workshop 2017: Vision, Automation and Robotics

Neff, T., Payer, C., Štern, D., Urschler, M.: Generative adversarial network based synthesis for supervised medical image segmentation. In: Proceedings of the OAGM&ARW Joint Workshop 2017: Vision, Automation and Robotics. pp. 140–145 (2017)

work page 2017

[11] [11]

In: Interna- tional Conference on Medical Image Computing and Computer-Assisted Interven- tion (MICCAI)

Nie, D., Trullo, R., Lian, J., Petitjean, C., Ruan, S., Wang, Q., Shen, D.: Medical image synthesis with context-aware generative adversarial networks. In: Interna- tional Conference on Medical Image Computing and Computer-Assisted Interven- tion (MICCAI). pp. 417–425. Springer (2017)

work page 2017

[12] [12]

Medical Physics52(7), e17981 (2025)

Thummerer, A., van der Bijl, E., Galapon, A.J., Kamp, F., Savenije, M., Muijs, C., Aluwini, S., Steenbakkers, R.J., Beuel, S., Intven, M.P., et al.: SynthRAD2025 Grand Challenge dataset: Generating synthetic CTs for radiotherapy from head to abdomen. Medical Physics52(7), e17981 (2025)

work page 2025

[13] [13]

Radiology: Artificial Intelligence 5(5), e230024 (2023)

Wasserthal, J., Breit, H.C., Meyer, M.T., Pradella, M., Hinck, D., Sauter, A.W., Heye, T., Boll, D.T., Cyriac, J., Yang, S., et al.: TotalSegmentator: robust segmen- tation of 104 anatomic structures in CT images. Radiology: Artificial Intelligence 5(5), e230024 (2023)

work page 2023

[14] [14]

Computers in Biology and Medicine178, 108668 (2024)

Zhang, D., Han, Q., Xiong, Y., Du, H.: Mutli-modal straight flow matching for accelerated MR imaging. Computers in Biology and Medicine178, 108668 (2024)

work page 2024

[15] [15]

Advances in Neural Information Processing Systems (NeurIPS)37, 57389–57417 (2024)

Zhang, Y., Yu, P., Zhu, Y., Chang, Y., Gao, F., Wu, Y.N., Leong, O.: Flow priors for linear inverse problems via iterative corrupted trajectory matching. Advances in Neural Information Processing Systems (NeurIPS)37, 57389–57417 (2024)

work page 2024