Flow Matching for Conditional MRI-CT and CBCT-CT Image Synthesis
Pith reviewed 2026-05-18 10:10 UTC · model grok-4.3
The pith
Flow matching generates synthetic CT from MRI or CBCT by integrating a learned velocity field conditioned on encoder features.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We adopt a fully 3D Flow Matching (FM) framework, motivated by recent work demonstrating FM's efficiency in producing high-quality images. In our approach, a Gaussian noise volume is transformed into an sCT image by integrating a learned FM velocity field, conditioned on features extracted from the input MRI or CBCT using a lightweight 3D encoder. We evaluated the method on the SynthRAD2025 Challenge benchmark, training separate models for MRI to sCT and CBCT to sCT across three anatomical regions: abdomen, head and neck, and thorax. The results indicate that the method accurately reconstructs global anatomical structures; however, preservation of fine details was limited, primarily due to the relatively low training resolution imposed by memory and runtime constraints.
What carries the argument
The learned flow matching velocity field that transforms Gaussian noise into synthetic CT, conditioned on features from a lightweight 3D encoder applied to the input MRI or CBCT volume.
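The integration step described here can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the source does not name the ODE solver or the number of steps, so explicit Euler with ten steps is an assumption, and `euler_sample` and `toy_velocity` are hypothetical names. The toy field stands in for the trained network by treating the conditioning vector as the target intensities, which gives the velocity of a straight noise-to-target path in closed form.

```python
import random


def euler_sample(velocity, cond, x0, num_steps=10):
    """Integrate dx/dt = velocity(x, t, cond) from t=0 to t=1 with explicit Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity(x, t, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(x, t, cond):
    # Hypothetical stand-in for the trained network (assumption, not from the paper).
    # For a straight path ending at a single known target, the velocity at
    # position x and time t is (target - x) / (1 - t). Here the "conditioning"
    # is simply the target itself, which a real encoder would not provide.
    return [(c - xi) / (1.0 - t) for xi, c in zip(x, cond)]


rng = random.Random(0)
cond = [-1000.0, 40.0, 700.0]                 # toy intensities in HU: air, soft tissue, bone
noise = [rng.gauss(0.0, 1.0) for _ in cond]   # Gaussian noise "volume" of three voxels
sct = euler_sample(toy_velocity, cond, noise, num_steps=10)
```

In a real model the velocity callable would be a 3D network that receives encoder features rather than the target, and the step count trades runtime against integration error.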
If this is right
- Enables MRI-only radiotherapy workflows by supplying synthetic CT without extra X-ray exposure.
- Supports CBCT-based adaptive radiotherapy with maintained global anatomical accuracy.
- Separate per-region models allow targeted performance for abdomen, head-and-neck, and thorax.
- Low training resolution is identified as the main bottleneck for local structural fidelity.
Where Pith is reading between the lines
- Patch-based or latent-space flow matching could raise effective resolution without exceeding current memory limits.
- The same conditioning pattern may transfer to other cross-modality medical synthesis problems.
- Clinical integration would require testing whether global accuracy alone suffices for dose-calculation accuracy.
- Higher-resolution variants would allow direct comparison of detail preservation against the current baseline.
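The patch-based idea in the list above can be illustrated with a sliding-window layout over a 3D volume. This is a sketch under assumptions: the source only names patch-based training as future work and gives no patch size, stride, or sampling scheme, so `patch_starts`, `patch_origins`, and the shapes below are hypothetical.

```python
from itertools import product


def patch_starts(size, patch, stride):
    """Start indices along one axis so that patches cover [0, size) completely."""
    if patch >= size:
        return [0]  # a single patch spans the whole axis
    starts = list(range(0, size - patch + 1, stride))
    if starts[-1] != size - patch:
        starts.append(size - patch)  # extra patch flush with the far edge
    return starts


def patch_origins(shape, patch_shape, stride):
    """All (z, y, x) patch origins for a volume of the given shape."""
    axes = [patch_starts(s, p, st) for s, p, st in zip(shape, patch_shape, stride)]
    return list(product(*axes))


# Illustrative shapes only: a 10x8x8 volume cut into 4x8x8 patches.
origins = patch_origins((10, 8, 8), (4, 8, 8), (4, 8, 8))
```

Training on such patches keeps per-sample memory bounded by the patch size rather than the full volume, at the cost of needing a strategy for blending overlapping predictions at inference time.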
Load-bearing premise
Features extracted by the lightweight 3D encoder from the input MRI or CBCT volume are sufficient to condition the flow matching velocity field for accurate synthesis across anatomical regions.
What would settle it
A controlled experiment that raises training resolution or switches to patch-based training while keeping the same encoder conditioning, then measures whether fine-detail metrics such as small-vessel or trabecular-bone fidelity improve or stay flat.
Original abstract
Generating synthetic CT (sCT) from MRI or CBCT plays a crucial role in enabling MRI-only and CBCT-based adaptive radiotherapy, improving treatment precision while reducing patient radiation exposure. To address this task, we adopt a fully 3D Flow Matching (FM) framework, motivated by recent work demonstrating FM's efficiency in producing high-quality images. In our approach, a Gaussian noise volume is transformed into an sCT image by integrating a learned FM velocity field, conditioned on features extracted from the input MRI or CBCT using a lightweight 3D encoder. We evaluated the method on the SynthRAD2025 Challenge benchmark, training separate models for MRI to sCT and CBCT to sCT across three anatomical regions: abdomen, head and neck, and thorax. Validation and testing were performed through the challenge submission system. The results indicate that the method accurately reconstructs global anatomical structures; however, preservation of fine details was limited, primarily due to the relatively low training resolution imposed by memory and runtime constraints. Future work will explore patch-based training and latent-space flow models to improve resolution and local structural fidelity.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a fully 3D Flow Matching (FM) framework for conditional synthesis of CT images from MRI or CBCT. A lightweight 3D encoder extracts features from the input to condition the FM velocity field, which transforms Gaussian noise into the synthetic CT. Separate models are trained for MRI-to-sCT and CBCT-to-sCT across abdomen, head/neck, and thorax regions, evaluated on the SynthRAD2025 Challenge benchmark. The authors report accurate reconstruction of global anatomical structures with limitations in fine details attributed to low training resolution due to computational constraints.
Significance. If supported by quantitative evidence, the approach could contribute to MRI-only and CBCT-based adaptive radiotherapy by offering an efficient generative model for image synthesis that reduces radiation exposure. The motivation from recent FM work for high-quality image generation is a reasonable starting point, and the use of a challenge benchmark provides a standardized evaluation setting.
major comments (2)
- [Abstract] The claim that the method 'accurately reconstructs global anatomical structures' is presented without any quantitative metrics (e.g., MAE, SSIM, or Dice scores), error bars, ablation studies, or baseline comparisons on the SynthRAD2025 benchmark. This absence directly undermines verification of the central accuracy claim and leaves the soundness assessment at the reported low level.
- [Abstract] The attribution of fine-detail loss exclusively to 'relatively low training resolution imposed by memory and runtime constraints' does not address whether features from the lightweight 3D encoder are sufficient to condition the velocity field across anatomical regions. No receptive-field analysis, ablation on encoder depth, or comparison of conditioning strength is provided; this assumption is load-bearing for the distinction between resolution and information-loss issues.
minor comments (1)
- [Abstract] The statement that 'validation and testing were performed through the challenge submission system' should be expanded with the specific challenge metrics and any available leaderboard context to allow readers to interpret the reported qualitative success.
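For readers unfamiliar with the image-similarity metrics the referee asks for, a minimal sketch of two of them follows. It rests on assumptions: the source names MAE, SSIM, PSNR, and Dice but does not give their exact definitions, so restricting the computation to a body mask and the `data_range` value are illustrative choices, and `masked_mae` and `masked_psnr` are hypothetical helper names. The numbers are made up for the example and are not results from the paper.

```python
import math


def masked_mae(pred, target, mask):
    """Mean absolute error over voxels where mask is truthy."""
    errors = [abs(p - t) for p, t, m in zip(pred, target, mask) if m]
    return sum(errors) / len(errors)


def masked_psnr(pred, target, mask, data_range):
    """Peak signal-to-noise ratio in dB over masked voxels."""
    squared = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    mse = sum(squared) / len(squared)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / mse)


# Toy flattened volumes in HU; the last voxel lies outside the mask and is ignored.
target = [-1000.0, 0.0, 50.0, 1000.0]
pred = [-990.0, 10.0, 40.0, 0.0]
mask = [1, 1, 1, 0]

mae = masked_mae(pred, target, mask)
psnr = masked_psnr(pred, target, mask, data_range=4000.0)  # assumed intensity range
```

Reporting such values per region and modality, with standard deviations, would let readers check the global-accuracy claim directly.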
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We agree that the abstract requires quantitative support for its claims and a more balanced discussion of potential contributing factors to the observed limitations. We respond to each major comment below and will incorporate revisions in the next version of the manuscript.
Point-by-point responses
- Referee: [Abstract] The claim that the method 'accurately reconstructs global anatomical structures' is presented without any quantitative metrics (e.g., MAE, SSIM, or Dice scores), error bars, ablation studies, or baseline comparisons on the SynthRAD2025 benchmark. This absence directly undermines verification of the central accuracy claim and leaves the soundness assessment at the reported low level.
  Authors: We agree that the abstract should provide quantitative backing for the central claim. The full manuscript reports MAE, SSIM, PSNR, and Dice scores (with standard deviations) on the SynthRAD2025 benchmark in Section 4 and the associated tables for each anatomical region and modality. We will revise the abstract to include representative metrics (e.g., mean MAE and SSIM across regions) drawn from the challenge submissions to directly support the statement about global anatomical reconstruction.
  Revision: yes
- Referee: [Abstract] The attribution of fine-detail loss exclusively to 'relatively low training resolution imposed by memory and runtime constraints' does not address whether features from the lightweight 3D encoder are sufficient to condition the velocity field across anatomical regions. No receptive-field analysis, ablation on encoder depth, or comparison of conditioning strength is provided; this assumption is load-bearing for the distinction between resolution and information-loss issues.
  Authors: We acknowledge that the current text attributes the limitation primarily to resolution constraints arising from 3D memory limits. We did not conduct receptive-field analysis, encoder-depth ablations, or conditioning-strength comparisons in this work. We will revise the abstract and discussion to note that both low resolution and possible limitations in the lightweight encoder's feature extraction may contribute to fine-detail loss. This will be presented as an open question for future work rather than an exclusive attribution.
  Revision: partial
Circularity Check
No significant circularity; standard supervised conditional generative modeling
The paper presents a standard application of flow matching for conditional image synthesis, where a velocity field is learned via supervised training to map noise to sCT conditioned on features from a lightweight 3D encoder, with evaluation on the external SynthRAD2025 benchmark. No load-bearing step reduces by construction to its own inputs: the method description does not define the velocity field or conditioning features in terms of the target synthesis outputs, nor does it rename fitted parameters as independent predictions. Claims about global anatomical reconstruction and fine-detail limitations are tied directly to empirical results and training constraints rather than self-referential derivations or self-citation chains. The approach remains self-contained against external benchmarks without invoking uniqueness theorems or ansatzes from overlapping prior work.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Flow matching can learn a velocity field that transports Gaussian noise to the target data distribution when conditioned on input features.
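One common way to write this assumption down is the conditional flow matching objective with a straight interpolation path. The formulation below is a sketch, not taken from the paper: the source does not state which probability path or loss the authors use, so the linear path, the joint training of encoder and velocity network, and the symbols $x_0$ (noise), $x_1$ (ground-truth CT), $y$ (input MRI or CBCT), $E_\phi$ (encoder), and $v_\theta$ (velocity network) are all assumptions.

```latex
\begin{aligned}
x_t &= (1 - t)\,x_0 + t\,x_1, \qquad x_0 \sim \mathcal{N}(0, I),\quad t \sim \mathcal{U}[0, 1],\\
\mathcal{L}(\theta, \phi) &= \mathbb{E}_{t,\,x_0,\,(x_1, y)}
  \left\| v_\theta\bigl(x_t, t, E_\phi(y)\bigr) - (x_1 - x_0) \right\|_2^2,\\
\frac{\mathrm{d}x_t}{\mathrm{d}t} &= v_\theta\bigl(x_t, t, E_\phi(y)\bigr),
  \qquad \mathrm{sCT} = x_{t=1}.
\end{aligned}
```

Under this reading, the first line defines the training inputs, the second regresses the network onto the constant velocity of each straight path, and the third is the ODE integrated at inference time to turn noise into a synthetic CT.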
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Passage: "a Gaussian noise volume is transformed into an sCT image by integrating a learned FM velocity field, conditioned on features extracted from the input MRI or CBCT using a lightweight 3D encoder"
- IndisputableMonolith/Foundation/AlexanderDuality.lean: alexander_duality_circle_linking
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Passage: "We adopt a fully 3D Flow Matching (FM) framework... across abdomen, head and neck, and thorax"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.