
arxiv: 2510.04823 · v2 · submitted 2025-10-06 · 💻 cs.CV

Flow Matching for Conditional MRI-CT and CBCT-CT Image Synthesis

Pith reviewed 2026-05-18 10:10 UTC · model grok-4.3

classification 💻 cs.CV
keywords flow matching · synthetic CT · MRI to CT · CBCT to CT · image synthesis · radiotherapy · 3D generative model · conditional image generation

The pith

Flow matching generates synthetic CT from MRI or CBCT by integrating a learned velocity field conditioned on encoder features.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper applies a fully 3D flow matching framework to synthesize CT images from MRI and CBCT scans for radiotherapy planning. A volume of Gaussian noise is transformed into a synthetic CT by integrating a velocity field that is conditioned on features extracted from the input scan via a lightweight 3D encoder. Separate models were trained and tested for MRI-to-sCT and CBCT-to-sCT tasks on the SynthRAD2025 benchmark across abdomen, head-and-neck, and thorax regions. The approach reconstructs overall anatomical layouts with good fidelity but shows limited preservation of small structures, mainly because training resolution was kept low to fit memory and runtime limits.
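The sampling procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the forward Euler solver, the step count, and the functions `toy_encoder` and `toy_velocity` are all assumptions introduced here, and the "velocity field" is an analytic stand-in for what would be a trained 3D network.

```python
import numpy as np

def toy_encoder(cond_volume):
    """Hypothetical stand-in for the lightweight 3D encoder (here: a rescaling)."""
    return 2.0 * cond_volume - 1.0

def toy_velocity(x, t, features):
    """Hypothetical stand-in for the learned velocity field v(x, t, features).

    For a straight-line noise-to-data path that ends at `features`, the ideal
    velocity is (endpoint - x) / (1 - t). A trained network would replace this.
    """
    return (features - x) / (1.0 - t)

def sample_sct(cond_volume, velocity_fn, num_steps=10, seed=0):
    """Integrate dx/dt = v(x, t, features) from t = 0 to t = 1 with forward Euler."""
    rng = np.random.default_rng(seed)
    features = toy_encoder(cond_volume)
    x = rng.standard_normal(cond_volume.shape)  # Gaussian noise volume at t = 0
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays below 1, so the toy velocity never divides by zero
        x = x + dt * velocity_fn(x, t, features)
    return x

# Tiny fake "MRI" volume with shape (depth, height, width); values are random.
cond = np.random.default_rng(1).random((4, 8, 8))
sct = sample_sct(cond, toy_velocity)
```

With the analytic stand-in, the last Euler step lands on the path endpoint, so the output equals `toy_encoder(cond)` up to floating-point error. With a learned network the result would depend on training quality, solver, and step count.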

Core claim

We adopt a fully 3D Flow Matching (FM) framework, motivated by recent work demonstrating FM's efficiency in producing high-quality images. In our approach, a Gaussian noise volume is transformed into an sCT image by integrating a learned FM velocity field, conditioned on features extracted from the input MRI or CBCT using a lightweight 3D encoder. We evaluated the method on the SynthRAD2025 Challenge benchmark, training separate models for MRI to sCT and CBCT to sCT across three anatomical regions: abdomen, head and neck, and thorax. The results indicate that the method accurately reconstructs global anatomical structures; however, preservation of fine details was limited, primarily due to a

What carries the argument

The learned flow matching velocity field that transforms Gaussian noise into synthetic CT, conditioned on features from a lightweight 3D encoder applied to the input MRI or CBCT volume.
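In symbols, one common way to write conditional flow matching is given below, assuming a straight-line path between noise and data. The abstract does not state which probability path, loss weighting, or ODE solver the authors use, so the linear path and the names $E_\phi$ (encoder) and $v_\theta$ (velocity network) are illustrative assumptions rather than the paper's stated formulation.

```latex
% Assumed notation: y = input MRI or CBCT, x_1 = ground-truth CT, x_0 = Gaussian noise
\begin{align}
  c   &= E_\phi(y), \qquad x_0 \sim \mathcal{N}(0, I), \qquad t \sim \mathcal{U}[0, 1], \\
  x_t &= (1 - t)\, x_0 + t\, x_1, \\
  \mathcal{L}(\theta, \phi) &= \mathbb{E}_{t,\, x_0,\, (x_1, y)}
      \left\| v_\theta(x_t, t, c) - (x_1 - x_0) \right\|_2^2, \\
  \frac{\mathrm{d}x}{\mathrm{d}t} &= v_\theta(x, t, c), \quad x(0) = x_0
      \;\;\Longrightarrow\;\; \text{sCT} = x(1).
\end{align}
```

Under this choice the regression target $x_1 - x_0$ does not depend on $t$ along a given path, and the conditioning features $c$ enter only through the velocity network.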

If this is right

  • Enables MRI-only radiotherapy workflows by supplying synthetic CT without extra X-ray exposure.
  • Supports CBCT-based adaptive radiotherapy with maintained global anatomical accuracy.
  • Separate per-region models allow targeted performance for abdomen, head-and-neck, and thorax.
  • Low training resolution is identified as the main bottleneck for local structural fidelity.
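To make the resolution bottleneck in the last bullet concrete, the arithmetic below estimates the memory of a single 3D feature map at two resolutions. The volume shape, the channel count, and the float32 storage are hypothetical values chosen for illustration; the paper's actual training resolution is not given in the abstract.

```python
def feature_map_bytes(shape, channels=1, bytes_per_value=4):
    """Bytes needed to store one dense 3D feature map (float32 by default)."""
    depth, height, width = shape
    return depth * height * width * channels * bytes_per_value

full_shape = (128, 256, 256)                   # hypothetical full-resolution grid
half_shape = tuple(s // 2 for s in full_shape)  # downsampled by 2 along each axis

full_mib = feature_map_bytes(full_shape, channels=32) / 2**20
half_mib = feature_map_bytes(half_shape, channels=32) / 2**20
ratio = full_mib / half_mib
```

Halving the resolution along each axis cuts the memory of every activation by a factor of eight, which is why fully 3D models are often trained on downsampled volumes. Activations for backpropagation multiply this cost by the number of stored layers.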

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Patch-based or latent-space flow matching could raise effective resolution without exceeding current memory limits.
  • The same conditioning pattern may transfer to other cross-modality medical synthesis problems.
  • Clinical integration would require testing whether global accuracy alone suffices for dose-calculation accuracy.
  • Higher-resolution variants would allow direct comparison of detail preservation against the current baseline.
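The patch-based idea in the first bullet can be sketched as splitting a volume into non-overlapping 3D patches, processing each independently, and stitching the results back. This is only an illustration of the data handling under simplifying assumptions (shape divisible by the patch size, no overlap or blending); the helper names are hypothetical and nothing here reflects a design the authors have published.

```python
import numpy as np

def split_into_patches(volume, patch_shape):
    """Return a dict mapping each patch's corner index (z, y, x) to its sub-volume."""
    pd, ph, pw = patch_shape
    d, h, w = volume.shape
    if d % pd or h % ph or w % pw:
        raise ValueError("volume shape must be divisible by patch shape")
    patches = {}
    for z in range(0, d, pd):
        for y in range(0, h, ph):
            for x in range(0, w, pw):
                patches[(z, y, x)] = volume[z:z + pd, y:y + ph, x:x + pw]
    return patches

def stitch_patches(patches, volume_shape):
    """Place each patch back at its corner index to rebuild the full volume."""
    out = np.empty(volume_shape, dtype=float)
    for (z, y, x), patch in patches.items():
        pd, ph, pw = patch.shape
        out[z:z + pd, y:y + ph, x:x + pw] = patch
    return out

volume = np.random.default_rng(0).random((8, 16, 16))
patches = split_into_patches(volume, (4, 8, 8))
rebuilt = stitch_patches(patches, volume.shape)
```

A practical pipeline would likely use overlapping patches with blending to avoid seams at patch borders; that refinement is omitted here.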

Load-bearing premise

Features extracted by the lightweight 3D encoder from the input MRI or CBCT volume are sufficient to condition the flow matching velocity field for accurate synthesis across anatomical regions.

What would settle it

A controlled experiment that raises training resolution or switches to patch-based training while keeping the same encoder conditioning, then measures whether fine-detail metrics such as small-vessel or trabecular-bone fidelity improve or stay flat.

Figures

Figures reproduced from arXiv: 2510.04823 by Arnela Hadzic, Martin Urschler, Simon Johannes Joham.

Figure 1: Examples of synthetic CT images generated by our MRI → sCT (left column) and CBCT → sCT models (right column) for different anatomical regions. The results are reported using the metrics described in Section 3.2, with both the mean and standard deviation (mean ± std) for each metric. Qualitative results are shown in
Original abstract

Generating synthetic CT (sCT) from MRI or CBCT plays a crucial role in enabling MRI-only and CBCT-based adaptive radiotherapy, improving treatment precision while reducing patient radiation exposure. To address this task, we adopt a fully 3D Flow Matching (FM) framework, motivated by recent work demonstrating FM's efficiency in producing high-quality images. In our approach, a Gaussian noise volume is transformed into an sCT image by integrating a learned FM velocity field, conditioned on features extracted from the input MRI or CBCT using a lightweight 3D encoder. We evaluated the method on the SynthRAD2025 Challenge benchmark, training separate models for MRI to sCT and CBCT to sCT across three anatomical regions: abdomen, head and neck, and thorax. Validation and testing were performed through the challenge submission system. The results indicate that the method accurately reconstructs global anatomical structures; however, preservation of fine details was limited, primarily due to the relatively low training resolution imposed by memory and runtime constraints. Future work will explore patch-based training and latent-space flow models to improve resolution and local structural fidelity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces a fully 3D Flow Matching (FM) framework for conditional synthesis of CT images from MRI or CBCT. A lightweight 3D encoder extracts features from the input to condition the FM velocity field, which transforms Gaussian noise into the synthetic CT. Separate models are trained for MRI-to-sCT and CBCT-to-sCT across abdomen, head/neck, and thorax regions, evaluated on the SynthRAD2025 Challenge benchmark. The authors report accurate reconstruction of global anatomical structures with limitations in fine details attributed to low training resolution due to computational constraints.

Significance. If supported by quantitative evidence, the approach could contribute to MRI-only and CBCT-based adaptive radiotherapy by offering an efficient generative model for image synthesis that reduces radiation exposure. The motivation from recent FM work for high-quality image generation is a reasonable starting point, and the use of a challenge benchmark provides a standardized evaluation setting.

major comments (2)
  1. [Abstract] The claim that the method 'accurately reconstructs global anatomical structures' is presented without any quantitative metrics (e.g., MAE, SSIM, or Dice scores), error bars, ablation studies, or baseline comparisons on the SynthRAD2025 benchmark. This absence directly undermines verification of the central accuracy claim and leaves the soundness assessment at the reported low level.
  2. [Abstract] The attribution of fine-detail loss exclusively to 'relatively low training resolution imposed by memory and runtime constraints' does not address whether features from the lightweight 3D encoder are sufficient to condition the velocity field across anatomical regions. No receptive-field analysis, ablation on encoder depth, or comparison of conditioning strength is provided; this assumption is load-bearing for the distinction between resolution and information-loss issues.
minor comments (1)
  1. [Abstract] The statement that 'validation and testing were performed through the challenge submission system' should be expanded with the specific challenge metrics and any available leaderboard context to allow readers to interpret the reported qualitative success.
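As an illustration of the kind of quantitative evidence the referee asks for, the sketch below computes a masked mean absolute error and a masked PSNR between a synthetic and a reference CT. The function names, the body mask, and the `data_range` value are assumptions made for this example; the exact metric definitions used by the SynthRAD2025 evaluation are not given in the text reviewed here.

```python
import numpy as np

def masked_mae(sct, ct, mask):
    """Mean absolute intensity error over voxels where mask is True."""
    return float(np.abs(sct - ct)[mask].mean())

def masked_psnr(sct, ct, mask, data_range):
    """Peak signal-to-noise ratio in dB over masked voxels.

    `data_range` is the assumed intensity span (max minus min) of the images.
    """
    mse = float(((sct - ct) ** 2)[mask].mean())
    return float(10.0 * np.log10(data_range ** 2 / mse))

# Toy data: the synthetic CT is offset from the reference by a constant 10 units.
ct = np.zeros((4, 8, 8))
sct = ct + 10.0
mask = np.zeros(ct.shape, dtype=bool)
mask[1:3, 2:6, 2:6] = True  # hypothetical body mask

mae = masked_mae(sct, ct, mask)
psnr = masked_psnr(sct, ct, mask, data_range=100.0)
```

Reporting such values per region and per task, with standard deviations, is what would let a reader check the claim about global anatomical accuracy.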

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We agree that the abstract requires quantitative support for its claims and a more balanced discussion of potential contributing factors to the observed limitations. We respond to each major comment below and will incorporate revisions in the next version of the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The claim that the method 'accurately reconstructs global anatomical structures' is presented without any quantitative metrics (e.g., MAE, SSIM, or Dice scores), error bars, ablation studies, or baseline comparisons on the SynthRAD2025 benchmark. This absence directly undermines verification of the central accuracy claim and leaves the soundness assessment at the reported low level.

    Authors: We agree that the abstract should provide quantitative backing for the central claim. The full manuscript reports MAE, SSIM, PSNR, and Dice scores (with standard deviations) on the SynthRAD2025 benchmark in Section 4 and the associated tables for each anatomical region and modality. We will revise the abstract to include representative metrics (e.g., mean MAE and SSIM across regions) drawn from the challenge submissions to directly support the statement about global anatomical reconstruction.
    Revision: yes

  2. Referee: [Abstract] The attribution of fine-detail loss exclusively to 'relatively low training resolution imposed by memory and runtime constraints' does not address whether features from the lightweight 3D encoder are sufficient to condition the velocity field across anatomical regions. No receptive-field analysis, ablation on encoder depth, or comparison of conditioning strength is provided; this assumption is load-bearing for the distinction between resolution and information-loss issues.

    Authors: We acknowledge that the current text attributes the limitation primarily to resolution constraints arising from 3D memory limits. We did not conduct receptive-field analysis, encoder-depth ablations, or conditioning-strength comparisons in this work. We will revise the abstract and discussion to note that both low resolution and possible limitations in the lightweight encoder's feature extraction may contribute to fine-detail loss. This will be presented as an open question for future work rather than an exclusive attribution.
    Revision: partial

Circularity Check

0 steps flagged

No significant circularity; standard supervised conditional generative modeling

Full rationale

The paper presents a standard application of flow matching for conditional image synthesis, where a velocity field is learned via supervised training to map noise to sCT conditioned on features from a lightweight 3D encoder, with evaluation on the external SynthRAD2025 benchmark. No load-bearing step reduces by construction to its own inputs: the method description does not define the velocity field or conditioning features in terms of the target synthesis outputs, nor does it rename fitted parameters as independent predictions. Claims about global anatomical reconstruction and fine-detail limitations are tied directly to empirical results and training constraints rather than self-referential derivations or self-citation chains. The approach remains self-contained against external benchmarks without invoking uniqueness theorems or ansatzes from overlapping prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review provides minimal visibility into modeling assumptions; the central claim rests on standard neural network training and the unstated premise that flow matching velocity integration yields anatomically plausible CT volumes when conditioned appropriately.

axioms (1)
  • Domain assumption: Flow matching can learn a velocity field that transports Gaussian noise to the target data distribution when conditioned on input features.
    Invoked by the choice of FM framework for conditional image synthesis.
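The transport assumption can be made concrete with a single training pair. The sketch below assumes the straight-line path between a noise sample and a target volume, which is a common flow matching choice but is not confirmed by the abstract; `fm_training_pair` and `fm_loss` are hypothetical helper names, and the random array stands in for a real CT.

```python
import numpy as np

def fm_training_pair(x1, rng):
    """Build one flow matching training example for target volume x1."""
    x0 = rng.standard_normal(x1.shape)  # Gaussian noise sample
    t = rng.uniform(0.0, 1.0)           # random time in [0, 1)
    xt = (1.0 - t) * x0 + t * x1        # point on the straight noise-to-data path
    target_v = x1 - x0                  # velocity of that path (constant in t)
    return x0, t, xt, target_v

def fm_loss(pred_v, target_v):
    """Mean squared error between predicted and target velocity."""
    return float(np.mean((pred_v - target_v) ** 2))

rng = np.random.default_rng(0)
ct = rng.random((4, 8, 8))  # stand-in for a ground-truth CT volume
x0, t, xt, target_v = fm_training_pair(ct, rng)

perfect_loss = fm_loss(target_v, target_v)                # ideal predictor
naive_loss = fm_loss(np.zeros_like(target_v), target_v)   # predicts zero velocity
```

Following the target velocity from `xt` for the remaining time `1 - t` arrives at `ct`, which is the sense in which a well-fitted velocity field transports noise to data. Whether a network conditioned on encoder features can fit that field well is the empirical question the axiom leaves open.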



