
arxiv: 2606.00630 · v1 · pith:JFXQQR23new · submitted 2026-05-30 · 💻 cs.CV · stat.ML

A Systematic Benchmark of Intraoperative Ultrasound-to-MR Synthesis for Brain Tumour Surgery

Pith reviewed 2026-06-28 19:01 UTC · model grok-4.3

classification: 💻 cs.CV · stat.ML
keywords: intraoperative ultrasound · MRI synthesis · brain tumor surgery · image-to-image translation · GAN · diffusion model · segmentation utility · perceptual metrics

The pith

No single model wins all metrics in ultrasound-to-MRI synthesis, but perceptual quality tracks downstream tumor segmentation utility while SSIM does not.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper benchmarks six generators for turning intraoperative ultrasound into MRI-like images so that existing MRI-based planning and segmentation tools can be used during brain tumor surgery. It runs 48 controlled experiments on the ReMIND dataset (six generators, four inference regimes, two targets), measuring both standard image metrics and segmentation performance on the synthesized outputs. No architecture leads on every axis. Perceptual similarity measured by LPIPS (lower is better) tracks downstream segmentation most closely: lower LPIPS goes with higher segmentation utility, whereas higher SSIM goes with worse segmentation utility. SynDiff in 2.5D mode achieves the highest utility score, U_Dice = 0.55.

Core claim

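Across six generators, each trained under four inference regimes and two targets on the 76-patient ReMIND dataset (60/16 patient-level train/held-out split), no architecture dominated every evaluation axis. Perceptual quality tracked downstream utility most closely: LPIPS correlated with downstream segmentation utility at r=-0.66 (lower LPIPS, higher utility), while SSIM correlated at r=-0.64 (higher SSIM, lower utility). Both coefficients are negative, but because LPIPS is lower-is-better and SSIM is higher-is-better, the two metrics point in opposite directions. SynDiff-2.5D reached the highest U_Dice of 0.55 on tumor and resection cavity segmentation.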
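The sign convention in the core claim is easy to misread, so a minimal sketch may help. It computes a plain Pearson coefficient and then interprets its sign given whether a metric is higher-is-better or lower-is-better. The numbers are made-up toy values, not the paper's results, and the helper name `better_metric_goes_with_higher_dice` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def better_metric_goes_with_higher_dice(r, higher_is_better):
    """True if improving the image metric goes with higher downstream Dice."""
    return (r > 0) == higher_is_better


# Toy per-configuration values, invented for illustration only.
dice = [0.30, 0.38, 0.41, 0.50, 0.55]   # downstream segmentation Dice
lpips = [0.40, 0.35, 0.30, 0.25, 0.20]  # lower is better
ssim = [0.80, 0.73, 0.70, 0.64, 0.60]   # higher is better

r_lpips = pearson_r(lpips, dice)  # negative: lower LPIPS, higher Dice
r_ssim = pearson_r(ssim, dice)    # negative: higher SSIM, lower Dice

print(better_metric_goes_with_higher_dice(r_lpips, higher_is_better=False))
print(better_metric_goes_with_higher_dice(r_ssim, higher_is_better=True))
```

In this toy setup both coefficients are negative, yet only LPIPS improves together with Dice, which mirrors the qualitative pattern the paper reports.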

What carries the argument

Systematic multi-regime benchmark of generators (Pix2Pix, SwinPix2Pix, CycleGAN, CUT, ResViT, SynDiff) paired with nnU-Net v2 segmentation as the downstream utility measure on paired ioUS/MRI data.
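As a rough illustration of the downstream utility measure, the sketch below computes a Dice overlap and a utility ratio. It assumes the reading given in the Figure 7 caption, where utility is the Dice obtained on synthetic images divided by the Dice obtained on real T2w. The mask values are toy data, the function names are hypothetical, and scoring two empty masks as 1.0 is a convention chosen here, not something the paper states.

```python
def dice(pred, ref):
    """Dice overlap between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    # Convention assumed here: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def utility_ratio(dice_on_synthetic, dice_on_real):
    """Ratio of segmentation Dice on synthetic input to Dice on real input."""
    if dice_on_real <= 0:
        raise ValueError("real-image Dice must be positive")
    return dice_on_synthetic / dice_on_real


# Toy flattened masks (illustrative only).
reference = [0, 1, 1, 1, 1, 0, 0, 0]  # ground-truth tumour mask
seg_real = [0, 1, 1, 1, 0, 0, 0, 0]   # segmentation from the real T2w
seg_synth = [0, 0, 1, 1, 0, 0, 1, 0]  # segmentation from the synthetic T2w

d_real = dice(seg_real, reference)
d_synth = dice(seg_synth, reference)
ratio = utility_ratio(d_synth, d_real)
print(round(d_real, 3), round(d_synth, 3), round(ratio, 3))
```

A ratio of 1.0 would mean the synthetic image supports segmentation as well as the real T2w does, which is the ceiling marked in Figure 7.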

If this is right

  • Perceptual and downstream-task metrics should be reported alongside or instead of global SSIM for synthesis evaluation.
  • Architecture selection for synthesis should be conditioned on surgical phase, patient history, and specific clinical objective.
  • The 2.5D regime with SynDiff preserves segmentation utility better than the other tested combinations.
  • Subgroup performance by histological grade and reoperation status provides guidance for targeted deployment.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Synthesis models could be trained with explicit perceptual losses to improve downstream clinical utility rather than optimizing pixel-wise or structural metrics alone.
  • The negative SSIM-utility link suggests that high-SSIM outputs may be overly smoothed and lose the fine details needed for accurate tumor boundary segmentation.
  • The same multi-axis protocol could be applied to other intraoperative-to-preoperative translation tasks to check whether perceptual metrics reliably predict task performance beyond this dataset.

Load-bearing premise

The 60/16 patient-level split on the ReMIND dataset represents real-world variability in histological grade and reoperation cases, and nnU-Net v2 segmentation performance serves as a valid proxy for clinical utility.
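To make the "patient-level" part of this premise concrete, here is a minimal sketch of a split that keeps all studies from one patient on the same side. The abstract does not describe how the ReMIND split was drawn, so the random assignment, the seed, the study identifiers and the two-studies-per-patient layout are all assumptions for illustration.

```python
import random


def patient_level_split(study_to_patient, n_test_patients, seed=0):
    """Split study IDs so that no patient appears in both train and test."""
    patients = sorted(set(study_to_patient.values()))
    rng = random.Random(seed)  # fixed seed so the toy split is reproducible
    rng.shuffle(patients)
    test_patients = set(patients[:n_test_patients])
    train = [s for s, p in study_to_patient.items() if p not in test_patients]
    test = [s for s, p in study_to_patient.items() if p in test_patients]
    return train, test, test_patients


# Hypothetical IDs: 76 patients with two studies each (not the real ReMIND layout).
studies = {f"P{p:03d}_S{s}": f"P{p:03d}" for p in range(76) for s in range(2)}
train, test, test_patients = patient_level_split(studies, n_test_patients=16)

train_patients = {studies[s] for s in train}
print(len(train_patients), len(test_patients))
```

Splitting by patient rather than by study avoids leakage between pre- and post-resection studies of the same case. Whether the held-out patients cover the grade and reoperation strata is a separate check.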

What would settle it

A follow-up experiment on a larger held-out cohort where models with the highest SSIM scores produce the highest downstream Dice scores would falsify the reported negative correlation between SSIM and utility.

Figures

Figures reproduced from arXiv: 2606.00630 by Ignacio Arrese, Olga Esteban-Sinovas, Rosario Sarabia, Santiago Cepeda.

Figure 1: Overview of the ioUS → MR synthesis benchmark. The pre-processing pipeline (DICOM → NIfTI conversion, ImFusion-LC2 rigid co-registration, resampling of ioUS to the MR grid, FOV cropping to the ioUS volume and intensity normalisation) feeds three architectural families (GAN baselines, ResViT, SynDiff). Each architecture is instantiated under four inference regimes (2D, 2.5D, 2D + 3D-refine, full-3D) and two…
Figure 2: Architectures of the six generators, shown in their 2D / 2.5D variant. (A) GAN baselines: Pix2Pix, SwinPix2Pix and CUT share an attention-gated U-Net generator (the Swin stem is used by SwinPix2Pix only, the PatchNCE heads by CUT only), while CycleGAN uses a 9-block ResNet generator; all use a multi-scale PatchGAN discriminator. (B) ResViT, with the transformer branch on ART blocks 4–5. (C) SynDiff in the…
Figure 3: Qualitative example of ioUS → MRI synthesis on a representative held-out test case (ReMIND-091; WHO grade 3 astrocytoma, IDH-mutant, no reoperation). Three acquisition settings are shown side by side: preoperative T2w (left), preoperative FLAIR (centre) and postoperative T2w (right). For each setting, the top row gives the input intraoperative ultrasound (Input US) and the acquired reference (Target). Below,…
Figure 4: Synthetic T2w quality on the held-out test set (single-target runs, n = 31 paired studies). (A) Family × inference-regime heat-map of the mean SSIM, PSNR, MAE and LPIPS; warm colours = better SSIM/PSNR, cool colours = better MAE/LPIPS. (B) Forest plot of the mean ± 95% Student-t CI; colours encode the architectural family, markers encode the inference regime. The 2D + 3D-refinement variants of the GAN fam…
Figure 5: Multi-task synthesis quality on the held-out subset of N = 20 paired studies containing both T2w and FLAIR. (A) Forest plot of synthetic FLAIR quality (mean ± 95% Student-t CI). Family colour and regime marker follow Fig. 4B. (B) Single-target (T2w only) versus multi-task (T2w + FLAIR) synthetic T2w quality (mean ± 95% Student-t CI). Adding FLAIR as a second output channel costs at most 0.010 SSIM and 0.…
Figure 6: Pre-resection versus post-resection synthetic T2w metrics for the eight strongest single-target models on the held-out test set (mean ± 95% Student-t CI; n_pre = 16, n_post = 15). Post-resection studies trade SSIM for PSNR in the GAN family; ResViT and SynDiff follow the inverse pattern. LPIPS is consistently lower (better) on post-resection studies for the transformer and diffusion families. …
Figure 7: Downstream segmentation utility on the held-out test set. (A) Utility ratios (synthesis Dice / real-T2w Dice on the left; synthesis NSD_2mm / real-T2w NSD_2mm on the right), per-subject mean, for the seven strongest synthesis configurations; bars by class (tumour: blue; cavity: orange); dashed line at 1.0 marks the real-T2w ceiling. Tumour utility ranges 0.61–0.82 (Dice) and 0.56–0.73 (NSD); cavity utility i…
Figure 8: Synthetic T2w quality stratified by histological grade (top row) and by reoperation history (bottom row) for the eight strongest models. Bars are the per-subgroup mean over per-subject means. Grade is metric-neutral; reoperation reduces PSNR for the GAN family and ResViT-full-3D, and LPIPS for the ResViT family, with several individual significant effects (Mann–Whitney U, p < 0.05; see main text). …
Figure 9: Downstream tumour and cavity segmentation Dice (single-target source) for the seven strongest models, stratified by histological grade (top) and by reoperation history (bottom). Bars are the per-subgroup mean over per-study Dice. Dotted horizontal lines mark the real-T2w upper bound (nnU-Net trained and evaluated on the real T2w of the same subjects in the same stratum). For most synthesis sources, tumour…
Original abstract

Intraoperative ultrasound (ioUS) is a versatile, cost-effective modality in brain tumour surgery, but its interpretation is difficult: acquisition planes are non-standard, artefacts are modality-specific, and its appearance differs markedly from the preoperative MRI on which surgical-planning tools, segmentation models and the surgeon's experience rely. Synthesising MRI-like images from ioUS could let this MRI-based infrastructure be reused intraoperatively without an extra scan. Most prior work evaluates a single architecture in isolation; to our knowledge, no benchmark has spanned architectural paradigms, inference regimes and downstream-task endpoints under a common protocol. We address this gap on the public ReMIND data set (76 patients; 153 paired ioUS/T2w and 104 paired ioUS/FLAIR studies; 60/16 patient-level train/held-out split). Six generators (four GAN baselines: Pix2Pix, SwinPix2Pix, CycleGAN, CUT; the transformer-augmented ResViT; and the few-step diffusion model SynDiff) were each trained under four inference regimes (2D, 2.5D, 2D + 3D-refinement, full-3D) and two targets (T2w only; T2w + FLAIR multi-task), yielding 48 experiments. Image-fidelity metrics (SSIM, PSNR, MAE, LPIPS) were complemented by an nnU-Net v2 downstream segmentation evaluation (tumour and resection cavity) and by subgroup analyses by histological grade and reoperation. No architecture dominated every axis, and, critically, perceptual quality tracked downstream utility most closely (LPIPS, r=-0.66, p<0.001), whereas higher SSIM was associated with worse utility (r=-0.64, p<0.001); SynDiff-2.5D best preserved downstream segmentation (U_Dice=0.55). Perceptual and downstream-task metrics should therefore be reported alongside or in preference to global SSIM, and architecture choice conditioned on surgical phase, patient history and clinical objective.
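The count of 48 experiments follows from the design described in the abstract: six generators, four inference regimes and two targets. A minimal sketch of that enumeration is below. The names are taken from the abstract, while the dictionary layout is an illustrative assumption.

```python
from itertools import product

GENERATORS = ["Pix2Pix", "SwinPix2Pix", "CycleGAN", "CUT", "ResViT", "SynDiff"]
REGIMES = ["2D", "2.5D", "2D + 3D-refinement", "full-3D"]
TARGETS = ["T2w only", "T2w + FLAIR multi-task"]

# One entry per (generator, regime, target) combination.
experiments = [
    {"generator": g, "regime": r, "target": t}
    for g, r, t in product(GENERATORS, REGIMES, TARGETS)
]

print(len(experiments))  # 6 * 4 * 2 = 48
```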

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper presents a systematic benchmark of six generators (Pix2Pix, SwinPix2Pix, CycleGAN, CUT, ResViT, SynDiff) for intraoperative ultrasound-to-T2w/FLAIR MRI synthesis on the public ReMIND dataset (76 patients, 60/16 patient-level split). It evaluates 48 configurations across four inference regimes (2D, 2.5D, 2D + 3D-refinement, full-3D) and single/multi-task targets, using image metrics (SSIM, PSNR, MAE, LPIPS) plus nnU-Net v2 downstream segmentation (tumour/resection cavity Dice). It reports that no architecture dominates all axes, that LPIPS tracks utility most closely (r=-0.66, lower LPIPS with higher utility), that higher SSIM is associated with worse utility (r=-0.64), and that SynDiff-2.5D achieves the highest U_Dice=0.55.

Significance. If the downstream correlations hold, the work provides actionable guidance that perceptual metrics should be prioritised over global SSIM for synthesis tasks whose value is measured by reuse of MRI-based tools in surgery. The scale (48 experiments, multiple paradigms, public data, subgroup analyses) and explicit comparison of fidelity versus task metrics are strengths that could influence evaluation protocols in medical image translation.

major comments (1)
  1. [Abstract] Abstract: the headline correlations (LPIPS r=-0.66, SSIM r=-0.64 with U_Dice) and the recommendation to prefer perceptual metrics rest on nnU-Net v2 segmentation Dice being a faithful proxy for intraoperative clinical utility; no evidence or discussion is supplied that this auto-segmentation task captures surgeon-relevant factors such as artefact interpretation, non-standard plane navigation or real-time resection guidance.
minor comments (1)
  1. [Abstract] Abstract and Methods: the 60/16 patient-level split should include explicit discussion of whether it captures histological-grade and reoperation variability; the current description leaves open whether the held-out set is representative for the claimed generalisability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment on the abstract. We respond point-by-point below.

  1. Referee: [Abstract] Abstract: the headline correlations (LPIPS r=-0.66, SSIM r=-0.64 with U_Dice) and the recommendation to prefer perceptual metrics rest on nnU-Net v2 segmentation Dice being a faithful proxy for intraoperative clinical utility; no evidence or discussion is supplied that this auto-segmentation task captures surgeon-relevant factors such as artefact interpretation, non-standard plane navigation or real-time resection guidance.

    Authors: We agree that nnU-Net v2 Dice is used as a proxy for utility and that the manuscript does not supply direct evidence linking it to every surgeon-relevant factor. The endpoint was selected because tumour and resection-cavity segmentation quantifies preservation of the anatomical information required to reuse MRI-based planning tools, which is the central motivation for ioUS-to-MRI synthesis. The reported correlations (LPIPS r=-0.66, SSIM r=-0.64) therefore show that perceptual metrics better predict performance on this specific task. We will revise the abstract and add an explicit limitations paragraph in the discussion stating that the proxy does not capture artefact interpretation, non-standard navigation or real-time guidance, and that surgeon-in-the-loop validation remains necessary. This is a partial revision; the experimental design and quantitative findings are unchanged.

    Revision: partial

Circularity Check

0 steps flagged

No circularity: empirical benchmark on held-out data

The paper conducts a systematic benchmark by training six generators under multiple regimes on a 60-patient training split of the public ReMIND dataset and evaluating image metrics plus nnU-Net v2 downstream segmentation on a 16-patient held-out set. All reported findings (correlations between LPIPS/SSIM and U_Dice, architecture rankings) are direct statistical summaries of these independent test-set measurements. No equations, fitted parameters renamed as predictions, self-citations, or uniqueness theorems appear in the derivation chain; the work contains no first-principles derivations that could reduce to their inputs by construction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claims rest on the representativeness of the ReMIND dataset split and the validity of segmentation Dice as a clinical proxy; standard ML training practices are assumed but not detailed.

free parameters (1)
  • Model-specific training hyperparameters for the six generators
    All 48 experiments depend on choices of learning rates, architectures, and optimization settings that are not specified in the abstract.
axioms (1)
  • Domain assumption: the ReMIND dataset with its 60/16 patient split provides a representative and unbiased test of generalization for brain tumor surgery cases.
    All image-fidelity and downstream results are computed on this specific held-out set.

pith-pipeline@v0.9.1-grok · 5926 in / 1510 out tokens · 41890 ms · 2026-06-28T19:01:52.343356+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. What neurosurgeons need to see: synthetic intra-operative MRI from ultrasound for brain-shift compensation in brain tumour surgery

    cs.CV · 2026-06 · unverdicted · novelty 5.0

    End-to-end pipeline uses ResViT-2.5D to synthesize post-resection MRI from ioUS then anchors deformable registration, yielding 5.86 mm TRE on 14 ReMIND subjects while producing an integrated whole-brain volume reflect...
