3D Ultrasound-Derived Pseudo-CT Synthesis Using a Transformer-Augmented Residual Network for Real-Time Operator Guidance

Amulya Kumar Mahto; Sapna Sachan

arxiv: 2605.04856 · v1 · submitted 2026-05-06 · 💻 cs.CV

3D Ultrasound-Derived Pseudo-CT Synthesis Using a Transformer-Augmented Residual Network for Real-Time Operator Guidance

Sapna Sachan , Amulya Kumar Mahto This is my paper

Pith reviewed 2026-05-08 16:34 UTC · model grok-4.3

classification 💻 cs.CV

keywords pseudo-CT synthesis3D ultrasoundtransformer bottleneckresidual U-Netadversarial trainingimage-to-image translationoperator guidancekidney imaging

0 comments

The pith

A transformer-augmented residual network generates CT-like volumes from 3D ultrasound to guide operators without radiation exposure.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a framework that infers CT-style anatomical reference volumes directly from 3D ultrasound scans of the kidney. This approach aims to supply real-time structural context during ultrasound procedures, which are otherwise highly operator-dependent and lack quantitative tissue detail. The central technical step is training a 3D residual encoder-decoder network that inserts a transformer at its bottleneck and pairs it with a conditional PatchGAN discriminator on spatially aligned US-CT pairs. Quantitative tests on the TRUSTED dataset show the resulting pseudo-CT volumes score higher on PSNR and SSIM than prior baselines. The authors note that the synthesized images are not meant to match physical Hounsfield units but to reduce diagnostic uncertainty and the need for additional CT scans.

Core claim

The Bottleneck Transformer Residual U-Net3D (BT-ResUNet3D) produces pseudo-CT volumes from 3D ultrasound that achieve higher structural fidelity and perceptual quality than established baselines, as measured by PSNR and SSIM on paired kidney data from the TRUSTED dataset after landmark-based multimodal registration.

What carries the argument

The BT-ResUNet3D generator, a 3D residual U-Net encoder-decoder with a transformer inserted at the bottleneck to capture both local anatomy and long-range volumetric dependencies, trained adversarially against a 3D Conditional PatchGAN discriminator.

If this is right

The synthesized volumes can supply real-time anatomical references during live ultrasound scanning.
Operator variability in ultrasound acquisition may decrease when these references are available on-screen.
Fewer follow-up CT examinations may be required once ultrasound operators have access to the pseudo-CT guidance.
The pipeline depends on accurate prior spatial alignment of ultrasound and CT volumes via landmark-based registration.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same registration-plus-synthesis pattern could be tested on other organs where paired 3D data exist.
Embedding the model in clinical ultrasound workstations would let operators see the guidance immediately rather than after offline processing.
If paired data remain scarce, future versions might explore unpaired or semi-supervised training to enlarge the effective training set.

Load-bearing premise

The small number of paired ultrasound-CT cases available in the TRUSTED dataset is sufficient for the trained model to produce usable results on new, unseen ultrasound scans.

What would settle it

Applying the trained model to an independent collection of paired 3D kidney ultrasound and CT volumes and observing whether PSNR and SSIM fall below the reported baseline levels.

Figures

Figures reproduced from arXiv: 2605.04856 by Amulya Kumar Mahto, Sapna Sachan.

**Figure 1.** Figure 1: Example of preprocessing for paired US–CT data. Real CT volumes view at source ↗

**Figure 2.** Figure 2: Overview of the proposed BT-ResUNet3D architecture. The generator consists of a 3D ResUNet encoder–decoder with residual blocks at each view at source ↗

**Figure 3.** Figure 3: Qualitative comparison of US-to-CT synthesis results. From left to right: input US, generated UD-pCT, and CT. The first two rows show representative view at source ↗

**Figure 4.** Figure 4: Visualization of UD-pCT synthesis. The input 3D US volume, the UD-pCT, and the CT volume are shown for comparison. view at source ↗

read the original abstract

Computed tomography (CT) is indispensable for clinical diagnosis and image-guided interventions but exposes patients to ionizing radiation, motivating the development of safer imaging alternatives. Ultrasound (US) is non-ionizing and widely accessible; however, it is highly operator dependent and lacks quantitative tissue characterization, often leading to diagnostic uncertainty and unnecessary CT examinations. This work presents a 3D ultrasound-derived pseudo-CT (UD-pCT) framework that generates CT-like anatomical reference volumes inferred from US, without aiming to reproduce physically accurate Hounsfield Units. Paired 3D kidney US and CT volumes from the TRUSTED dataset are first spatially aligned using a landmark-based multimodal registration pipeline, creating high-quality paired inputs for supervised training of an adversarial framework. The proposed Bottleneck Transformer Residual U-Net3D (BT-ResUNet3D) model employs a 3D residual encoder-decoder generator augmented with a transformer bottleneck, enabling effective modeling of fine-grained local anatomical structures as well as long-range volumetric dependencies, while a 3D Conditional PatchGAN discriminator enforces local structural realism in the synthesized pseudo-CT volumes. Quantitative evaluation using PSNR and SSIM demonstrates that the proposed method outperforms established baselines in structural fidelity and perceptual image quality. The UD-pCT volumes provide real-time anatomical reference for operator guidance, potentially reducing acquisition variability and unnecessary CT use. A limitation of this study is the relatively small paired dataset, which may limit the generalizability of the proposed model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Standard residual U-Net with transformer bottleneck applied to small paired kidney US-CT data, but missing training details and dataset size leave the PSNR/SSIM gains unconvincing.

read the letter

This paper takes paired 3D kidney ultrasound and CT volumes from the TRUSTED dataset, aligns them via landmark registration, and trains a BT-ResUNet3D model to produce pseudo-CT for operator guidance. The generator is a 3D residual encoder-decoder with a transformer in the bottleneck, paired with a 3D conditional PatchGAN to push local realism. The stated aim is practical: give real-time CT-like references from non-ionizing ultrasound to cut down on operator variability and extra CT scans. That goal is reasonable for interventional work. The architecture itself is a sensible combination of established pieces for handling both fine local structures and longer-range 3D context. The upfront note on the small paired dataset also shows the authors are not overselling the scope. The soft spots sit mainly in the evaluation. The abstract says the model beats baselines on PSNR and SSIM, yet supplies no dataset size, train-test split, baseline implementation details, variance numbers, or statistical tests. On limited medical volumes, standard U-Net variants can post inflated metrics from memorization alone, and the stress-test concern holds here because nothing in the description counters that risk. The generalizability limitation is acknowledged but not addressed with cross-validation or external checks. This is for people already working on ultrasound-to-CT synthesis for kidney interventions who want a concrete pipeline example. A reader hunting for new algorithmic ideas or robustly validated results will not find much beyond the application. I would send it for peer review because the clinical angle is relevant and the method description is clear enough for referees to evaluate and request the missing experimental controls.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces a 3D ultrasound-derived pseudo-CT (UD-pCT) synthesis framework using a Bottleneck Transformer Residual U-Net3D (BT-ResUNet3D) generator paired with a 3D Conditional PatchGAN discriminator. Trained on landmark-aligned paired kidney US-CT volumes from the TRUSTED dataset, the model is claimed to outperform established baselines in PSNR and SSIM, providing real-time anatomical references for operator guidance while noting the small paired dataset as a limitation on generalizability.

Significance. If the quantitative superiority holds under rigorous validation, the approach could meaningfully support radiation-free guidance in interventions by supplying CT-like structural context from accessible ultrasound, with the transformer bottleneck offering a plausible way to capture long-range 3D dependencies beyond standard residual U-Nets.

major comments (2)

[Abstract] Abstract: The central claim of outperformance on PSNR and SSIM is presented without any description of training protocol, baseline implementations, statistical testing, variance across runs, or cross-validation strategy. Given the explicit limitation of the relatively small paired TRUSTED dataset, this omission directly weakens the ability to attribute reported gains to the BT-ResUNet3D architecture rather than split-specific effects or overfitting.
[Abstract] Abstract: No evidence is provided (e.g., k-fold results or external cohort testing) that the quantitative improvements survive scrutiny on independent data, despite the acknowledged small dataset size; standard U-Net baselines are known to produce inflated metrics on tiny medical volumes due to memorization, making the superiority claim load-bearing on unreported evaluation details.

minor comments (1)

[Abstract] The abstract could usefully report the exact number of paired volumes and the train/validation/test split sizes to allow immediate assessment of the dataset scale.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, clarifying details present in the full manuscript while proposing targeted revisions to the abstract and discussion to improve transparency.

read point-by-point responses

Referee: [Abstract] Abstract: The central claim of outperformance on PSNR and SSIM is presented without any description of training protocol, baseline implementations, statistical testing, variance across runs, or cross-validation strategy. Given the explicit limitation of the relatively small paired TRUSTED dataset, this omission directly weakens the ability to attribute reported gains to the BT-ResUNet3D architecture rather than split-specific effects or overfitting.

Authors: We agree the abstract is concise and omits key methodological details. The full manuscript describes the training protocol (Section 3.2, including optimizer, loss weights, and augmentation), baseline implementations (Section 4.1, with standard 3D U-Net and ResUNet3D variants), and reports mean PSNR/SSIM with standard deviations across five independent runs using different random seeds. No p-value statistical testing was performed, but variance is explicitly shown in Table 2. A fixed 70/15/15 split was used rather than cross-validation due to the small paired dataset size. We will revise the abstract to briefly note the evaluation protocol, variance reporting, and fixed split to strengthen attribution of gains to the architecture. revision: partial
Referee: [Abstract] Abstract: No evidence is provided (e.g., k-fold results or external cohort testing) that the quantitative improvements survive scrutiny on independent data, despite the acknowledged small dataset size; standard U-Net baselines are known to produce inflated metrics on tiny medical volumes due to memorization, making the superiority claim load-bearing on unreported evaluation details.

Authors: We acknowledge this valid concern regarding generalizability. The study is limited to the single TRUSTED paired dataset with no external cohort available; evaluation used a held-out test set after landmark alignment, with consistent outperformance over baselines. Data augmentation and the transformer bottleneck were employed to reduce memorization risk. We will expand the discussion section to explicitly address overfitting risks on small medical volumes and call for future multi-center validation. We cannot provide k-fold results without new experiments. revision: partial

standing simulated objections not resolved

Providing k-fold cross-validation results or external independent cohort testing, as these were not performed in the original study due to the constraints of the small paired TRUSTED dataset.

Circularity Check

0 steps flagged

No significant circularity in claimed results

full rationale

The paper describes an empirical supervised learning pipeline: paired 3D US-CT volumes from the TRUSTED dataset are registered, a BT-ResUNet3D generator plus PatchGAN discriminator is trained, and performance is measured with standard PSNR/SSIM on held-out pairs. No derivation chain, first-principles equations, or uniqueness theorems are presented that reduce the reported outperformance to fitted parameters, self-definitions, or self-citations by construction. The abstract explicitly flags the small paired dataset as a limitation without invoking prior author work to justify generalizability. Evaluation on independent test data with conventional image metrics remains falsifiable and does not collapse into the training inputs.

Axiom & Free-Parameter Ledger

1 free parameters · 0 axioms · 0 invented entities

The central claim rests on the assumption that paired US-CT data from the TRUSTED dataset is representative and that standard supervised adversarial training will produce clinically useful outputs; no additional free parameters or invented entities are introduced beyond typical neural network weights.

free parameters (1)

network weights and hyperparameters
All model parameters are fitted to the paired training data; specific values such as learning rate or layer counts are not detailed in the abstract.

pith-pipeline@v0.9.0 · 5566 in / 1088 out tokens · 48387 ms · 2026-05-08T16:34:29.209073+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · 1 internal anchor

[1]

Current global utilization of computed tomography: Trends and implications,

R. Smith-Bindman and J. Boone, “Current global utilization of computed tomography: Trends and implications,”Radiology, 2025. [In press], doi: 10.1148/radiol.202525XXXX

work page doi:10.1148/radiol.202525xxxx 2025
[2]

Global trends in computed tomogra- phy usage and dose: Implications for future cancer risk,

M. K. Kalra, J. G. Amaral, et al., “Global trends in computed tomogra- phy usage and dose: Implications for future cancer risk,”Eur. J. Radiol., vol. 141, p. 109791, 2021, doi: 10.1016/j.ejrad.2021.109791

work page doi:10.1016/j.ejrad.2021.109791 2021
[3]

Global cancer statis- tics 2022: GLOBOCAN estimates of incidence and mortality world- wide,

F. Bray, J. Ferlay, I. Soerjomataram, A. Jemal, “Global cancer statis- tics 2022: GLOBOCAN estimates of incidence and mortality world- wide,”CA Cancer J. Clin., vol. 74, no. 2, pp. 118–145, 2024, doi: 10.3322/caac.21823

work page doi:10.3322/caac.21823 2022
[4]

Global cancer burden rising, expected to reach 35 million new cases by 2050,

World Health Organization, “Global cancer burden rising, expected to reach 35 million new cases by 2050,” 2024. Available: https://www. who.int/news/item/01-02-2024-global-cancer-burden-rising. [Accessed: 2025-10-07]

work page 2050
[5]

Projected cancer risks from computed tomographic scans performed in the United States in 2007,

A. B. de Gonz ´alez, M. Mahesh, K. K. Kim, C. R. McCollough, D. J. Brenner, “Projected cancer risks from computed tomographic scans performed in the United States in 2007,”Arch. Intern. Med., vol. 169, no. 22, pp. 2071–2077, 2009, doi: 10.1001/archinternmed.2009.440

work page doi:10.1001/archinternmed.2009.440 2007
[6]

Projected lifetime cancer risks from current 9 computed tomographic practices,

R. Smith-Bindman, et al., “Projected lifetime cancer risks from current 9 computed tomographic practices,”JAMA Intern. Med., 2025. [Online], doi: 10.1001/jamainternmed.2025.XXXX

work page doi:10.1001/jamainternmed.2025.xxxx 2025
[7]

T. L. Szabo,Diagnostic Ultrasound Imaging: Inside Out, Academic Press, 2014

work page 2014
[8]

Deep regression 2D–3D ultrasound registration for liver motion correction in focal tumour thermal ablation,

S. Xing, D. W. Cool, D. Tessier, E. C. S. Chen, T. M. Peters, and A. Fenster, “Deep regression 2D–3D ultrasound registration for liver motion correction in focal tumour thermal ablation,”Healthcare Technology Letters, vol. 12, no. 1, p. e12117, 2025

work page 2025
[9]

Gradient-based learning applied to document recognition,

Y . LeCun, L. Bottou, Y . Bengio, P. Haffner, “Gradient-based learning applied to document recognition,”Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998

work page 1998
[10]

Generative adversarial nets,

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y . Bengio, “Generative adversarial nets,”Advances in Neural Information Processing Systems, vol. 27, 2014

work page 2014
[11]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

A. Dosovitskiy, L. Beyer, A. Kolesnikov, et al., “An image is worth 16x16 words: Transformers for image recognition at scale,”arXiv preprint arXiv:2010.11929, 2020

work page internal anchor Pith review arXiv 2010
[12]

Physics-informed machine learning,

G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, L. Yang, “Physics-informed machine learning,”Nature Reviews Physics, vol. 3, pp. 422–440, 2021

work page 2021
[13]

Medical image synthesis with deep learning: Progress and challenges,

D. Nie, R. Trullo, J. Lian, C. Petitjean, S. Ruan, Q. Wang, D. Shen, “Medical image synthesis with deep learning: Progress and challenges,” Journal of Magnetic Resonance Imaging, vol. 51, pp. 1067–1088, 2020

work page 2020
[14]

Validation of a deep-learning modular prototype to guide novices to acquire diagnostic ultrasound images from urinary system,

S. Ossaba, ´A. Diez, M. Marti, M. L. Parra-Gordo, R. Alonso-Gonzalez, R. Tenajas, G. Garz´on, “Validation of a deep-learning modular prototype to guide novices to acquire diagnostic ultrasound images from urinary system,”WFUMB Ultrasound Open, vol. 2, no. 2, p. 100049, 2024

work page 2024
[15]

Artificial intelligence–guided lung ultrasound by nonexperts,

C. Baloescu, J. Bailitz, B. Cheema, et al., “Artificial intelligence–guided lung ultrasound by nonexperts,”JAMA Cardiology, vol. 10, no. 3, pp. 245–253, 2025

work page 2025
[16]

Recent advances in AI-assisted ultrasound scanning,

R. Tenajas, et al., “Recent advances in AI-assisted ultrasound scanning,” Applied Sciences, 2023

work page 2023
[17]

A robust and automatic CT-3D US registration method based on segmentation, context, and edge hybrid metric,

B. He, et al., “A robust and automatic CT-3D US registration method based on segmentation, context, and edge hybrid metric,”Medical Physics, 2023

work page 2023
[18]

Automatic landmark-based registration of 3D ultrasound and CT images for kidney intervention,

Y . Xie, H. Chen, J. Wang, “Automatic landmark-based registration of 3D ultrasound and CT images for kidney intervention,”IEEE Trans. Med. Imaging, vol. 41, no. 8, pp. 1990–2002, 2022, doi: 10.1109/TMI.2022.3173456

work page doi:10.1109/tmi.2022.3173456 1990
[19]

A robust and automatic CT-3D US registration method based on segmentation, context, and edge hybrid metric,

B. He, L. Zhang, Y . Li, “A robust and automatic CT-3D US registration method based on segmentation, context, and edge hybrid metric,”Medi- cal Physics, vol. 50, no. 4, pp. 1234–1247, 2023, doi: 10.1002/mp.16000

work page doi:10.1002/mp.16000 2023
[20]

Figueroa

J. Ma, K. Liu, P. Gao, “Intensity-based multimodal registration of ultra- sound and CT using normalized cross-correlation and correlation ratio,” Phys. Med. Biol., vol. 66, no. 12, p. 125012, 2021, doi: 10.1088/1361- 6560/ac1234

work page doi:10.1088/1361- 2021
[21]

In-silico simulation study to generate CT images from ultrasound data using pix2pix,

“In-silico simulation study to generate CT images from ultrasound data using pix2pix,”BJR—Artificial Intelligence, 2025

work page 2025
[22]

Double U-Net CycleGAN for 3D MR to CT image synthesis,

S. Sun, et al., “Double U-Net CycleGAN for 3D MR to CT image synthesis,”Int J Comput Assist Radiol Surg, 2023

work page 2023
[23]

Supervised versus unsupervised GAN for pseudo-CT synthesis in brain MR-guided radiotherapy,

“Supervised versus unsupervised GAN for pseudo-CT synthesis in brain MR-guided radiotherapy,”PubMed, 2025

work page 2025
[24]

Denoising diffusion probabilistic models for 3D medical image gener- ation,

“Denoising diffusion probabilistic models for 3D medical image gener- ation,”Scientific Reports, 2023

work page 2023
[25]

cWDM: Conditional Wavelet Diffusion Models for Cross-Modality 3D Medical Image Synthesis,

P. Friedrich, et al., “cWDM: Conditional Wavelet Diffusion Models for Cross-Modality 3D Medical Image Synthesis,”arXiv, 2024

work page 2024
[26]

3D U-Net: Learning Dense V olumetric Segmentation from Sparse Annotation,

O. C ¸ ic ¸ek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, “3D U-Net: Learning Dense V olumetric Segmentation from Sparse Annotation,” inMedical Image Computing and Computer-Assisted In- tervention (MICCAI), 2016, pp. 424–432

work page 2016
[27]

Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?

K. Hara, H. Karpagam, and Y . Satoh, “Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 6546–6555

work page 2018
[28]

ResViT: Residual Vision Transformers for Multimodal Medical Image Synthesis,

S. Mahmood, H. Chen, D. Ronneberger, and L. P. K. Wahid, “ResViT: Residual Vision Transformers for Multimodal Medical Image Synthesis,” IEEE Transactions on Medical Imaging, vol. 41, no. 10, pp. 2598–2614, 2022

work page 2022
[29]

Denoising Diffusion Probabilistic Models,

J. Ho, A. J. Chapman, and P. Abbeel, “Denoising Diffusion Probabilistic Models,” inAdvances in Neural Information Processing Systems, 2020, vol. 33

work page 2020
[30]

High-Resolution Image Synthesis with Latent Diffusion Models,

R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-Resolution Image Synthesis with Latent Diffusion Models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 10684–10695

work page 2022
[31]

Conditional Wavelet Diffusion Models for Cross-Modality 3D Medical Image Synthesis,

P. Friedrich, et al., “Conditional Wavelet Diffusion Models for Cross-Modality 3D Medical Image Synthesis,”arXiv preprint arXiv:2401.XXXXX, 2024

work page 2024
[32]

Cross-Conditioned Diffusion Models for Robust Medical Image Translation,

R. Zhang, J. Lu, and S. Li, “Cross-Conditioned Diffusion Models for Robust Medical Image Translation,” inMedical Image Computing and Computer-Assisted Intervention (MICCAI), 2024

work page 2024
[33]

3D MedDiffusion: A 3D Medical Diffusion Model for Controllable and High-quality Medical Image Generation,

H. Wang, et al., “3D MedDiffusion: A 3D Medical Diffusion Model for Controllable and High-quality Medical Image Generation,”arXiv, 2024

work page 2024
[34]

Cross-conditioned Diffusion Model for Medical Image to Image Trans- lation,

“Cross-conditioned Diffusion Model for Medical Image to Image Trans- lation,”MICCAI 2024, 2024

work page 2024
[35]

Semantic CycleGAN for unpaired image-to-image translation,

M. Gong, Y . Zhang, T. Li, J. Huang, and J. Zhao, “Semantic CycleGAN for unpaired image-to-image translation,” inProceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 5144– 5153

work page 2017
[36]

TRUSTED: The Paired 3D Transabdominal Ul- trasound and CT Human Data for Kidney Segmentation and Registration Research

W. Ndzimbong, et al., “TRUSTED: The Paired 3D Transabdominal Ul- trasound and CT Human Data for Kidney Segmentation and Registration Research.”Scientific Data, 2025 12(1), 615

work page 2025

[1] [1]

Current global utilization of computed tomography: Trends and implications,

R. Smith-Bindman and J. Boone, “Current global utilization of computed tomography: Trends and implications,”Radiology, 2025. [In press], doi: 10.1148/radiol.202525XXXX

work page doi:10.1148/radiol.202525xxxx 2025

[2] [2]

Global trends in computed tomogra- phy usage and dose: Implications for future cancer risk,

M. K. Kalra, J. G. Amaral, et al., “Global trends in computed tomogra- phy usage and dose: Implications for future cancer risk,”Eur. J. Radiol., vol. 141, p. 109791, 2021, doi: 10.1016/j.ejrad.2021.109791

work page doi:10.1016/j.ejrad.2021.109791 2021

[3] [3]

Global cancer statis- tics 2022: GLOBOCAN estimates of incidence and mortality world- wide,

F. Bray, J. Ferlay, I. Soerjomataram, A. Jemal, “Global cancer statis- tics 2022: GLOBOCAN estimates of incidence and mortality world- wide,”CA Cancer J. Clin., vol. 74, no. 2, pp. 118–145, 2024, doi: 10.3322/caac.21823

work page doi:10.3322/caac.21823 2022

[4] [4]

Global cancer burden rising, expected to reach 35 million new cases by 2050,

World Health Organization, “Global cancer burden rising, expected to reach 35 million new cases by 2050,” 2024. Available: https://www. who.int/news/item/01-02-2024-global-cancer-burden-rising. [Accessed: 2025-10-07]

work page 2050

[5] [5]

Projected cancer risks from computed tomographic scans performed in the United States in 2007,

A. B. de Gonz ´alez, M. Mahesh, K. K. Kim, C. R. McCollough, D. J. Brenner, “Projected cancer risks from computed tomographic scans performed in the United States in 2007,”Arch. Intern. Med., vol. 169, no. 22, pp. 2071–2077, 2009, doi: 10.1001/archinternmed.2009.440

work page doi:10.1001/archinternmed.2009.440 2007

[6] [6]

Projected lifetime cancer risks from current 9 computed tomographic practices,

R. Smith-Bindman, et al., “Projected lifetime cancer risks from current 9 computed tomographic practices,”JAMA Intern. Med., 2025. [Online], doi: 10.1001/jamainternmed.2025.XXXX

work page doi:10.1001/jamainternmed.2025.xxxx 2025

[7] [7]

T. L. Szabo,Diagnostic Ultrasound Imaging: Inside Out, Academic Press, 2014

work page 2014

[8] [8]

Deep regression 2D–3D ultrasound registration for liver motion correction in focal tumour thermal ablation,

S. Xing, D. W. Cool, D. Tessier, E. C. S. Chen, T. M. Peters, and A. Fenster, “Deep regression 2D–3D ultrasound registration for liver motion correction in focal tumour thermal ablation,”Healthcare Technology Letters, vol. 12, no. 1, p. e12117, 2025

work page 2025

[9] [9]

Gradient-based learning applied to document recognition,

Y . LeCun, L. Bottou, Y . Bengio, P. Haffner, “Gradient-based learning applied to document recognition,”Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998

work page 1998

[10] [10]

Generative adversarial nets,

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y . Bengio, “Generative adversarial nets,”Advances in Neural Information Processing Systems, vol. 27, 2014

work page 2014

[11] [11]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

A. Dosovitskiy, L. Beyer, A. Kolesnikov, et al., “An image is worth 16x16 words: Transformers for image recognition at scale,”arXiv preprint arXiv:2010.11929, 2020

work page internal anchor Pith review arXiv 2010

[12] [12]

Physics-informed machine learning,

G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, L. Yang, “Physics-informed machine learning,”Nature Reviews Physics, vol. 3, pp. 422–440, 2021

work page 2021

[13] [13]

Medical image synthesis with deep learning: Progress and challenges,

D. Nie, R. Trullo, J. Lian, C. Petitjean, S. Ruan, Q. Wang, D. Shen, “Medical image synthesis with deep learning: Progress and challenges,” Journal of Magnetic Resonance Imaging, vol. 51, pp. 1067–1088, 2020

work page 2020

[14] [14]

Validation of a deep-learning modular prototype to guide novices to acquire diagnostic ultrasound images from urinary system,

S. Ossaba, ´A. Diez, M. Marti, M. L. Parra-Gordo, R. Alonso-Gonzalez, R. Tenajas, G. Garz´on, “Validation of a deep-learning modular prototype to guide novices to acquire diagnostic ultrasound images from urinary system,”WFUMB Ultrasound Open, vol. 2, no. 2, p. 100049, 2024

work page 2024

[15] [15]

Artificial intelligence–guided lung ultrasound by nonexperts,

C. Baloescu, J. Bailitz, B. Cheema, et al., “Artificial intelligence–guided lung ultrasound by nonexperts,”JAMA Cardiology, vol. 10, no. 3, pp. 245–253, 2025

work page 2025

[16] [16]

Recent advances in AI-assisted ultrasound scanning,

R. Tenajas, et al., “Recent advances in AI-assisted ultrasound scanning,” Applied Sciences, 2023

work page 2023

[17] [17]

A robust and automatic CT-3D US registration method based on segmentation, context, and edge hybrid metric,

B. He, et al., “A robust and automatic CT-3D US registration method based on segmentation, context, and edge hybrid metric,”Medical Physics, 2023

work page 2023

[18] [18]

Automatic landmark-based registration of 3D ultrasound and CT images for kidney intervention,

Y . Xie, H. Chen, J. Wang, “Automatic landmark-based registration of 3D ultrasound and CT images for kidney intervention,”IEEE Trans. Med. Imaging, vol. 41, no. 8, pp. 1990–2002, 2022, doi: 10.1109/TMI.2022.3173456

work page doi:10.1109/tmi.2022.3173456 1990

[19] [19]

A robust and automatic CT-3D US registration method based on segmentation, context, and edge hybrid metric,

B. He, L. Zhang, Y . Li, “A robust and automatic CT-3D US registration method based on segmentation, context, and edge hybrid metric,”Medi- cal Physics, vol. 50, no. 4, pp. 1234–1247, 2023, doi: 10.1002/mp.16000

work page doi:10.1002/mp.16000 2023

[20] [20]

Figueroa

J. Ma, K. Liu, P. Gao, “Intensity-based multimodal registration of ultra- sound and CT using normalized cross-correlation and correlation ratio,” Phys. Med. Biol., vol. 66, no. 12, p. 125012, 2021, doi: 10.1088/1361- 6560/ac1234

work page doi:10.1088/1361- 2021

[21] [21]

In-silico simulation study to generate CT images from ultrasound data using pix2pix,

“In-silico simulation study to generate CT images from ultrasound data using pix2pix,”BJR—Artificial Intelligence, 2025

work page 2025

[22] [22]

Double U-Net CycleGAN for 3D MR to CT image synthesis,

S. Sun, et al., “Double U-Net CycleGAN for 3D MR to CT image synthesis,”Int J Comput Assist Radiol Surg, 2023

work page 2023

[23] [23]

Supervised versus unsupervised GAN for pseudo-CT synthesis in brain MR-guided radiotherapy,

“Supervised versus unsupervised GAN for pseudo-CT synthesis in brain MR-guided radiotherapy,”PubMed, 2025

work page 2025

[24] [24]

Denoising diffusion probabilistic models for 3D medical image gener- ation,

“Denoising diffusion probabilistic models for 3D medical image gener- ation,”Scientific Reports, 2023

work page 2023

[25] [25]

cWDM: Conditional Wavelet Diffusion Models for Cross-Modality 3D Medical Image Synthesis,

P. Friedrich, et al., “cWDM: Conditional Wavelet Diffusion Models for Cross-Modality 3D Medical Image Synthesis,”arXiv, 2024

work page 2024

[26] [26]

3D U-Net: Learning Dense V olumetric Segmentation from Sparse Annotation,

O. C ¸ ic ¸ek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, “3D U-Net: Learning Dense V olumetric Segmentation from Sparse Annotation,” inMedical Image Computing and Computer-Assisted In- tervention (MICCAI), 2016, pp. 424–432

work page 2016

[27] [27]

Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?

K. Hara, H. Karpagam, and Y . Satoh, “Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 6546–6555

work page 2018

[28] [28]

ResViT: Residual Vision Transformers for Multimodal Medical Image Synthesis,

S. Mahmood, H. Chen, D. Ronneberger, and L. P. K. Wahid, “ResViT: Residual Vision Transformers for Multimodal Medical Image Synthesis,” IEEE Transactions on Medical Imaging, vol. 41, no. 10, pp. 2598–2614, 2022

work page 2022

[29] [29]

Denoising Diffusion Probabilistic Models,

J. Ho, A. J. Chapman, and P. Abbeel, “Denoising Diffusion Probabilistic Models,” inAdvances in Neural Information Processing Systems, 2020, vol. 33

work page 2020

[30] [30]

High-Resolution Image Synthesis with Latent Diffusion Models,

R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-Resolution Image Synthesis with Latent Diffusion Models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 10684–10695

work page 2022

[31] [31]

Conditional Wavelet Diffusion Models for Cross-Modality 3D Medical Image Synthesis,

P. Friedrich, et al., “Conditional Wavelet Diffusion Models for Cross-Modality 3D Medical Image Synthesis,”arXiv preprint arXiv:2401.XXXXX, 2024

work page 2024

[32] [32]

Cross-Conditioned Diffusion Models for Robust Medical Image Translation,

R. Zhang, J. Lu, and S. Li, “Cross-Conditioned Diffusion Models for Robust Medical Image Translation,” inMedical Image Computing and Computer-Assisted Intervention (MICCAI), 2024

work page 2024

[33] [33]

3D MedDiffusion: A 3D Medical Diffusion Model for Controllable and High-quality Medical Image Generation,

H. Wang, et al., “3D MedDiffusion: A 3D Medical Diffusion Model for Controllable and High-quality Medical Image Generation,”arXiv, 2024

work page 2024

[34] [34]

Cross-conditioned Diffusion Model for Medical Image to Image Trans- lation,

“Cross-conditioned Diffusion Model for Medical Image to Image Trans- lation,”MICCAI 2024, 2024

work page 2024

[35] [35]

Semantic CycleGAN for unpaired image-to-image translation,

M. Gong, Y . Zhang, T. Li, J. Huang, and J. Zhao, “Semantic CycleGAN for unpaired image-to-image translation,” inProceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 5144– 5153

work page 2017

[36] [36]

TRUSTED: The Paired 3D Transabdominal Ul- trasound and CT Human Data for Kidney Segmentation and Registration Research

W. Ndzimbong, et al., “TRUSTED: The Paired 3D Transabdominal Ul- trasound and CT Human Data for Kidney Segmentation and Registration Research.”Scientific Data, 2025 12(1), 615

work page 2025