Performance Analysis and Optimization of 3D Generative Diffusion Models across GPU Architectures

Byeong Kil Lee; Jeeho Ryoo; Jiatong Han; Muhammad Ali Khaliq; Weidong Zhang; Yongchan Jung

arxiv: 2606.19365 · v1 · pith:DRDHCKGVnew · submitted 2026-06-11 · 💻 cs.LG

Jeeho Ryoo, Yongchan Jung, Muhammad Ali Khaliq, Weidong Zhang, Jiatong Han, Byeong Kil Lee

Pith reviewed 2026-06-27 07:46 UTC · model grok-4.3

classification 💻 cs.LG

keywords: diffusion models · GPU performance analysis · Tensor Cores · 3D MRI synthesis · cuDNN kernels · Med-DDPM · architecture-aware optimization · U-Net


The pith

Two GPU optimizations, TF32 Tensor Core activation and a 3D channels-last layout, cut SM cycles and dynamic instructions by up to 100x in 3D diffusion training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper analyzes kernel-level behavior of the Med-DDPM 3D medical diffusion model across NVIDIA GPU generations and identifies that training is dominated by cuDNN convolution and implicit-GEMM kernels hampered by memory-access patterns, layout conversions, and low Tensor Core use. Guided by these measurements, the authors test TF32 Tensor Core activation together with a 3D channels-last layout and report large reductions in SM cycles and dynamic instructions, higher Tensor Core utilization, and a modest IPC gain on A100 hardware. These changes leave synthesis quality unchanged according to the metrics used. A reader would care because hundreds of U-Net forward passes per sample make diffusion training expensive, so targeted kernel improvements could make high-fidelity 3D MRI generation more practical.

Core claim

Training of the state-of-the-art 3D medical diffusion model Med-DDPM is overwhelmingly dominated by cuDNN convolution and implicit-GEMM kernels whose inefficiencies stem from memory-access patterns, tensor-layout conversions, and limited Tensor Core utilization. Activating TF32 Tensor Cores and adopting a 3D channels-last layout reduces SM cycles by up to 100x, cuts dynamic instructions by 100x, raises Tensor Core utilization from 1.45x to 9.98x, and increases IPC by 7 percent on A100, all without degrading synthesis quality.
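The reported figures can be cross-checked against the definition IPC = dynamic instructions / SM cycles. The sketch below is illustrative only: the baseline counts and the exact reduction factors are placeholders chosen for the arithmetic, not measurements from the paper, and the paper's "up to 100x" values may come from different kernels. It shows that when instructions and cycles both fall by roughly 100x, IPC changes only by the ratio of the two reduction factors, which is compatible with a small gain such as 7 percent.

```python
def ipc(instructions: float, cycles: float) -> float:
    """Instructions per cycle."""
    return instructions / cycles


def ipc_ratio(instr_reduction: float, cycle_reduction: float) -> float:
    """IPC_after / IPC_before, given reduction factors (before / after)."""
    return cycle_reduction / instr_reduction


# Hypothetical baseline counts, chosen only for illustration.
base_instr, base_cycles = 2.0e12, 4.0e12

# Assumed reduction factors: cycles shrink slightly more than instructions.
cycle_reduction = 100.0
instr_reduction = cycle_reduction / 1.07

opt_instr = base_instr / instr_reduction
opt_cycles = base_cycles / cycle_reduction

gain = ipc(opt_instr, opt_cycles) / ipc(base_instr, base_cycles)
print(f"instruction reduction: {instr_reduction:.1f}x")
print(f"IPC gain: {(gain - 1) * 100:.1f}%")
```

Read this way, the large cycle reduction and the modest IPC gain are not in tension: most of the saving comes from executing far fewer instructions, not from executing each instruction faster.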

What carries the argument

TF32 Tensor Core activation combined with a 3D channels-last memory layout, which together improve kernel efficiency inside the repeated U-Net evaluations of the diffusion process.
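To make the layout change concrete, the sketch below computes element strides for a 5D activation tensor in the default NCDHW order and in a channels-last 3D (NDHWC) order. It is a plain-Python illustration with a hypothetical tensor shape. In PyTorch this layout is requested with `memory_format=torch.channels_last_3d`; whether the authors used exactly that call is not stated in the text reproduced here.

```python
def contiguous_strides(shape):
    """Row-major strides (in elements) for the given shape."""
    strides, step = [], 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def channels_last_3d_strides(n, c, d, h, w):
    """Strides in logical (N, C, D, H, W) order for data stored as NDHWC."""
    sn, sd, sh, sw, sc = contiguous_strides((n, d, h, w, c))
    return (sn, sc, sd, sh, sw)


shape = (2, 64, 32, 32, 32)  # hypothetical (N, C, D, H, W) activation shape
print("NCDHW strides:", contiguous_strides(shape))
print("NDHWC strides:", channels_last_3d_strides(*shape))
```

In the channels-last case the channel stride is 1, so all channels of one voxel sit next to each other in memory. That is the access pattern Tensor Core convolution kernels generally prefer, and it can remove the layout-conversion kernels that otherwise run before and after each convolution.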

If this is right

The same kernel inefficiencies and layout fixes are likely to appear in other U-Net-based 3D diffusion models.
Lower per-sample training cost could allow larger batch sizes or more frequent retraining on new medical datasets.
Improved Tensor Core utilization suggests the optimizations will scale to future NVIDIA architectures with stronger Tensor Core support.
The profiler-driven breakdown of warp activity and priority scores provides a reusable template for analyzing other generative workloads.
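The kernel-mix breakdown in the last item can be approximated from any per-kernel timing export. The sketch below is a minimal illustration: the kernel names, durations, and category rules are all hypothetical and are not taken from the paper or from a real profiler trace.

```python
from collections import defaultdict

# Hypothetical (kernel name, duration in ms) records; not data from the paper.
records = [
    ("cudnn::implicit_gemm_fprop", 41.0),
    ("cudnn::implicit_gemm_dgrad", 30.0),
    ("cudnn::wgrad_conv3d", 15.0),
    ("nchwToNhwcKernel", 8.0),
    ("elementwise_add", 4.0),
    ("group_norm_forward", 2.0),
]


def categorize(name):
    """Assign a kernel to a coarse category using simple name matching."""
    lowered = name.lower()
    if "gemm" in lowered or "conv" in lowered:
        return "convolution / implicit GEMM"
    if "nchw" in lowered or "nhwc" in lowered:
        return "layout conversion"
    return "other"


def kernel_mix(rows):
    """Return the percentage of total time spent in each category."""
    totals = defaultdict(float)
    for name, ms in rows:
        totals[categorize(name)] += ms
    grand = sum(totals.values())
    return {cat: 100.0 * ms / grand for cat, ms in totals.items()}


mix = kernel_mix(records)
for cat, pct in sorted(mix.items(), key=lambda kv: -kv[1]):
    print(f"{cat:30s} {pct:5.1f}%")
```

The same grouping applied before and after an optimization makes it easy to see whether time moved out of layout-conversion kernels or simply shifted between convolution variants.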

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the layout change also speeds up the denoising sampling phase, end-to-end inference latency for new 3D volumes would drop as well.
The memory-layout optimization might require different tuning when ported to non-NVIDIA GPUs or to multi-node distributed training.
Extending the analysis to measure power draw and memory bandwidth saturation would clarify whether the cycle reductions translate into lower energy cost.

Load-bearing premise

The chosen quality metrics and test conditions fully capture any possible degradation in synthesis quality across datasets or diffusion sampling steps.

What would settle it

A statistically significant drop in FID, SSIM, or equivalent quality scores on a held-out 3D MRI test set after the optimizations would show that quality is not preserved.
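One way to check for a statistically significant difference is a paired permutation test on per-volume scores. The sketch below assumes each test volume is scored under both the baseline and the optimized configuration. The SSIM values are placeholders, not results from the paper, and the sign-flip test is one reasonable choice among several rather than the authors' method.

```python
import random
from statistics import mean


def paired_permutation_pvalue(baseline, optimized, n_perm=10000, seed=0):
    """Two-sided sign-flip permutation test on paired score differences."""
    diffs = [o - b for b, o in zip(baseline, optimized)]
    observed = abs(mean(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Under the null hypothesis the sign of each paired difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical per-volume SSIM scores; placeholders, not results from the paper.
baseline_ssim = [0.9123, 0.9051, 0.9214, 0.8987, 0.9172, 0.9095, 0.9038, 0.9150]
optimized_ssim = [0.9131, 0.9040, 0.9209, 0.9001, 0.9160, 0.9102, 0.9029, 0.9156]

p = paired_permutation_pvalue(baseline_ssim, optimized_ssim)
print(f"p-value: {p:.3f}")
```

A large p-value here would only mean that no difference was detected at this sample size. With a handful of volumes the test has little power, so the number of evaluated samples matters as much as the metric.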

Figures

Figures reproduced from arXiv: 2606.19365 by Byeong Kil Lee, Jeeho Ryoo, Jiatong Han, Muhammad Ali Khaliq, Weidong Zhang, Yongchan Jung.

**Figure 2.** IPC versus Training Duration for Representative …

**Figure 3.** Overview of the Med-DDPM architecture

**Figure 4.** IPC Stack Bars

**Figure 6.** Med-DDPM Kernel Mix

**Figure 7.** Med-DDPM Instruction Mix

Accompanying text (truncated in source): This distribution reflects the computational structure of the Med-DDPM U-Net, in which 3D convolutions dominate both forward and backward passes and are mapped by PyTorch to cuDNN's implicit-GEMM backends. Architecturally, the V100's dominant convolution kernel is mostly constrained by FP32/FP16 FMA throughput rather than by memory bandwidth, leading to a compute-bound regime with …

**Figure 8.** Kernel Mix Bar Chart for Optimizations

**Figure 9.** Instruction Mix of Baseline and Optimizations

**Figure 11.** L1/L2 Hit Rate, DRAM BW Utilization

Accompanying text (truncated in source): … kernel fusion and more aggressive Tensor Core-aware tiling essential for turning the observed cycle reductions into sustained, architecture-scaled speedups. Section 6.3, Memory System Analysis: The cache and DRAM statistics show that OPT1 fundamentally changes how Med-DDPM uses the memory hierarchy on Ampere and Hopper, in a way that is consistent with the Tensor Core-centric e…

**Figure 12.** Relative Active Warps/Cycle

Accompanying text (truncated in source): … on H100, 64.21% → 81.40% on A100, and 56.03% → 81.57% on V100 (with nearly identical values for OPT12), while DRAM bandwidth utilization collapses to 11.32%–11.87% on H100 and 15.14%–15.15% on A100 and only slightly decreases on V100 (33.07% → 28.39%). L1 behavior becomes more architecture- and layout-sensitive: on A100, the L1 hit rate increases to 29.03% (OPT2) and 27.84% (OP…

**Figure 13.** Relative Stall Breakdown

Accompanying text (truncated in source): … the earlier observation that the channels-last path shifts the workload from dense, compute-bound convs into a high-occupancy yet low-efficiency regime dominated by memory-bound micro-kernels. Section 6.5, Scheduling Efficiency Analysis: The scheduler-level stall breakdown clarifies why OPT1 shifts Med-DDPM into a Tensor-Core-dominated execution regime on Ampere and Hopper and aligns with…

Abstract

Diffusion models have become essential for high-fidelity 3D MRI synthesis, yet their deployment remains constrained by substantial GPU resource demands arising from hundreds of U-Net evaluations per sample and highly heterogeneous kernel behavior. This paper performs a comprehensive performance analysis of the state-of-the-art medical diffusion model, Med-DDPM, across three generations of NVIDIA architectures to study kernel-level runtime breakdowns, instruction-mix characteristics, memory system utilization, warp-level activities, and profiler priority-score estimates. We show that training is overwhelmingly dominated by cuDNN convolution and implicit-GEMM kernels, with inefficiencies arising from memory-access patterns, tensor-layout conversions, and limited Tensor Core utilization. Guided by these insights, we evaluate two architecture-aware optimizations, TF32 Tensor Core activation and a 3D channels-last layout, and demonstrate that they reduce SM cycles by up to 100x, cut dynamic instructions by 100x, raise Tensor Core utilization from 1.45 to 9.98x, and increase IPC by 7% on A100, all without degrading synthesis quality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a standard empirical profiling study on Med-DDPM that breaks down kernel behavior across GPU generations and tests two targeted optimizations, but the no-quality-loss claim rests on unshown evidence.


The main thing to know is that the paper measures where time goes in a 3D medical diffusion model on NVIDIA hardware and reports large gains from TF32 Tensor Core use plus a channels-last layout, yet supplies no quality metrics or ablations to support the claim that synthesis quality stays the same.

The profiling work is the useful part. It shows training dominated by cuDNN convolutions and implicit GEMM, with clear findings on memory-access patterns, layout conversions, and low Tensor Core utilization. The cross-generation data and the specific numbers (up to 100x cycle and instruction cuts, Tensor Core utilization from 1.45x to 9.98x, 7% IPC lift on A100) are concrete and directly measured.

The soft spot is exactly the quality preservation statement. The abstract asserts no degradation but names no metric, shows no figures, gives no error bars, and offers no checks across step counts or datasets. Diffusion sampling is sensitive to precision and layout shifts, so the stress-test note is on target: the assumption that whatever they measured would catch any change is untested in the visible text.
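The precision concern can be made concrete. TF32 keeps the 8-bit exponent of FP32 but only 10 explicit mantissa bits, so each operand carries roughly three decimal digits. The sketch below emulates that rounding in plain Python on the FP32 bit pattern. It is an illustration for finite inputs only, it assumes round-to-nearest-even, and it does not model Tensor Core accumulation or any specific hardware rounding behavior.

```python
import struct


def to_tf32(x: float) -> float:
    """Round a finite value to TF32 precision (10 explicit mantissa bits).

    Emulation only: the value is first converted to FP32, then the low 13
    mantissa bits are rounded away with round-to-nearest-even.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 13) & 1                      # lowest mantissa bit that is kept
    bits = (bits + 0x0FFF + lsb) & 0xFFFFE000   # round, then clear the low 13 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


values = [0.1, 1.0 / 3.0, 3.14159, 1234.5678]  # arbitrary sample inputs
max_rel_err = max(abs(to_tf32(v) - v) / abs(v) for v in values)
print(f"max relative error: {max_rel_err:.2e} (about 2**-11 = {2**-11:.2e} at most)")
```

A per-operand error of this size is usually harmless for a single convolution, but a diffusion model applies the network hundreds of times per sample, which is why measured quality metrics carry more weight than a precision argument.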

This paper is for systems researchers who optimize diffusion models for medical imaging on current GPUs. Someone tuning similar U-Net workloads could use the kernel breakdowns as a reference point.

It deserves a serious referee because the measurements are hardware-grounded and the optimizations are practical, even if the abstract is thin. The full manuscript would need to add the quality evaluation details and reproducibility information before it could be assessed properly.

Recommendation: send it for review if the methods and results sections contain the missing controls and data; otherwise the central claim is too lightly supported.

Referee Report

2 major / 2 minor

Summary. The paper performs a kernel-level performance analysis of the Med-DDPM 3D diffusion model for MRI synthesis across NVIDIA GPU generations. It reports that training is dominated by cuDNN convolutions and implicit-GEMM kernels, with bottlenecks from memory patterns, layout conversions, and low Tensor Core use. It then evaluates two optimizations (TF32 Tensor Core activation and 3D channels-last layout) that reduce SM cycles and dynamic instructions by up to 100x, raise Tensor Core utilization from 1.45x to 9.98x, and increase IPC by 7% on A100, claiming these gains occur without degrading synthesis quality.

Significance. The work's direct hardware measurements and applied optimizations (no self-referential fitted parameters) are a strength. If the performance claims and quality preservation hold under detailed scrutiny, the results would be useful for efficient deployment of 3D medical diffusion models on current and future GPUs.

major comments (2)

[Abstract and optimization evaluation section] Quality preservation claim (abstract and § on optimizations): the assertion that TF32 and 3D channels-last preserve synthesis quality is load-bearing for the central contribution, yet the manuscript supplies no named metrics (FID, SSIM, 3D perceptual, or distribution distances), no evaluation across sampling step counts, no cross-dataset results, and no ablation showing that the chosen conditions bound possible degradation. Diffusion models are known to be sensitive to reduced precision and non-standard layouts; without these controls the claim cannot be evaluated.
[Results and experimental methodology sections] Performance results (abstract and § reporting SM cycles, instructions, utilization, IPC): the up-to-100x reductions and 1.45x-to-9.98x utilization gains are presented without error bars, run-to-run variance, or explicit data-exclusion rules. This makes it impossible to judge whether the reported speedups are robust or whether they depend on particular profiler settings or kernel subsets.
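The error bars requested in the second major comment can be derived from a few repeated profiling runs. The sketch below uses hypothetical SM-cycle counts (not measurements from the paper) and first-order error propagation for the ratio of two independent means, which is an approximation that works best when run-to-run variation is small.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(samples):
    """Return mean, sample standard deviation, and standard error of the mean."""
    m = mean(samples)
    s = stdev(samples)
    return m, s, s / sqrt(len(samples))


# Hypothetical SM-cycle counts (in millions) over five repeated runs.
baseline_cycles = [5012.0, 4988.0, 5030.0, 4975.0, 4995.0]
optimized_cycles = [50.4, 49.8, 50.9, 49.5, 50.1]

b_mean, b_sd, b_se = summarize(baseline_cycles)
o_mean, o_sd, o_se = summarize(optimized_cycles)

speedup = b_mean / o_mean
# First-order error propagation for a ratio of independent means.
speedup_se = speedup * sqrt((b_se / b_mean) ** 2 + (o_se / o_mean) ** 2)
print(f"cycle reduction: {speedup:.1f}x +/- {speedup_se:.1f}")
```

Reporting the spread alongside the headline factor shows whether "up to 100x" describes a stable effect or the best of several runs.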

minor comments (2)

[Methodology] Define or cite all profiler-derived quantities (priority-score estimates, warp-level activities) with reference to the exact NVIDIA tool and version used.
[Experimental setup] Clarify the exact cuDNN and PyTorch versions, batch sizes, and diffusion timestep schedules used for both baseline and optimized runs so that the measurements can be reproduced.
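Recording the software environment alongside each profiling run is inexpensive. The sketch below uses only the Python standard library and reports `None` for packages that are not installed. The package list is an assumption for illustration. Where PyTorch is available, `torch.backends.cudnn.version()` additionally reports the cuDNN build, which this sketch deliberately does not call so that it runs without a GPU stack.

```python
import json
import platform
from importlib import metadata


def package_version(name):
    """Installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def environment_record(packages=("torch", "numpy")):
    """Collect interpreter, platform, and package versions in one dict."""
    record = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        record[name] = package_version(name)
    return record


print(json.dumps(environment_record(), indent=2))
```

Batch size and the diffusion timestep schedule are run parameters, not environment facts, so they would need to be logged separately by the training script.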

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will incorporate revisions to strengthen the manuscript.


Referee: [Abstract and optimization evaluation section] Quality preservation claim (abstract and § on optimizations): the assertion that TF32 and 3D channels-last preserve synthesis quality is load-bearing for the central contribution, yet the manuscript supplies no named metrics (FID, SSIM, 3D perceptual, or distribution distances), no evaluation across sampling step counts, no cross-dataset results, and no ablation showing that the chosen conditions bound possible degradation. Diffusion models are known to be sensitive to reduced precision and non-standard layouts; without these controls the claim cannot be evaluated.

Authors: We agree the quality claim requires explicit quantitative support. The revised manuscript will add FID, SSIM, and 3D perceptual metrics for both configurations, evaluated across sampling step counts on the primary dataset, with an ablation confirming no degradation under the tested conditions. revision: yes

Referee: [Results and experimental methodology sections] Performance results (abstract and § reporting SM cycles, instructions, utilization, IPC): the up-to-100x reductions and 1.45x-to-9.98x utilization gains are presented without error bars, run-to-run variance, or explicit data-exclusion rules. This makes it impossible to judge whether the reported speedups are robust or whether they depend on particular profiler settings or kernel subsets.

Authors: The measurements used fixed Nsight Compute settings on representative kernels. The revision will add error bars from multiple runs, state the exact profiler configuration, and clarify kernel inclusion criteria to demonstrate robustness. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical measurements and optimizations


The paper reports direct hardware profiler measurements (cuDNN kernels, SM cycles, IPC, Tensor Core utilization) on Med-DDPM across GPU architectures, followed by empirical testing of TF32 and channels-last layout changes. No equations, derivations, or predictions are present that reduce by construction to fitted inputs, self-definitions, or self-citation chains. All load-bearing claims rest on external benchmark data and profiler outputs rather than internal redefinitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on empirical GPU profiling observations and the domain assumption that cuDNN kernels dominate U-Net training in diffusion models; no free parameters, new entities, or ad-hoc axioms are introduced beyond standard hardware measurement practices.

axioms (1)

domain assumption: cuDNN convolution and implicit-GEMM kernels dominate the runtime of U-Net evaluations in Med-DDPM.
Directly stated in the abstract as the basis for identifying inefficiencies.


Reference graph

Works this paper leans on

66 extracted references · 43 canonical work pages

[1]

C. Chen, C. Giannoula, and A. Moshovos. 2024. Low-Bitwidth Floating Point Quantization for Efficient High-Quality Diffusion Models. InProceedings of the 2024 IEEE International Symposium on Workload Characterization (IISWC). IEEE, Vancouver, BC, Canada, 181–193. doi:10.1109/IISWC63097.2024.00025

work page doi:10.1109/iiswc63097.2024.00025 2024
[2]

Chen Chen, Chen Qin, Huaqi Qiu, Cheng Ouyang, Shuo Wang, and Daniel Rueckert. 2020. Realistic Adversarial Data Augmentation for MR Image Seg- mentation. InMedical Image Computing and Computer-Assisted Intervention (MICCAI) (Lecture Notes in Computer Science, Vol. 12261). Springer, 667–677. doi:10.1007/978-3-030-59710-8_65

work page doi:10.1007/978-3-030-59710-8_65 2020
[3]

Hyungjin Chung, Eun Sun Lee, and Jong Chul Ye. 2023. MR Image Denoising and Super-Resolution Using Regularized Reverse Diffusion.IEEE Transactions on Medical Imaging42, 4 (2023), 922–934. doi:10.1109/TMI.2022.3220681

work page doi:10.1109/tmi.2022.3220681 2023
[4]

2017.NVIDIA Tesla V100 GPU Architecture

NVIDIA Corporation. 2017.NVIDIA Tesla V100 GPU Architecture. Technical Report. NVIDIA. https://images.nvidia.com/content/volta-architecture/pdf/ volta-architecture-whitepaper.pdf

2017
[5]

2020.NVIDIA A100 Tensor Core GPU Architecture

NVIDIA Corporation. 2020.NVIDIA A100 Tensor Core GPU Architecture. Technical Report. NVIDIA. https://www.nvidia.com/content/dam/en-zz/Solutions/data- center/nvidia-ampere-architecture-whitepaper.pdf

2020
[6]

2022.NVIDIA H100 Tensor Core GPU Architecture

NVIDIA Corporation. 2022.NVIDIA H100 Tensor Core GPU Architecture. Techni- cal Report. NVIDIA. https://resources.nvidia.com/en-us-hopper-architecture/ nvidia-h100-tensor-c

2022
[7]

2023.Nsight Compute Kernel Profiling Guide

NVIDIA Corporation. 2023.Nsight Compute Kernel Profiling Guide. Technical Report. NVIDIA Corporation. https://docs.nvidia.com/nsight-compute/2023.2/ pdf/ProfilingGuide.pdf v2023.2.2

2023
[8]

NVIDIA Corporation. 2023. NVIDIA Hopper H100 GPU: Scaling Performance. IEEE Micro43, 4 (2023), 56–65. doi:10.1109/MM.2023.10070122

work page doi:10.1109/mm.2023.10070122 2023
[9]

2025.Nsight Compute Profiling Guide

NVIDIA Corporation. 2025.Nsight Compute Profiling Guide. https://docs.nvidia. com/nsight-compute/ProfilingGuide/index.html Version 2025.3.1

2025
[10]

Bill Dally. 2023. The Secret to NVIDIA’s AI Success.IEEE Spectrum(2023). https://spectrum.ieee.org/nvidia-gpu

2023
[11]

Dombrowski, H

M. Dombrowski, H. Reynaud, J. P. Müller, M. Baugh, and B. Kainz. 2024. Trade- Offs in Fine-Tuned Diffusion Models between Accuracy and Interpretability. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. AAAI Press, 21037–21045. doi:10.1609/aaai.v38i19.30095

work page doi:10.1609/aaai.v38i19.30095 2024
[12]

Dorjsembe, H.-K

Z. Dorjsembe, H.-K. Pao, S. Odonchimed, and F. Xiao. 2024. Conditional Diffusion Models for Semantic 3D Brain MRI Synthesis.IEEE Journal of Biomedical and Health Informatics28, 7 (July 2024), 4084–4093. doi:10.1109/JBHI.2024.3385504

work page doi:10.1109/jbhi.2024.3385504 2024
[13]

Ekelund, S

J. Ekelund, S. Markidis, and I. Peng. 2025. Boosting Performance of Iterative Applications on GPUs: Kernel Batching with CUDA Graphs. InProceedings of the 2025 33rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP). IEEE, Turin, Italy, 70–77. doi:10.1109/PDP66500. 2025.00019

work page doi:10.1109/pdp66500 2025
[14]

Gaggion, L

N. Gaggion, L. Mansilla, C. Mosquera, D. H. Milone, and E. Ferrante. 2023. Improv- ing Anatomical Plausibility in Medical Image Segmentation via Hybrid Graph Neural Networks: Applications to Chest X-Ray Analysis.IEEE Transactions on Medical Imaging42, 2 (February 2023), 546–556. doi:10.1109/TMI.2022.3224660

work page doi:10.1109/tmi.2022.3224660 2023
[15]

Irena Galić, Marija Habijan, Hrvoje Leventić, and Krešimir Romić. 2023. Machine Learning Empowering Personalized Medicine: A Comprehensive Review of Medical Image Analysis Methods.Electronics12, 21, Article 4411 (2023). doi:10. 3390/electronics12214411

2023
[16]

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative Adversarial Nets. InAdvances in Neural Information Processing Systems

2014
[17]

P. Guo, Y. Mei, J. Zhou, S. Jiang, and V. M. Patel. 2024. ReconFormer: Accelerated MRI Reconstruction Using Recurrent Transformer.IEEE Transactions on Medical Imaging43, 1 (January 2024), 582–593. doi:10.1109/TMI.2023.3314747

work page doi:10.1109/tmi.2023.3314747 2024
[18]

Bagus Hanindhito and Lizy K. John. 2024. Accelerating ML Workloads using GPU Tensor Cores: The Good, the Bad, and the Ugly. InProceedings of the 15th ACM/SPEC International Conference on Performance Engineering (ICPE ’24). ACM,

2024
[19]

doi:10.1145/3629526.3653835

work page doi:10.1145/3629526.3653835
[20]

Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising Diffusion Probabilistic Models. InAdvances in Neural Information Processing Systems (NeurIPS)

2020
[21]

Leeman, Yue-Houng Hu, Raymond H

Shu-Hui Hsu, Zhaohui Han, Jonathan E. Leeman, Yue-Houng Hu, Raymond H. Mak, and Atchar Sudhyadhom. 2022. Synthetic CT generation for MRI-guided adaptive radiotherapy in prostate cancer.Frontiers in Oncology12 (2022). doi:10. 3389/fonc.2022.969463

arXiv 2022
[22]

Irmakci, Z

I. Irmakci, Z. E. Unel, N. Ikizler-Cinbis, and U. Bagci. 2022. Multi-Contrast MRI Segmentation Trained on Synthetic Images. InProceedings of the 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE, Glasgow, Scotland, United Kingdom, 5030–5034. doi:10. 1109/EMBC48229.2022.9871119

arXiv 2022
[23]

Zhe Jia, Michael Garland, and Yuandong Tian. 2016. Dissecting GPU Memory Hierarchy Through Microbenchmarking.IEEE Transactions on Parallel and Distributed Systems27, 7 (2016), 1944–1957. doi:10.1109/TPDS.2016.2531642

work page doi:10.1109/tpds.2016.2531642 2016
[24]

Chutian Jiang. 2021. Efficient Quantization Techniques for Deep Neural Net- works. InProceedings of the 2021 International Conference on Signal Process- ing and Machine Learning (CONF-SPML). IEEE, 271–277. doi:10.1109/CONF- SPML54095.2021.00059

work page doi:10.1109/conf- 2021
[25]

IEEE Journal of Biomedical and Health Informatics , author =

H. Jiang, Z. Wang, D. Liu, L. Guo, et al . 2025. Fast-DDPM: Fast Denoising Diffusion Probabilistic Models for Medical Image-to-Image Generation.IEEE Journal of Biomedical and Health Informatics29, 10 (October 2025), 7326–7335. doi:10.1109/JBHI.2025.3565183

work page doi:10.1109/jbhi.2025.3565183 2025
[26]

Mingfeng Jiang, Peihang Jia, Xin Huang, Zihan Yuan, Dongsheng Ruan, Feng Liu, and Ling Xia. 2025. Frequency-Aware Diffusion Model for Multi-Modal MRI Im- age Synthesis.Journal of Imaging11, 5 (2025), 152. doi:10.3390/jimaging11050152

work page doi:10.3390/jimaging11050152 2025
[27]

Kong et al

W. Kong et al. 2024. Cambricon-D: Full-Network Differential Acceleration for Diffusion Models. InProceedings of the 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA). IEEE, Buenos Aires, Argentina, 903–914. doi:10.1109/ISCA59077.2024.00070

work page doi:10.1109/isca59077.2024.00070 2024
[28]

R. R. Kumar, S. V. Shankar, R. Jaiswal, et al. 2025. Advances in Deep Learning for Medical Image Analysis: A Comprehensive Investigation.Journal of Statistical Theory and Practice19, 1 (2025), 9. doi:10.1007/s42519-024-00422-2

work page doi:10.1007/s42519-024-00422-2 2025
[29]

Rachel Lawrence, Emma Dodsworth, Efthalia Massou, Chris Sherlaw-Johnson, Angus I. G. Ramsay, Holly Walton, Tracy O’Regan, Fergus Gleeson, Nadia Crellin, Kevin Herbert, Pei Li Ng, Holly Elphinstone, Raj Mehta, Joanne Lloyd, Amanda Halliday, Stephen Morris, and Naomi J. Fulop. 2025. Artificial intelligence for diagnostics in radiology practice: a rapid syst...

work page doi:10.1016/j.eclinm.2025.103228 2025
[30]

H. Laçi, K. Sevrani, and S. Iqbal. 2025. Deep learning approaches for classification tasks in medical X-ray, MRI, and ultrasound images: a scoping review.BMC Medical Imaging25, 1 (2025), 156. doi:10.1186/s12880-025-01701-5

work page doi:10.1186/s12880-025-01701-5 2025
[31]

Mengfang Li, Yuanyuan Jiang, Yanzhou Zhang, and Haisheng Zhu. 2023. Medical image analysis using deep learning algorithms.Frontiers in Public Health11 (2023). doi:10.3389/fpubh.2023.1273253

work page doi:10.3389/fpubh.2023.1273253 2023
[32]

D. Liu, Z. Wang, and L. Guo. 2025. A Plug-and-Play Diffusion-Styled Conversion Model for Domain Discrepancies in Medical Image Segmentation. InProceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, Hyderabad, India, 1–5. doi:10.1109/ICASSP49660.2025.10889167

work page doi:10.1109/icassp49660.2025.10889167 2025
[33]

Y. Liu, Y. Feng, J. Cheng, H. Zhan, and Z. Zhu. 2025. MambaDiff: Mamba- Enhanced Diffusion Model for 3D Medical Image Segmentation.IEEE Transactions on Image Processing34 (2025), 5761–5775. doi:10.1109/TIP.2025.3607615

work page doi:10.1109/tip.2025.3607615 2025
[34]

Yifan Liu and Xipeng Shen. 2021. Analyzing and Leveraging Decoupled L1 Caches in GPUs. InProceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). IEEE, 1–11. doi:10.1109/ISPASS48437. 2021.9407080

work page doi:10.1109/ispass48437 2021
[35]

Z. Liu, A. Song, N. Sabar, and W. Li. 2024. Evolving a Better Scheduler for Diffusion Models. InPRICAI 2023: Trends in Artificial Intelligence (Lecture Notes in Computer Science, Vol. 14326), F. Liu, A. A. Sadanandan, D. N. Pham, P. Mursanto, and D. Lukose (Eds.). Springer, Singapore. doi:10.1007/978-981-99-7022-3_37

work page doi:10.1007/978-981-99-7022-3_37 2024
[36]

Y. Luo, Q. Yang, Y. Fan, H. Qi, and M. Xia. 2024. Measurement Guidance in Diffusion Models: Insight from Medical Image Synthesis.IEEE Transactions on Pattern Analysis and Machine Intelligence46, 12 (December 2024), 7983–7997. doi:10.1109/TPAMI.2024.3399098 Jeeho Ryoo et al

work page doi:10.1109/tpami.2024.3399098 2024
[37]

Alessio Luschi, Linda Tognetti, Alessandra Cartocci, Elisa Cinotti, Gio- vanni Rubegni, Laura Calabrese, Martina D’onghia, Martina Dragotto, Elvira Moscarella, Gabriella Brancaccio, Giulia Briatico, Camila Scharf, Dario Buononato, Vittorio Tancredi, Carmen Cantisani, Camilla Chello, Luca Ambro- sio, Pietro Scribani Rossi, Marco Virone, Giovanni Pellacani,...

work page doi:10.1016/j.bbe.2025.09.001 2025
[38]

Maier- Hein

Gustav Müller-Franzes, David Zimmerer, Fabian Isensee, and Klaus H. Maier- Hein. 2023. A Multimodal Comparison of Latent Denoising Diffusion Probabilis- tic Models and Generative Adversarial Networks for Medical Image Synthesis. Scientific Reports13, 1 (2023), 12456. doi:10.1038/s41598-023-39278-0

work page doi:10.1038/s41598-023-39278-0 2023
[39]

Maham Nazir, Muhammad Aqeel, and Francesco Setti. 2025. Diffusion-Based Data Augmentation for Medical Image Segmentation. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops. IEEE, 1330–1339

2025
[40]

Nichol and Prafulla Dhariwal

Alexander Q. Nichol and Prafulla Dhariwal. 2021. Improved Denoising Diffu- sion Probabilistic Models. InProceedings of the 38th International Conference on Machine Learning

2021
[41]

IEEE Transactions on Medical Imaging. 2024. Special Issue on Score-Based Generative Models for Medical Imaging.IEEE Transactions on Medical Imaging (2024)

2024
[42]

Geon Yeong Park, Sang Wan Lee, and Jong Chul Ye. 2025. Inference-Time Diffu- sion Model Distillation. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 4049–4058

2025
[43]

Peng et al

J. Peng et al . 2022. Knowledge-Driven Generative Adversarial Network for Text-to-Image Synthesis.IEEE Transactions on Multimedia24 (2022), 4356–4366. doi:10.1109/TMM.2021.3116416

work page doi:10.1109/tmm.2021.3116416 2022
[44]

Matteo Pozzi, Shahryar Noei, Erich Robbi, Luca Cima, Monica Moroni, Enrico Munari, Evelin Torresani, and Giuseppe Jurman. 2024. Generating and evaluating synthetic data in digital pathology through diffusion models.Scientific Reports 14, 1 (November 2024), 28435. doi:10.1038/s41598-024-79602-w

work page doi:10.1038/s41598-024-79602-w 2024
[45]

Chen Qian, Haoyu Zhang, Dan Ruan, Yirong Zhou, and Xiaobo Qu. 2023. Physics- Informed Deep Diffusion MRI Reconstruction: Break the Bottleneck of Training Data in Artificial Intelligence. InProceedings of the IEEE International Symposium on Biomedical Imaging (ISBI). IEEE, 1–5. doi:10.1109/ISBI53787.2023.10230567

work page doi:10.1109/isbi53787.2023.10230567 2023
[46]

Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation. InMedical Image Computing and Computer-Assisted Intervention (MICCAI). Springer, 234–241. doi:10.1007/978-3- 319-24574-4_28

work page doi:10.1007/978-3- 2015
[47]

Samala, Karen Drukker, Amita Shukla-Dave, Heang-Ping Chan, Berk- man Sahiner, Nicholas Petrick, Hayit Greenspan, Usman Mahmood, Ronald M

Ravi K. Samala, Karen Drukker, Amita Shukla-Dave, Heang-Ping Chan, Berk- man Sahiner, Nicholas Petrick, Hayit Greenspan, Usman Mahmood, Ronald M. Summers, Georgia Tourassi, Thomas M. Deserno, Daniele Regge, Janne J. Näppi, Hiroyuki Yoshida, Zhimin Huo, Quan Chen, Daniel Vergara, Kenny H. Cha, Richard Mazurchuk, Kevin T. Grizzard, Henkjan Huisman, Lia Morr...

work page doi:10.1093/bjrai/ubae006 2024
[48]

Vikash Sehwag, Xianghao Kong, Jingtao Li, Michael Spranger, and Lingjuan Lyu. 2025. Stretching Each Dollar: Diffusion Training from Scratch on a Micro- Budget. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 28596–28608

2025
[49]

Isabella Barbosa Silva, Elsa Oliveira, Ricardo Melo, Luís Rosado, César Gálvez- Barrón, Irene Bernadet Heijink, Sem Hoogteijling, and Iñigo Gabilondo. 2025. Designing for Qualitative Evaluation of Synthetic Medical Data. InExtended Abstracts of the 2025 CHI Conference on Human Factors in Computing Systems (CHI EA ’25). Association for Computing Machinery,...

work page doi:10.1145/3706599.3720274 2025
[50]

Delfino, Miguel Lago, Brandon Nelson, Niloufar Saharkhiz, Berkman Sahiner, Ghada Zamzmi, and Aldo Badano

Elena Sizikova, Andreu Badal, Jana G. Delfino, Miguel Lago, Brandon Nelson, Niloufar Saharkhiz, Berkman Sahiner, Ghada Zamzmi, and Aldo Badano. 2024. Synthetic data in radiological imaging: current state and future outlook.BJR Artificial Intelligence1, 1 (May 2024), ubae007. doi:10.1093/bjrai/ubae007

work page doi:10.1093/bjrai/ubae007 2024
[51]

Jinzhuo Wang, Kai Wang, Yunfang Yu, Yuxing Lu, Wenchao Xiao, Zhuo Sun, Fei Liu, Zixing Zou, Yuanxu Gao, Lei Yang, Hong-Yu Zhou, Hanpei Miao, Wenting Zhao, Lisha Huang, Lingchao Zeng, Rui Guo, Ieng Chong, Boyu Deng, Linling Cheng, Xiaoniao Chen, Jing Luo, Meng-Hua Zhu, Daniel Baptista-Hon, Olivia Monteiro, Ming Li, Yu Ke, Jiahui Li, Simiao Zeng, Taihua Gua...

2025
[52]

Simoncelli, and Alan C

Zhou Wang, Eero P. Simoncelli, and Alan C. Bovik. 2003. Multi-Scale Structural Similarity for Image Quality Assessment. InProceedings of the 37th Asilomar Conference on Signals, Systems and Computers

2003
[53]

Ramachandran, Paul A

Asim Waqas, Aakash Tripathi, Ravi P. Ramachandran, Paul A. Stewart, and Ghulam Rasool. 2024. Multimodal data integration for oncology in the era of deep neural networks: a review.Frontiers in Artificial Intelligence7 (2024). doi:10.3389/frai.2024.1408843

work page doi:10.3389/frai.2024.1408843 2024
[54]

George Webber and Andrew J. Reader. 2024. Diffusion Models for Medical Image Reconstruction.BJR|Artificial Intelligence1, 1 (2024), ubae013. doi:10.1093/bjrai/ ubae013

work page doi:10.1093/bjrai/ 2024
[55]

Felix Wimbauer, Bichen Wu, Edgar Schoenfeld, Xiaoliang Dai, Ji Hou, Zijian He, Artsiom Sanakoyeu, Peizhao Zhang, Sam Tsai, Jonas Kohler, Christian Rupprecht, Daniel Cremers, Peter Vajda, and Jialiang Wang. 2024. Cache Me if You Can: Accelerating Diffusion Models through Block Caching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern...

[56]

K. Xu, S. Lu, B. Huang, W. Wu, and Q. Liu. 2024. Stage-by-Stage Wavelet Optimization Refinement Diffusion Model for Sparse-View CT Reconstruction. IEEE Transactions on Medical Imaging 43, 10 (October 2024), 3412–3424. doi:10.1109/TMI.2024.3355455

[57]

Tony Xu, Sepehr Hosseini, Chris Anderson, Anthony Rinaldi, Rahul G. Krishnan, Anne L. Martel, and Maged Goubran. 2025. A generalizable 3D framework and model for self-supervised learning in medical imaging. npj Digital Medicine 8, 1 (2025), 639. doi:10.1038/s41746-025-02035-w

[58]

Charlene Yang, Thorsten Kurth, and Samuel Williams. 2020. Hierarchical Roofline Analysis for GPUs: Accelerating Performance Optimization for the NERSC-9 Perlmutter System. Concurrency and Computation: Practice and Experience 32, 24 (2020), e5547. doi:10.1002/cpe.5547

[59]

Charlene Yang, Yunsong Wang, Thorsten Kurth, Samuel Williams, and Steven Farrell. 2020. Hierarchical Roofline Performance Analysis for Deep Learning Applications. In Proceedings of SC ’20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE/ACM. doi:10.1109/SC41405.2020.00045

[60]

Xin Yi, Ekta Walia, and Paul Babyn. 2019. Generative Adversarial Network in Medical Imaging: A Review. Medical Image Analysis (2019)

[61]

Haoyu Zhang, Chen Qian, and Xiaobo Qu. 2023. A Reconfigurable Processing Element for Multiple-Precision Floating/Fixed-Point HPC. IEEE Transactions on Circuits and Systems II: Express Briefs 70, 10 (2023), 3456–3460. doi:10.1109/TCSII.2023.10272667

[62]

T. Zhang, X. Chen, C. Qu, A. Yuille, and Z. Zhou. 2024. Leveraging AI Predicted and Expert Revised Annotations in Interactive Segmentation: Continual Tuning or Full Training? In Proceedings of the 2024 IEEE International Symposium on Biomedical Imaging (ISBI). IEEE, Athens, Greece, 1–5. doi:10.1109/ISBI56570.2024.10635518

[63]

J. Zhao and S. Li. 2025. Radiomics-Driven Diffusion Model and Monte Carlo Compression Sampling for Reliable Medical Image Synthesis. IEEE Journal of Biomedical and Health Informatics (2025). doi:10.1109/JBHI.2025.3602674

[64]

Z. Zhao, F. Zhou, K. Xu, Z. Zeng, C. Guan, and S. K. Zhou. 2023. LE-UDA: Label-Efficient Unsupervised Domain Adaptation for Medical Image Segmentation. IEEE Transactions on Medical Imaging 42, 3 (March 2023), 633–646. doi:10.1109/TMI.2022.3214766

[65]

Zhenyu Zhou, Defang Chen, Can Wang, Chun Chen, and Siwei Lyu. 2024. Simple and Fast Distillation of Diffusion Models. In Advances in Neural Information Processing Systems, A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang (Eds.), Vol. 37. Curran Associates, Inc., 40831–40860. doi:10.52202/079017-1291

[66]

Özgün Çiçek, Ahmed Abdulkadir, Soeren S. Lienkamp, Thomas Brox, and Olaf Ronneberger. 2016. 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI
