Conditional Diffusion Posterior Alignment for Sparse-View CT Reconstruction
Pith reviewed 2026-05-08 13:13 UTC · model grok-4.3
The pith
Conditioning a 2D diffusion model on an initial 3D reconstruction and enforcing data consistency scales diffusion-based sparse-view CT reconstruction to large volumes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Conditional Diffusion Posterior Alignment (CDPA) scales diffusion-based sparse-view CT reconstruction to large 3D volumes by conditioning a 2D U-Net diffusion model on an initial 3D reconstruction to improve inter-slice consistency and applying explicit data-consistency alignment to match measured projections. The method bypasses the memory and dataset limits of full 3D diffusion models. Experiments on synthetic and real Cone Beam CT data report state-of-the-art performance, with ablations confirming the value of each element. The same conditioning and alignment steps also raise the quality of fast denoising U-Nets to near-diffusion levels at lower cost.
What carries the argument
Conditional Diffusion Posterior Alignment (CDPA): a pipeline that conditions a 2D U-Net diffusion model on an initial 3D reconstruction and adds a data-consistency alignment step to match measured projections.
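As an illustration of the conditioning half of this pipeline, the sketch below pairs each slice of the current diffusion state with the matching slice of the initial 3D reconstruction by stacking them as two input channels. The rebuttal further down states that conditioning is done by channel concatenation; everything else here (the function name `build_conditioned_inputs`, the array shapes, the random stand-in volumes) is an assumption made for the example and not the authors' code.

```python
import numpy as np

def build_conditioned_inputs(noisy_volume, initial_recon):
    """Pair every noisy slice with the matching slice of the initial 3D reconstruction.

    Both inputs have shape (num_slices, height, width). The result has shape
    (num_slices, 2, height, width): channel 0 is the noisy slice, channel 1 the condition.
    """
    if noisy_volume.shape != initial_recon.shape:
        raise ValueError("volumes must have identical shapes")
    return np.stack([noisy_volume, initial_recon], axis=1)

rng = np.random.default_rng(0)
initial_recon = rng.random((4, 8, 8))          # hypothetical stand-in for the initial reconstruction
noisy_volume = rng.standard_normal((4, 8, 8))  # hypothetical stand-in for the current diffusion state
batch = build_conditioned_inputs(noisy_volume, initial_recon)
print(batch.shape)  # (4, 2, 8, 8)
```

A 2D network consuming `batch` sees one slice at a time, so any volumetric context has to arrive through channel 1; that is the dependency the referee's Method comment probes.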
Load-bearing premise
The premise is twofold: that conditioning a 2D U-Net diffusion model on an initial 3D reconstruction reliably improves inter-slice consistency, and that the added data-consistency alignment step integrates without introducing new artifacts or degrading generative quality.
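To make the second half of the premise concrete, here is a minimal sketch of one generic way to enforce data consistency: a few gradient steps on the least-squares misfit 0.5 * ||A x - y||^2, starting from a generative estimate. The paper's actual alignment procedure is not described on this page, so the update rule, the dense random matrix `A` standing in for the projection operator, and the function name `data_consistency_steps` are all assumptions.

```python
import numpy as np

def data_consistency_steps(x, A, y, num_steps=50):
    """Gradient descent on 0.5 * ||A x - y||^2, starting from the estimate x."""
    # Step size 1/L, where L is the squared largest singular value of A.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    for _ in range(num_steps):
        x = x - step * A.T @ (A @ x - y)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))   # toy operator: fewer measurements than unknowns
x_true = rng.random(50)
y = A @ x_true                      # noiseless toy measurements
x_prior = x_true + 0.3 * rng.standard_normal(50)  # hypothetical stand-in for a generative output
x_aligned = data_consistency_steps(x_prior, A, y)

residual_before = np.linalg.norm(A @ x_prior - y)
residual_after = np.linalg.norm(A @ x_aligned - y)
print(residual_after < residual_before)
```

In this toy setting the update only changes the part of `x` that the measurements can see, which is why it cannot by itself repair errors lying in the null space of `A`; that part is left to the generative prior.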
What would settle it
Ablation experiments on real CBCT data in which removing the conditioning and alignment steps causes no drop in inter-slice consistency metrics or quantitative scores would falsify the central claim.
Original abstract
Computed Tomography (CT) is a widely used imaging modality in medical and industrial applications. To limit radiation exposure and measurement time, there is a growing interest in sparse-view CT, where the number of projection views is significantly reduced. Deep neural networks have shown great promise in improving reconstruction quality in sparse-view CT, especially generative diffusion models. However, these methods struggle to scale to large 3D volumes due to several reasons: (i) the high memory and computational requirements of 3D models, (ii) the lack of large 3D training datasets, and (iii) the inconsistencies across slices when using 2D models independently on each slice. We overcome these limitations and scale diffusion-based sparse-view CT reconstruction to large 3D volumes by combining conditional diffusion with explicit data consistency. We propose Conditional Diffusion Posterior Alignment (CDPA) to enable scalable 3D sparse-view CT reconstruction. A 2D U-Net diffusion model is conditioned on an initial 3D reconstruction to improve inter-slice consistency, combined with data-consistency alignment to match measured projections. Experiments on synthetic and real Cone Beam CT (CBCT) data show state-of-the-art performance, with ablations that confirm the synergistic effects of the proposed pipeline. Finally, we show that the same principles also strengthen fast denoising U-Nets, yielding near-diffusion quality at a fraction of the computational cost.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Conditional Diffusion Posterior Alignment (CDPA) to scale diffusion models for sparse-view 3D CT reconstruction. A 2D U-Net diffusion model is conditioned on an initial 3D reconstruction to promote inter-slice consistency and is paired with an explicit data-consistency alignment step that enforces agreement with measured projections. Experiments on synthetic and real CBCT data are said to demonstrate state-of-the-art performance, with ablations confirming synergistic effects of the combined pipeline; the same principles are also shown to improve fast denoising U-Nets.
Significance. If the empirical claims are substantiated, the work would be significant for medical imaging: it offers a practical route to high-quality 3D sparse-view CT without the memory and data demands of full 3D diffusion models, while the extension to accelerated U-Net inference provides a clear computational benefit.
major comments (3)
- [Abstract] Abstract: the central claim of state-of-the-art performance and synergistic ablations is asserted without any quantitative metrics, baseline comparisons, dataset sizes, or 3D-specific consistency measures (e.g., slice-to-slice variance or 3D gradient continuity). This absence prevents evaluation of whether the reported results actually support the scalability and consistency assertions.
- [Method] Method description: because the generative process remains strictly 2D (U-Net applied slice-wise), the claim that conditioning on an initial 3D volume reliably improves inter-slice coherence rests on the unverified assumption that the conditioning channel is strong enough to propagate volumetric information. No architectural details (concatenation vs. feature injection) or ablation isolating the conditioning effect are supplied, so it is unclear whether the method reduces to independent 2D diffusion plus post-hoc projection matching.
- [Experiments] Experiments: the assertion that ablations confirm synergistic effects requires explicit quantitative demonstration that removing either the conditioning or the data-consistency step produces a measurable degradation beyond what either component achieves alone; without these numbers and without 3D consistency metrics, the synergy claim cannot be assessed.
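The 3D-specific consistency measures requested in these comments could take many forms. The sketch below shows one simple reading: the mean squared difference between adjacent slices of a volume. The function name `slice_to_slice_variation` and the toy volumes are assumptions for illustration; the manuscript may define its metrics differently.

```python
import numpy as np

def slice_to_slice_variation(volume):
    """Mean squared difference between adjacent slices along axis 0."""
    diffs = np.diff(volume, axis=0)
    return float(np.mean(diffs ** 2))

rng = np.random.default_rng(0)
# Toy volume whose 8 slices are identical copies of one random image.
smooth = np.broadcast_to(rng.random((1, 16, 16)), (8, 16, 16))
# Same volume with independent noise per slice, mimicking slice-wise inconsistency.
jittery = smooth + 0.1 * rng.standard_normal((8, 16, 16))

print(slice_to_slice_variation(smooth))   # 0.0
print(slice_to_slice_variation(jittery) > 0.0)
```

On real data the ground-truth volume also varies between slices, so a score like this is most informative when compared against the same score computed on the reference volume.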
minor comments (1)
- [Abstract] The final claim that the same principles strengthen fast denoising U-Nets is interesting but is presented without any quantitative comparison to the full diffusion model or to existing accelerated baselines.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which help clarify how to strengthen the presentation of our work on scaling diffusion models for 3D sparse-view CT. We address each major comment below, indicating where revisions will be made to the manuscript.
Point-by-point responses
- Referee: [Abstract] Abstract: the central claim of state-of-the-art performance and synergistic ablations is asserted without any quantitative metrics, baseline comparisons, dataset sizes, or 3D-specific consistency measures (e.g., slice-to-slice variance or 3D gradient continuity). This absence prevents evaluation of whether the reported results actually support the scalability and consistency assertions.
  Authors: We agree that the abstract is too concise and would benefit from including key quantitative support for the claims. The body of the manuscript reports PSNR/SSIM on synthetic and real CBCT datasets (with dataset sizes specified in Section 4), baseline comparisons against existing methods, and ablations. We will revise the abstract to incorporate specific metrics, dataset details, and a brief mention of observed improvements in 3D consistency. Revision: yes.
- Referee: [Method] Method description: because the generative process remains strictly 2D (U-Net applied slice-wise), the claim that conditioning on an initial 3D volume reliably improves inter-slice coherence rests on the unverified assumption that the conditioning channel is strong enough to propagate volumetric information. No architectural details (concatenation vs. feature injection) or ablation isolating the conditioning effect are supplied, so it is unclear whether the method reduces to independent 2D diffusion plus post-hoc projection matching.
  Authors: The method section specifies that the initial 3D reconstruction is concatenated as an additional input channel to the 2D U-Net, allowing the diffusion process to condition on volumetric context slice-wise. This is not equivalent to independent 2D diffusion followed only by post-hoc alignment, as the conditioning influences the generative sampling itself. We will expand the method description with explicit architectural details on the conditioning mechanism and add a dedicated ablation isolating its contribution to inter-slice coherence. Revision: yes.
- Referee: [Experiments] Experiments: the assertion that ablations confirm synergistic effects requires explicit quantitative demonstration that removing either the conditioning or the data-consistency step produces a measurable degradation beyond what either component achieves alone; without these numbers and without 3D consistency metrics, the synergy claim cannot be assessed.
  Authors: The experiments section already presents ablation results comparing the full CDPA pipeline against variants lacking conditioning or data-consistency alignment, with quantitative metrics showing synergistic gains. To directly address the concern, we will add explicit 3D consistency metrics (e.g., slice-to-slice variance and gradient continuity) to the ablation tables and text in the revised manuscript. Revision: yes.
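One way to turn the synergy question in this exchange into a number is an interaction term over a four-way ablation: the gain of the full pipeline over the baseline, minus the sum of the gains from each component alone. A positive value means the components help more together than separately. This is a sketch of that arithmetic only; the function name `interaction_gain` is hypothetical, and the scores passed in are placeholders, not results from the paper.

```python
def interaction_gain(baseline, cond_only, dc_only, full):
    """Gain of the full pipeline beyond the sum of the two individual gains."""
    return (full - baseline) - ((cond_only - baseline) + (dc_only - baseline))

# Placeholder scores for illustration only; NOT results from the paper.
print(interaction_gain(baseline=0.0, cond_only=1.0, dc_only=2.0, full=5.0))  # 2.0
print(interaction_gain(baseline=0.0, cond_only=1.0, dc_only=2.0, full=3.0))  # 0.0
```

The second call shows the purely additive case: both components help, but the interaction term is zero, so the ablation would support "each element is useful" without supporting "synergistic".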
Circularity Check
No circularity: empirical pipeline combining standard components
The paper describes an empirical method that conditions a 2D U-Net diffusion model on an initial 3D reconstruction and adds explicit data-consistency alignment. No equations, parameters, or central claims reduce by construction to fitted inputs, self-definitions, or self-citation chains. The derivation chain consists of architectural choices and post-processing steps whose validity is assessed via experiments and ablations rather than tautological reductions. This is the most common honest finding for applied reconstruction papers.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Conditioning a 2D diffusion model on an initial 3D reconstruction improves inter-slice consistency
- domain assumption: Data-consistency alignment can be combined with generative diffusion outputs without quality loss