arXiv: 2604.21960 · v2 · submitted 2026-04-23 · 📡 eess.IV · cs.CV · cs.LG

Conditional Diffusion Posterior Alignment for Sparse-View CT Reconstruction

Luis Barba, Johannes Kirschner, Benjamin Bejar

Pith reviewed 2026-05-08 13:13 UTC · model grok-4.3

classification 📡 eess.IV · cs.CV · cs.LG

keywords sparse-view CT · diffusion models · 3D reconstruction · data consistency · Cone Beam CT · conditional diffusion · image reconstruction

The pith

Conditioning a 2D diffusion model on an initial 3D reconstruction and enforcing data consistency scales diffusion-based sparse-view CT reconstruction to large volumes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper targets the practical barriers to using diffusion models for 3D sparse-view CT, where full 3D networks demand too much memory and data while independent 2D slice processing creates inconsistencies. It introduces Conditional Diffusion Posterior Alignment to condition a 2D U-Net on a coarse initial 3D volume and then align the output to the actual measured projections. This hybrid strategy lets the method handle large real-world Cone Beam CT scans while keeping generative quality. If the approach holds, it supports lower-dose CT imaging in medical and industrial settings without the usual quality trade-offs. Ablation results on both synthetic and real data are presented to show the two components work together.

Core claim

Conditional Diffusion Posterior Alignment (CDPA) scales diffusion-based sparse-view CT reconstruction to large 3D volumes by conditioning a 2D U-Net diffusion model on an initial 3D reconstruction to improve inter-slice consistency and applying explicit data-consistency alignment to match measured projections. The method bypasses the memory and dataset limits of full 3D diffusion models. Experiments on synthetic and real Cone Beam CT data report state-of-the-art performance, with ablations confirming the value of each element. The same conditioning and alignment steps also raise the quality of fast denoising U-Nets to near-diffusion levels at lower cost.

What carries the argument

Conditional Diffusion Posterior Alignment (CDPA): a pipeline that conditions a 2D U-Net diffusion model on an initial 3D reconstruction and adds a data-consistency alignment step to match measured projections.
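The data-consistency idea can be illustrated with a minimal sketch. This is not the paper's alignment step, whose details are not given on this page; it is plain gradient descent on the least-squares misfit 0.5·‖Ax − y‖², assuming a toy linear projection operator `A` (row and column sums of a 2×2 image) and a hypothetical prior estimate `x_prior` standing in for a network output. All names are illustrative.

```python
def matvec(A, x):
    """Forward projection: A @ x for a matrix stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Back projection: A^T @ r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def residual_norm(A, x, y):
    """Euclidean norm of the projection misfit A @ x - y."""
    return sum((p - q) ** 2 for p, q in zip(matvec(A, x), y)) ** 0.5


def data_consistency(A, y, x0, step, n_iters):
    """Gradient descent on 0.5 * ||A x - y||^2, started from the estimate x0."""
    x = list(x0)
    for _ in range(n_iters):
        residual = [p - q for p, q in zip(matvec(A, x), y)]
        gradient = rmatvec(A, residual)
        x = [xi - step * gi for xi, gi in zip(x, gradient)]
    return x


# Toy "scanner": a 2x2 image flattened row-major, seen from two views
# (row sums and column sums). This is an assumption for illustration only.
A = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
x_true = [1.0, 2.0, 3.0, 4.0]
y = matvec(A, x_true)             # simulated measured projections
x_prior = [1.2, 1.7, 3.3, 3.9]    # hypothetical generative estimate

# step must stay below 2 / lambda_max(A^T A); lambda_max is 4 for this A.
x_aligned = data_consistency(A, y, x_prior, step=0.25, n_iters=200)
```

The toy system is underdetermined, so the aligned result matches the projections without necessarily equalling `x_true`: whatever the measurements cannot see is inherited from the starting estimate, which is one way to see why the quality of the prior still matters after alignment.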

Load-bearing premise

That conditioning a 2D U-Net diffusion model on an initial 3D reconstruction will reliably improve inter-slice consistency and that the added data-consistency alignment step will integrate without introducing new artifacts or degrading generative quality.

What would settle it

Ablation experiments on real CBCT data showing that removing the conditioning step or the alignment step causes no drop in inter-slice consistency metrics or quantitative scores would falsify the central claim.

Figures

Figures reproduced from arXiv: 2604.21960 by Benjamin Bejar, Johannes Kirschner, Luis Barba.

**Figure 1.** Slices (coronal, axial and sagittal from top to bottom) of the walnut reconstruction

**Figure 2.** Coronal slices of the spine reconstruction results with resolution 256

**Figure 3.** Axial slices of the dental reconstruction results with resolution 256

**Figure 4.** Axial slices of a walnut reconstruction using…

**Figure 5.** PSNR and SSIM vs number of views for the high-resolution walnut dataset at resolu…

**Figure 6.** Left: Running time comparison of different methods on the CBCT datasets with size 256³. Conditional diffusion requires fewer data-consistency steps than its unconditional counterpart due to the information already provided by the conditioning FDK reconstruction. FDK-denoising with fine-tuning is significantly faster than diffusion-based methods while achieving comparable performance. Right: Ablation stud…

**Figure 7.** Top left: Correlation between standard deviation (STD) of diffusion samples for the 256³ walnut dataset and reconstruction error using 20 projections. The hexbin density plot shows strong positive correlation (Pearson r = 0.680, Spearman ρ = 0.735, R² = 0.462), with linear fit Error ≈ 1.42 × STD + 0.0003. STD explains 46.2% of error variance, demonstrating its utility as an uncertainty indicator. Top right…
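The statistics quoted in the Figure 7 caption can be cross-checked with one line of arithmetic. The sketch below assumes that the reported R² belongs to the ordinary least-squares linear fit with an intercept, in which case R² equals the square of the Pearson correlation; the variable names are illustrative.

```python
pearson_r = 0.680     # Pearson r quoted in the caption
reported_r2 = 0.462   # R^2 quoted in the caption

# For a simple linear regression with intercept, R^2 = r^2.
r_squared = pearson_r ** 2
explained_percent = 100 * r_squared

# The two quoted values agree to the three decimals given in the caption.
consistent = abs(r_squared - reported_r2) < 5e-4
```

Under that assumption the "46.2% of error variance" figure follows directly from the quoted correlation rather than being an independent result.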

Original abstract

Computed Tomography (CT) is a widely used imaging modality in medical and industrial applications. To limit radiation exposure and measurement time, there is a growing interest in sparse-view CT, where the number of projection views is significantly reduced. Deep neural networks have shown great promise in improving reconstruction quality in sparse-view CT, especially generative diffusion models. However, these methods struggle to scale to large 3D volumes due to several reasons: (i) the high memory and computational requirements of 3D models, (ii) the lack of large 3D training datasets, and (iii) the inconsistencies across slices when using 2D models independently on each slice. We overcome these limitations and scale diffusion-based sparse-view CT reconstruction to large 3D volumes by combining conditional diffusion with explicit data consistency. We propose Conditional Diffusion Posterior Alignment (CDPA) to enable scalable 3D sparse-view CT reconstruction. A 2D U-Net diffusion model is conditioned on an initial 3D reconstruction to improve inter-slice consistency, combined with data-consistency alignment to match measured projections. Experiments on synthetic and real Cone Beam CT (CBCT) data show state-of-the-art performance, with ablations that confirm the synergistic effects of the proposed pipeline. Finally, we show that the same principles also strengthen fast denoising U-Nets, yielding near-diffusion quality at a fraction of the computational cost.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

CDPA scales diffusion-based CT reconstruction to 3D volumes by conditioning a 2D model on an initial 3D reconstruction and adding data-consistency alignment, but the claimed inter-slice consistency benefit still needs direct 3D metrics to confirm that it beats simpler baselines.

The paper's core move is to run a 2D diffusion U-Net per slice while feeding it an initial 3D reconstruction as conditioning, then add an explicit posterior alignment step to match the measured projections. This sidesteps the memory wall of full 3D diffusion and the slice-wise artifacts of independent 2D runs. The same conditioning trick is also shown to lift plain denoising U-Nets close to diffusion quality at lower cost. Those are the practical pieces worth noting for anyone trying to deploy generative models on real CBCT scanners.
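The simulated rebuttal further down this page describes the conditioning as channel concatenation: the initial 3D reconstruction is passed to the 2D U-Net as an additional input channel. A minimal sketch of that pairing follows, using nested lists in place of tensors. The function names and the toy shapes are assumptions for illustration, and no network is involved.

```python
def conditioned_inputs(noisy_volume, init_volume):
    """Pair each noisy slice with the matching slice of the initial reconstruction.

    Both volumes are lists of 2D slices (depth, height, width). The result has
    layout (depth, 2, height, width): channel 0 is the noisy slice, channel 1
    is the conditioning slice.
    """
    if len(noisy_volume) != len(init_volume):
        raise ValueError("volumes must have the same number of slices")
    return [[noisy, cond] for noisy, cond in zip(noisy_volume, init_volume)]


def shape(nested):
    """Shape of a regular nested list, e.g. (depth, channels, height, width)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


depth, height, width = 3, 2, 2
# Hypothetical inputs: an all-zero noisy volume and an initial reconstruction
# whose slice k is filled with the value k.
noisy_volume = [[[0.0] * width for _ in range(height)] for _ in range(depth)]
init_volume = [[[float(k)] * width for _ in range(height)] for k in range(depth)]

batch = conditioned_inputs(noisy_volume, init_volume)
```

Because the initial reconstruction is computed once for the whole volume, every slice's conditioning channel is cut from the same 3D estimate. That shared source is the mechanism by which a slice-wise model could receive volumetric context, and whether it is strong enough is exactly what the referee report questions.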

Referee Report

3 major / 1 minor

Summary. The manuscript proposes Conditional Diffusion Posterior Alignment (CDPA) to scale diffusion models for sparse-view 3D CT reconstruction. A 2D U-Net diffusion model is conditioned on an initial 3D reconstruction to promote inter-slice consistency and is paired with an explicit data-consistency alignment step that enforces agreement with measured projections. Experiments on synthetic and real CBCT data are said to demonstrate state-of-the-art performance, with ablations confirming synergistic effects of the combined pipeline; the same principles are also shown to improve fast denoising U-Nets.

Significance. If the empirical claims are substantiated, the work would be significant for medical imaging: it offers a practical route to high-quality 3D sparse-view CT without the memory and data demands of full 3D diffusion models, while the extension to accelerated U-Net inference provides a clear computational benefit.

major comments (3)

[Abstract] The central claim of state-of-the-art performance and synergistic ablations is asserted without any quantitative metrics, baseline comparisons, dataset sizes, or 3D-specific consistency measures (e.g., slice-to-slice variance or 3D gradient continuity). This absence prevents evaluation of whether the reported results actually support the scalability and consistency assertions.
[Method] Because the generative process remains strictly 2D (U-Net applied slice-wise), the claim that conditioning on an initial 3D volume reliably improves inter-slice coherence rests on the unverified assumption that the conditioning channel is strong enough to propagate volumetric information. No architectural details (concatenation vs. feature injection) or ablation isolating the conditioning effect are supplied, so it is unclear whether the method reduces to independent 2D diffusion plus post-hoc projection matching.
[Experiments] The assertion that ablations confirm synergistic effects requires explicit quantitative demonstration that removing either the conditioning or the data-consistency step produces a measurable degradation beyond what either component achieves alone; without these numbers and without 3D consistency metrics, the synergy claim cannot be assessed.

minor comments (1)

[Abstract] The final claim that the same principles strengthen fast denoising U-Nets is interesting but is presented without any quantitative comparison to the full diffusion model or to existing accelerated baselines.
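The synergy requirement in the third major comment can be made concrete with the interaction term of a 2×2 ablation. The notation below is an assumption, not the paper's: $M$ is any quality score where higher is better (PSNR, for example), and the subscripts denote the baseline without either component ($\varnothing$), conditioning only ($c$), data consistency only ($d$), and the full pipeline ($cd$).

```latex
\begin{align*}
\Delta_c    &= M_c - M_{\varnothing}, \qquad
\Delta_d     = M_d - M_{\varnothing}, \qquad
\Delta_{cd}  = M_{cd} - M_{\varnothing}, \\
S &= \Delta_{cd} - (\Delta_c + \Delta_d)
   = M_{cd} - M_c - M_d + M_{\varnothing}.
\end{align*}
```

Under this reading, $S > 0$ means the combined gain exceeds the sum of the individual gains, $S = 0$ means the components are merely additive, and $S < 0$ means they overlap. The sign of $S$ can depend on the scale of the metric (PSNR is logarithmic), so a synergy claim would ideally be checked on more than one score.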

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help clarify how to strengthen the presentation of our work on scaling diffusion models for 3D sparse-view CT. We address each major comment below, indicating where revisions will be made to the manuscript.

Referee: [Abstract] The central claim of state-of-the-art performance and synergistic ablations is asserted without any quantitative metrics, baseline comparisons, dataset sizes, or 3D-specific consistency measures (e.g., slice-to-slice variance or 3D gradient continuity). This absence prevents evaluation of whether the reported results actually support the scalability and consistency assertions.

Authors: We agree that the abstract is too concise and would benefit from including key quantitative support for the claims. The body of the manuscript reports PSNR/SSIM on synthetic and real CBCT datasets (with dataset sizes specified in Section 4), baseline comparisons against existing methods, and ablations. We will revise the abstract to incorporate specific metrics, dataset details, and a brief mention of observed improvements in 3D consistency.
revision: yes

Referee: [Method] Because the generative process remains strictly 2D (U-Net applied slice-wise), the claim that conditioning on an initial 3D volume reliably improves inter-slice coherence rests on the unverified assumption that the conditioning channel is strong enough to propagate volumetric information. No architectural details (concatenation vs. feature injection) or ablation isolating the conditioning effect are supplied, so it is unclear whether the method reduces to independent 2D diffusion plus post-hoc projection matching.

Authors: The method section specifies that the initial 3D reconstruction is concatenated as an additional input channel to the 2D U-Net, allowing the diffusion process to condition on volumetric context slice-wise. This is not equivalent to independent 2D diffusion followed only by post-hoc alignment, as the conditioning influences the generative sampling itself. We will expand the method description with explicit architectural details on the conditioning mechanism and add a dedicated ablation isolating its contribution to inter-slice coherence.
revision: yes

Referee: [Experiments] The assertion that ablations confirm synergistic effects requires explicit quantitative demonstration that removing either the conditioning or the data-consistency step produces a measurable degradation beyond what either component achieves alone; without these numbers and without 3D consistency metrics, the synergy claim cannot be assessed.

Authors: The experiments section already presents ablation results comparing the full CDPA pipeline against variants lacking conditioning or data-consistency alignment, with quantitative metrics showing synergistic gains. To directly address the concern, we will add explicit 3D consistency metrics (e.g., slice-to-slice variance and gradient continuity) to the ablation tables and text in the revised manuscript.
revision: yes
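The 3D consistency metrics named in this exchange are not defined anywhere on this page. As a hedged illustration of what a slice-to-slice measure could look like, the sketch below computes the mean absolute change between neighbouring slices and the variance of that change along the stacking axis. The metric definitions, function names and toy volumes are all assumptions, not the authors' choices.

```python
from statistics import fmean, pvariance


def adjacent_slice_steps(volume):
    """Mean absolute intensity change between each pair of neighbouring slices."""
    steps = []
    for lower, upper in zip(volume, volume[1:]):
        diffs = [abs(a - b)
                 for row_l, row_u in zip(lower, upper)
                 for a, b in zip(row_l, row_u)]
        steps.append(fmean(diffs))
    return steps


def slice_consistency(volume):
    """Summary statistics of the through-plane steps of a (depth, H, W) volume."""
    steps = adjacent_slice_steps(volume)
    return {"mean_step": fmean(steps), "step_variance": pvariance(steps)}


def constant_slice(value, size=2):
    """Toy slice filled with a single intensity value."""
    return [[value] * size for _ in range(size)]


# A volume that drifts smoothly through the stack versus one whose slices
# jump back and forth, mimicking independent per-slice reconstructions.
smooth = [constant_slice(v) for v in (0.0, 0.1, 0.2, 0.3)]
jittery = [constant_slice(v) for v in (0.0, 0.3, 0.1, 0.4)]

smooth_stats = slice_consistency(smooth)
jittery_stats = slice_consistency(jittery)
```

Real anatomy also changes from slice to slice, so a raw step size is not an error by itself. A fair comparison would report these statistics relative to the same quantities computed on the reference volume.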

Circularity Check

0 steps flagged

No circularity: empirical pipeline combining standard components

The paper describes an empirical method that conditions a 2D U-Net diffusion model on an initial 3D reconstruction and adds explicit data-consistency alignment. No equations, parameters, or central claims reduce by construction to fitted inputs, self-definitions, or self-citation chains. The derivation chain consists of architectural choices and post-processing steps whose validity is assessed via experiments and ablations rather than tautological reductions. This is the most common honest finding for applied reconstruction papers.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on trained neural-network weights and standard assumptions from diffusion modeling and CT physics; no explicit free parameters, new physical entities, or ad-hoc axioms are introduced beyond the method design itself.

axioms (2)

domain assumption: Conditioning a 2D diffusion model on an initial 3D reconstruction improves inter-slice consistency.
Core premise stated in the abstract for overcoming slice inconsistency.

domain assumption: Data-consistency alignment can be combined with generative diffusion outputs without quality loss.
Assumed synergistic effect confirmed by ablations per the abstract.

pith-pipeline@v0.9.0 · 5549 in / 1404 out tokens · 59511 ms · 2026-05-08T13:13:14.365080+00:00 · methodology
