arxiv: 2604.20939 · v1 · submitted 2026-04-22 · ⚛️ physics.med-ph

Recognition: unknown

Deep Reinforcement Learning for Optimizing Angle Selection and Dose Allocation in CT Reconstruction

Tianyuan Wang , Dani\"el M. Pelt , Felix Lucka , Tristan van Leeuwen , K. Joost Batenburg

Authors on Pith no claims yet

Pith reviewed 2026-05-09 22:45 UTC · model grok-4.3

classification ⚛️ physics.med-ph

keywords dosereconstructionangleselectionadaptiveallocationanglesbudget

0 comments

The pith

Reinforcement learning optimizes adaptive angle selection and dose allocation in sparse-view CT reconstruction, yielding better quality and defect detectability than uniform strategies under limited projections or dose.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Computed tomography (CT) scans traditionally use many X-ray angles with equal dose to build 3D images. This paper focuses on sparse-view CT, where fewer angles and less total radiation are used to make scans quicker and safer. The challenge is choosing which angles matter most and how much dose to give each one, since some directions reveal more information than others and photon noise varies by angle. The authors combine a reconstruction technique called PWLS-PnP, which handles noisy data, with a deep reinforcement learning agent. The agent learns a policy to pick angles and doses step by step, using feedback from how well the current reconstruction is performing. Numerical tests show this adaptive approach produces clearer images and better detects defects than standard uniform-angle, equal-dose methods, especially when the total number of views or radiation budget is very small. The key idea is making the scanning process information-driven rather than fixed in advance.

Core claim

Numerical experiments show that the proposed approach improves overall reconstruction quality and enhances defect detectability compared with conventional strategies, particularly when only a small number of projections or a constrained dose budget is available.

Load-bearing premise

The reinforcement learning policy learned in simulation or on modeled data will transfer effectively to real CT systems, accurately capturing angle-dependent photon statistics and reconstruction performance without significant domain shift.

Figures

Figures reproduced from arXiv: 2604.20939 by Dani\"el M. Pelt, Felix Lucka, K. Joost Batenburg, Tianyuan Wang, Tristan van Leeuwen.

**Figure 2.** Figure 2: Example training dataset consisting of three types of phantoms. (a) [PITH_FULL_IMAGE:figures/full_fig_p013_2.png] view at source ↗

**Figure 3.** Figure 3: PSNR versus number of iterations for different angular sampling [PITH_FULL_IMAGE:figures/full_fig_p015_3.png] view at source ↗

**Figure 4.** Figure 4: Reconstruction results for two phantoms with 60 projection angles [PITH_FULL_IMAGE:figures/full_fig_p016_4.png] view at source ↗

**Figure 5.** Figure 5: (a) Angle selection and dose allocation at a total dose of 6000 photons [PITH_FULL_IMAGE:figures/full_fig_p019_5.png] view at source ↗

**Figure 6.** Figure 6: (a) Angle selection and dose allocation at a total dose of 8000 photons [PITH_FULL_IMAGE:figures/full_fig_p020_6.png] view at source ↗

**Figure 7.** Figure 7: Illustration of the regions used for defect detection. (Left) ROI high [PITH_FULL_IMAGE:figures/full_fig_p021_7.png] view at source ↗

**Figure 8.** Figure 8: (a) Angle selection and dose allocation at a total dose of 1000 photons [PITH_FULL_IMAGE:figures/full_fig_p023_8.png] view at source ↗

**Figure 9.** Figure 9: (a) Angle selection and dose allocation at a total dose of 2000 photons [PITH_FULL_IMAGE:figures/full_fig_p024_9.png] view at source ↗

read the original abstract

Traditional X-ray computed tomography (CT) scanning strategies typically select projection angles uniformly and allocate dose equally. In practice, however, CT scans often need to be fast, radiation-efficient, and adaptive. Sparse-view tomography addresses these requirements by reducing both the number of angles and the total dose budget. Under such constraints, angle selection and dose allocation should be information-driven, with more dose assigned to informative directions. To this end, we propose a dose-aware acquisition and reconstruction framework that combines a PWLS-PnP reconstruction backbone with an RL-based strategy for adaptive angle selection, explicitly accounting for angle-dependent photon statistics. Numerical experiments show that the proposed approach improves overall reconstruction quality and enhances defect detectability compared with conventional strategies, particularly when only a small number of projections or a constrained dose budget is available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes a dose-aware CT acquisition and reconstruction framework that pairs a penalized weighted least-squares plug-and-play (PWLS-PnP) reconstruction backbone with a deep reinforcement-learning policy for adaptive projection-angle selection and per-angle dose allocation. The RL agent is trained to maximize a reward that incorporates angle-dependent photon statistics and reconstruction fidelity. Numerical experiments on simulated sparse-view and dose-constrained phantoms report improved overall image quality and defect detectability relative to uniform-angle baselines.

Significance. If the reported numerical gains hold under broader testing, the work offers a practical route toward information-driven, radiation-efficient CT protocols. The explicit modeling of angle-dependent noise within an RL loop is a timely combination of established reconstruction tools and modern adaptive-acquisition methods, with clear relevance to low-dose and sparse-view imaging scenarios.

minor comments (3)

§3.2 (RL formulation): the state representation and reward function are described in prose but would benefit from an explicit mathematical definition or pseudocode block to allow exact reproduction of the policy training.
§4 (Numerical experiments): while the abstract asserts improvements, the main text should include tabulated quantitative metrics (e.g., PSNR, SSIM, or detectability index) with error bars or statistical significance tests against the uniform-angle and fixed-dose baselines.
Figure 4 caption: the number of independent training runs and the precise definition of the 'defect detectability' metric are not stated, making it difficult to assess the robustness of the visual comparisons.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our work and the recommendation for minor revision. The assessment correctly captures the core contribution of combining PWLS-PnP reconstruction with a dose-aware RL policy for adaptive angle selection and allocation.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper introduces a combined framework of PWLS-PnP reconstruction with RL-based adaptive angle selection that accounts for photon statistics. Its claims rest on numerical experiments demonstrating empirical improvements over uniform baselines under sparse-view and dose constraints. No derivation chain is present that reduces a claimed result to its own inputs by construction, self-definition, or fitted-parameter renaming. The work applies existing components without invoking load-bearing self-citations or uniqueness theorems that collapse the argument. The central contribution is an empirical optimization strategy whose validity is tested externally via simulation, rendering the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides insufficient detail to enumerate specific free parameters, axioms, or invented entities; no explicit new entities or fitted constants are named.

pith-pipeline@v0.9.0 · 5447 in / 1157 out tokens · 17328 ms · 2026-05-09T22:45:00.952658+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

35 extracted references · 5 canonical work pages · 3 internal anchors

[1]

Ct dose re- duction and dose management tools: overview of available options,

C. H. McCollough, M. R. Bruesewitz, and J. M. Kofler Jr, “Ct dose re- duction and dose management tools: overview of available options,”Ra- diographics, vol. 26, no. 2, pp. 503–512, 2006

2006
[2]

Information content of projections,

I. Kazantsev, “Information content of projections,”Inverse problems, vol. 7, no. 6, p. 887, 1991

1991
[3]

Dynamic angle selection in binary tomography,

K. J. Batenburg, W. J. Palenstijn, P. Bal´ azs, and J. Sijbers, “Dynamic angle selection in binary tomography,”Computer Vision and Image Un- derstanding, vol. 117, no. 4, pp. 306–318, 2013

2013
[4]

Adaptive compressed sensing with diffusion-based posterior sampling,

N. Elata, T. Michaeli, and M. Elad, “Adaptive compressed sensing with diffusion-based posterior sampling,” inEuropean Conference on Computer Vision. Springer, 2025, pp. 290–308

2025
[5]

Bayesian experimental design for computed tomography with the linearised deep image prior,

R. Barbano, J. Leuschner, J. Antor´ an, B. Jin, and J. M. Hern´ andez- Lobato, “Bayesian experimental design for computed tomography with the linearised deep image prior,”arXiv preprint arXiv:2207.05714, 2022

work page arXiv 2022
[6]

Sequential experimental de- sign for x-ray ct using deep reinforcement learning,

T. Wang, F. Lucka, and T. van Leeuwen, “Sequential experimental de- sign for x-ray ct using deep reinforcement learning,”IEEE Transactions on Computational Imaging, vol. 10, pp. 953–968, 2024

2024
[7]

Optimal experimental design for inverse problems with state constraints,

L. Ruthotto, J. Chung, and M. Chung, “Optimal experimental design for inverse problems with state constraints,”SIAM Journal on Scientific Com- puting, vol. 40, no. 4, pp. B1080–B1100, 2018

2018
[8]

D. V. Lindley,Bayesian statistics: A review. SIAM, 1972

1972
[9]

Fully bayesian optimal experimental design: A review,

E. Ryan, C. Drovandi, J. McGree, and A. Pettitt, “Fully bayesian optimal experimental design: A review,”International Statistical Review, vol. 84, pp. 128–154, 2016

2016
[10]

Modern bayesian experimental design,

T. Rainforth, A. Foster, D. R. Ivanova, and F. Bickford Smith, “Modern bayesian experimental design,”Statistical Science, vol. 39, no. 1, pp. 100– 114, 2024

2024
[11]

Bounds on the difference between reconstructions in binary tomography,

K. J. Batenburg, W. Fortes, L. Hajdu, and R. Tijdeman, “Bounds on the difference between reconstructions in binary tomography,” inInternational Conference on Discrete Geometry for Computer Imagery. Springer, 2011, pp. 369–380

2011
[12]

Dynamic angle selec- tion in X-ray computed tomography,

A. Dabravolski, K. J. Batenburg, and J. Sijbers, “Dynamic angle selec- tion in X-ray computed tomography,”Nuclear Instruments and Methods in Physics Research Section B: Beam Interactions with Materials and Atoms, vol. 324, pp. 17–24, 2014. 29

2014
[13]

Edge-promoting adaptive bayesian experimental design for X-ray imaging,

T. Helin, N. Hyv¨ onen, and J.-P. Puska, “Edge-promoting adaptive bayesian experimental design for X-ray imaging,”SIAM Journal on Scientific Com- puting, vol. 44, no. 3, pp. B506–B530, 2022

2022
[14]

Learning to scan: A deep reinforcement learning approach for personalized scanning in CT imaging,

Z. Shen, Y. Wang, D. Wu, X. Yang, and B. Dong, “Learning to scan: A deep reinforcement learning approach for personalized scanning in CT imaging,”Inverse Problems & Imaging, vol. 16, no. 1, 2022

2022
[15]

Task-adaptive angle selection for computed tomography- based defect detection,

T. Wang, V. Florian, R. Schielein, C. Kretzer, S. Kasperl, F. Lucka, and T. van Leeuwen, “Task-adaptive angle selection for computed tomography- based defect detection,”Journal of Imaging, vol. 10, no. 9, p. 208, 2024

2024
[16]

Dynamic angle selection in x-ray ct: A reinforcement learning approach to optimal stopping,

T. Wang, F. Lucka, D. M. Pelt, K. J. Batenburg, and T. van Leeuwen, “Dynamic angle selection in x-ray ct: A reinforcement learning approach to optimal stopping,”arXiv preprint arXiv:2503.12688, 2025

work page arXiv 2025
[17]

Penalized weighted least-squares approach to sinogram noise reduction and image reconstruction for low- dose x-ray computed tomography,

J. Wang, T. Li, H. Lu, and Z. Liang, “Penalized weighted least-squares approach to sinogram noise reduction and image reconstruction for low- dose x-ray computed tomography,”IEEE transactions on medical imaging, vol. 25, no. 10, pp. 1272–1283, 2006

2006
[18]

A novel tomographic reconstruction method based on the robust student’s t function for suppressing data out- liers,

D. Kazantsev, F. Bleichrodt, T. van Leeuwen, A. Kaestner, P. J. With- ers, K. J. Batenburg, and P. D. Lee, “A novel tomographic reconstruction method based on the robust student’s t function for suppressing data out- liers,”IEEE Transactions on Computational Imaging, vol. 3, no. 4, pp. 682–693, 2017

2017
[19]

Ct reconstruction with pdf: Parameter-dependent framework for data from multiple geometries and dose levels,

W. Xia, Z. Lu, Y. Huang, Y. Liu, H. Chen, J. Zhou, and Y. Zhang, “Ct reconstruction with pdf: Parameter-dependent framework for data from multiple geometries and dose levels,”IEEE Transactions on Medical Imag- ing, vol. 40, no. 11, pp. 3065–3076, 2021

2021
[20]

P. C. Hansen, J. Jørgensen, and W. R. Lionheart,Computed tomography: algorithms, insight, and just enough theory. SIAM, 2021

2021
[21]

Bayesian sequential optimal experimental design for nonlinear models using policy gradient reinforcement learning,

W. Shen and X. Huan, “Bayesian sequential optimal experimental design for nonlinear models using policy gradient reinforcement learning,”Com- puter Methods in Applied Mechanics and Engineering, vol. 416, p. 116304, 2023

2023
[22]

R. S. Sutton and A. G. Barto,Reinforcement learning: An introduction. MIT press, 2018

2018
[23]

High-Dimensional Continuous Control Using Generalized Advantage Estimation

J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel, “High- dimensional continuous control using generalized advantage estimation,” arXiv preprint arXiv:1506.02438, 2015

work page internal anchor Pith review arXiv 2015
[24]

Penalized weighted least-squares image reconstruction for positron emis- sion tomography,

“Penalized weighted least-squares image reconstruction for positron emis- sion tomography,”IEEE transactions on medical imaging, vol. 13, no. 2, pp. 290–300, 1994. 30

1994
[25]

Plug-and-play priors for model based reconstruction,

S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg, “Plug-and-play priors for model based reconstruction,” in2013 IEEE global conference on signal and information processing. IEEE, 2013, pp. 945–948

2013
[26]

Nonlinear total variation based noise removal algorithms,

L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,”Physica D: nonlinear phenomena, vol. 60, no. 1-4, pp. 259–268, 1992

1992
[27]

An algorithm for total variation minimization and appli- cations,

A. Chambolle, “An algorithm for total variation minimization and appli- cations,”Journal of Mathematical imaging and vision, vol. 20, no. 1, pp. 89–97, 2004

2004
[28]

Proximal Policy Optimization Algorithms

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Prox- imal policy optimization algorithms,”arXiv preprint arXiv:1707.06347, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[29]

Foam-like phantoms for comparing tomography algorithms,

D. M. Pelt, A. A. Hendriksen, and K. J. Batenburg, “Foam-like phantoms for comparing tomography algorithms,”Synchrotron Radiation, vol. 29, no. 1, pp. 254–265, 2022

2022
[30]

Tomosipo: fast, flexible, and convenient 3d tomography for complex scanning geometries in python,

A. A. Hendriksen, D. Schut, W. J. Palenstijn, N. Vigan´ o, J. Kim, D. M. Pelt, T. Van Leeuwen, and K. Joost Batenburg, “Tomosipo: fast, flexible, and convenient 3d tomography for complex scanning geometries in python,” Optics Express, vol. 29, no. 24, pp. 40 494–40 513, 2021

2021
[31]

Adam: A Method for Stochastic Optimization

D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[32]

A projection access scheme for iterative reconstruction based on the golden section,

T. Kohler, “A projection access scheme for iterative reconstruction based on the golden section,” inIEEE Symposium Conference Record Nuclear Science 2004., vol. 6. IEEE, 2004, pp. 3961–3965

2004
[33]

Real-time tilt undersampling optimization during electron tomography of beam sensi- tive samples using golden ratio scanning and recast3d,

T. M. Craig, A. A. Kadu, K. J. Batenburg, and S. Bals, “Real-time tilt undersampling optimization during electron tomography of beam sensi- tive samples using golden ratio scanning and recast3d,”Nanoscale, vol. 15, no. 11, pp. 5391–5402, 2023

2023
[34]

X- ray image generation as a method of performance prediction for real-time inspection: A case study,

V. Andriiashen, R. van Liere, T. van Leeuwen, and K. J. Batenburg, “X- ray image generation as a method of performance prediction for real-time inspection: A case study,”Journal of Nondestructive Evaluation, vol. 43, no. 3, p. 79, 2024

2024
[35]

Scintillator decorrelation for self-supervised x-ray radiograph denoising,

A. Graas and F. Lucka, “Scintillator decorrelation for self-supervised x-ray radiograph denoising,”Measurement Science and Technology, vol. 36, no. 6, p. 065415, 2025. 31

2025