Memory-efficient optimization of implicit neural representations for CT reconstruction

Gregory Ongie; Mahrokh Najaf

arxiv: 2604.09884 · v1 · submitted 2026-04-10 · 📡 eess.IV

Memory-efficient optimization of implicit neural representations for CT reconstruction

Mahrokh Najaf , Gregory Ongie This is my paper

Pith reviewed 2026-05-10 16:00 UTC · model grok-4.3

classification 📡 eess.IV

keywords implicit neural representationsCT reconstructionmemory-efficient trainingJacobian-vector productstochastic gradientcone beam CTsparse-view reconstruction

0 comments

The pith

A stochastic approximation to the gradient lets implicit neural representations for CT reconstruction train with far less GPU memory while preserving reconstruction quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that standard back-propagation through ray projections in INR-based CT reconstruction consumes too much memory for practical 3D use. It decomposes the gradient into a Jacobian-vector product that can be estimated by randomly subsampling coordinates along each ray. Experiments on synthetic 2D data confirm that the memory reduction is substantial yet convergence speed and mean-squared error remain comparable to full-gradient training. The same method then enables sparse-view 3D cone-beam CT reconstruction on hardware that would otherwise run out of memory.

Core claim

The central discovery is that the gradient of the data-fidelity loss with respect to INR parameters can be rewritten as a Jacobian-vector product that admits unbiased stochastic subsampling; using this estimator in place of full auto-differentiation yields reconstructions whose convergence behavior and final MSE match those of standard INR training while using dramatically lower peak GPU memory.

What carries the argument

Stochastic subsampling of the Jacobian-vector product that computes the gradient of the ray-projection loss.

If this is right

3D cone-beam CT volumes can be reconstructed from sparse views on a single consumer GPU.
Higher-resolution or deeper INRs become feasible without changing hardware.
The memory-accuracy trade-off can be tuned by changing the fraction of coordinates kept per ray.
The same decomposition applies to any differentiable forward model that sums over many spatial samples.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The technique could be combined with coordinate-based networks that already use positional encodings to further reduce variance.
It may extend naturally to other linear inverse problems such as MRI or limited-angle tomography.
Real measured CT data could be used to check whether the stochastic estimator remains stable when noise and beam-hardening are present.

Load-bearing premise

The bias and variance introduced by randomly dropping coordinates inside each ray projection remain small enough that the optimizer still reaches a solution of comparable quality to the exact gradient.

What would settle it

On the same 2D synthetic phantoms, the memory-efficient method produces final reconstructions whose mean-squared error is more than 20 percent higher than full-gradient training after the same number of epochs.

Figures

Figures reproduced from arXiv: 2604.09884 by Gregory Ongie, Mahrokh Najaf.

**Figure 1.** Figure 1: 2D FORBILD sparse-view reconstructions obtained with conjugate gradient least squares (CGLS) and filtered backprojection (FBP), alongside reconstructions produced by training an INR using FFNs, SIREN, and Hash Encoding architectures. All images are shown on the scale [0.0200,0.0220] mm−1 . 0 1000 2000 3000 4000 10 6 10 5 10 4 10 3 10 2 iterations iterations iterations image MSE FFNs SIREN Hash Encoding FFN… view at source ↗

**Figure 2.** Figure 2: Top row (2D FORBILD): Image MSE plots for INR training with FFNs, SIREN, and hash encoding, comparing auto-differentiation and gradient approximation (for both LS and FLS). Bottom row (3D Walnut): Image MSE plots for INR training with FFNs, SIREN, and hash encoding using gradient approximation (for both LS and FLS). experiments were conducted using a single NVIDIA GeForce RTX 4090 GPU with 24 GB RAM. As ba… view at source ↗

**Figure 3.** Figure 3: shows reconstructions obtained after training an INR for 1000 iterations using the Adam optimizer. All three INR architectures (FFNs, SIREN, and Hash Encoding) produce reconstructions that are comparable to those from CGLS and FBP. The bottom panel of [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

read the original abstract

Implicit neural representations (INRs) provide a parameter-efficient and fully differentiable image model for CT reconstruction. However, optimizing INRs for CT reconstruction using standard auto-differentiation techniques can be prohibitively GPU memory-intensive, especially in 3D imaging, due to the large number of INR evaluations needed to simulate ray projections. To address this issue, we propose a memory-efficient stochastic gradient approximation based on decomposing the gradient into a Jacobian-vector product that is amenable to stochastic subsampling. This approximation allows the user to trade-off between GPU memory usage and gradient approximation accuracy. Our experiments on synthetic 2D data demonstrate that gradient approximation uses far less GPU memory than standard INR training, while yielding reconstructions that are comparable in convergence behavior and mean squared error. Finally, we demonstrate that the proposed approach allows for memory-efficient 3D cone beam CT reconstruction in a sparse-view setting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a workable stochastic JVP-subsampling trick that drops memory use for INR optimization in CT while keeping 2D reconstruction quality close to full-gradient training.

read the letter

The core idea is to decompose the gradient of the INR-based forward model into a Jacobian-vector product and then subsample that product stochastically. This lets the optimizer trade memory against gradient accuracy without changing the underlying INR or the CT measurement model. On synthetic 2D data the authors report that the memory savings are substantial and that convergence speed and final MSE stay comparable to standard autodiff. They also show the method runs on 3D sparse-view cone-beam data where full-gradient training would not fit in GPU memory. That practical demonstration is the main new piece; it is not a theoretical advance in INR theory but a targeted engineering fix for a known bottleneck in volumetric imaging. The approach is straightforward once stated and rests on existing autodiff primitives, so the implementation burden looks low. The 2D results appear to support the claim that the approximation does not derail the optimizer, at least for the tested regimes. The soft spot is the missing detail on how the subsampling fraction affects bias and variance. The abstract gives no numbers, no ablation on sampling rate versus error, and no analysis of whether the approximation error grows with ray count or network depth. If the stochastic gradient introduces enough noise or bias in 3D, the claimed comparability could shrink. The paper is aimed at people already working on neural representations for tomography or other inverse problems who need to scale to 3D. A reader who has tried INR training and hit the memory wall will see immediate utility in the method. It is worth sending to peer review because the problem is real, the fix is concrete, and the 3D feasibility result is worth checking even if the quantitative claims need tightening.

Referee Report

2 major / 2 minor

Summary. The paper proposes a memory-efficient optimization method for implicit neural representations (INRs) in CT reconstruction. It decomposes the gradient computation into a Jacobian-vector product (JVP) that admits stochastic subsampling, allowing a tunable trade-off between GPU memory and gradient accuracy. Experiments on synthetic 2D data are reported to achieve comparable convergence behavior and mean squared error (MSE) to standard autograd-based INR training at substantially lower memory cost, with an additional demonstration that the approach enables 3D cone-beam CT reconstruction under sparse-view conditions.

Significance. If the empirical claims hold, the work addresses a key practical barrier to deploying INRs for high-resolution 3D CT by reducing the memory footprint of ray-projection simulations without altering the underlying INR model. The direct use of Jacobian-vector products for stochastic gradient estimation is a clean, non-circular extension of existing INR forward models and could generalize to other projection-based inverse problems. The absence of theoretical bias/variance bounds is offset by the potential for empirical validation in memory-constrained settings.

major comments (2)

[Abstract and Experiments] Abstract and Experiments section: The claim that the stochastic JVP subsampling yields 'comparable' MSE and convergence to full-gradient training is load-bearing for the central contribution, yet the manuscript provides no tabulated MSE values, convergence curves with error bars, exact subsampling fractions, or ablation results on sampling rate versus reconstruction fidelity. Without these, it is impossible to assess whether the bias/variance of the gradient estimate remains low enough for the optimizer to reach equivalent quality.
[Experiments (3D results)] 3D reconstruction demonstration: The statement that the method 'allows for memory-efficient 3D cone beam CT reconstruction in a sparse-view setting' is presented without quantitative metrics (e.g., memory reduction factor, PSNR/SSIM values, or comparison to full-gradient or other baselines), making it difficult to evaluate whether the approximation scales when ray count and INR depth increase.

minor comments (2)

[Method] Method section: The description of the stochastic subsampling strategy would benefit from an explicit equation showing how the JVP is approximated (e.g., the sampling operator applied to the Jacobian) to improve reproducibility.
[Method] Notation: The distinction between the full gradient and the approximated gradient could be made clearer by consistent use of subscripts or hats throughout the derivations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below and will revise the manuscript to incorporate additional quantitative details that strengthen the presentation of our empirical results.

read point-by-point responses

Referee: [Abstract and Experiments] Abstract and Experiments section: The claim that the stochastic JVP subsampling yields 'comparable' MSE and convergence to full-gradient training is load-bearing for the central contribution, yet the manuscript provides no tabulated MSE values, convergence curves with error bars, exact subsampling fractions, or ablation results on sampling rate versus reconstruction fidelity. Without these, it is impossible to assess whether the bias/variance of the gradient estimate remains low enough for the optimizer to reach equivalent quality.

Authors: We agree that the current manuscript would benefit from more detailed quantitative reporting to support the comparability claim. In the revised version we will add tabulated MSE values comparing the stochastic JVP approach to full-gradient training, convergence curves that include error bars from multiple independent runs, the precise subsampling fractions used in each experiment, and an ablation study examining reconstruction fidelity as a function of sampling rate. These additions will allow readers to directly evaluate the bias and variance properties of the gradient estimator. revision: yes
Referee: [Experiments (3D results)] 3D reconstruction demonstration: The statement that the method 'allows for memory-efficient 3D cone beam CT reconstruction in a sparse-view setting' is presented without quantitative metrics (e.g., memory reduction factor, PSNR/SSIM values, or comparison to full-gradient or other baselines), making it difficult to evaluate whether the approximation scales when ray count and INR depth increase.

Authors: We acknowledge that the 3D demonstration section currently lacks the requested quantitative metrics. The revised manuscript will include explicit memory reduction factors, PSNR and SSIM values for the reconstructed volumes, and side-by-side comparisons against full-gradient INR training as well as other relevant baselines. These metrics will be reported for the sparse-view cone-beam setting to better illustrate scalability with increased ray count and network depth. revision: yes

Circularity Check

0 steps flagged

No circularity: method applies standard JVP and stochastic sampling to INR forward model

full rationale

The paper's core contribution is a memory-efficient gradient approximation obtained by rewriting the loss gradient as a Jacobian-vector product (JVP) that admits stochastic subsampling over rays or pixels. This is a direct algebraic decomposition of the existing INR projection operator followed by Monte-Carlo sampling; it does not define any quantity in terms of itself, rename a fitted parameter as a prediction, or rest the central claim on a self-citation chain. Experiments compare the resulting optimizer trajectories to full-gradient autograd on synthetic 2D data and demonstrate 3D feasibility; these are empirical validations, not tautological re-statements of the inputs. No load-bearing step reduces by construction to the paper's own fitted values or prior self-referential results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that INRs can faithfully represent CT attenuation maps and that the projection operator admits a Jacobian-vector product decomposition amenable to subsampling. No free parameters or invented entities are mentioned in the abstract.

axioms (1)

domain assumption Implicit neural representations provide an accurate and differentiable model for CT images
Implicit in the use of INRs for reconstruction; if false the optimization target is ill-posed.

pith-pipeline@v0.9.0 · 5440 in / 1358 out tokens · 74232 ms · 2026-05-10T16:00:50.972501+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

10 extracted references · 10 canonical work pages

[1]

CoIL: Coordinate-based internal learning for tomographic imaging,

Yu Sun, Jiaming Liu, Mingyang Xie, Brendt Wohlberg, and Ulugbek S Kamilov, “CoIL: Coordinate-based internal learning for tomographic imaging,”IEEE Transactions on Computational Imaging, vol. 7, pp. 1400–1412, 2021. FBP CGLS ground truth Hash Encoding SIREN FFNs LS w/ grad approx FLS w/ grad approx MSE = 2.80e-6 MSE = 1.18e-5 7.46e-6 MSE = 7.72e-6 3.69e-6 ...

work page 2021
[2]

NAF: neural attenuation fields for sparse-view CBCT reconstruction,

Ruyi Zha, Yanhao Zhang, and Hongdong Li, “NAF: neural attenuation fields for sparse-view CBCT reconstruction,” inInternational Confer- ence on Medical Image Computing and Computer-Assisted Intervention. Springer, 2022, pp. 442–452

work page 2022
[3]

NeRP: implicit neural rep- resentation learning with prior embedding for sparsely sampled image reconstruction,

Liyue Shen, John Pauly, and Lei Xing, “NeRP: implicit neural rep- resentation learning with prior embedding for sparsely sampled image reconstruction,”IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 1, pp. 770–782, 2022

work page 2022
[4]

An attenuation field network for dedicated cone beam breast CT with short scan and offset detector geometry,

Zhiyang Fu, Hsin Wu Tseng, and Srinivasan Vedantham, “An attenuation field network for dedicated cone beam breast CT with short scan and offset detector geometry,”Scientific Reports, vol. 14, no. 1, pp. 319, 2024

work page 2024
[5]

Accelerated optimization of implicit neural representations for CT reconstruction,

Mahrokh Najaf and Gregory Ongie, “Accelerated optimization of implicit neural representations for CT reconstruction,” in2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI). IEEE, 2025, pp. 1–5

work page 2025
[6]

Fourier features let networks learn high frequency functions in low dimensional domains,

Matthew Tancik, Pratul Srinivasan, Ben Mildenhall, Sara Fridovich- Keil, Nithin Raghavan, Utkarsh Singhal, Ravi Ramamoorthi, Jonathan Barron, and Ren Ng, “Fourier features let networks learn high frequency functions in low dimensional domains,”Advances in Neural Information Processing Systems, vol. 33, pp. 7537–7547, 2020

work page 2020
[7]

Implicit neural representations with periodic activation functions,

Vincent Sitzmann, Julien Martel, Alexander Bergman, David Lindell, and Gordon Wetzstein, “Implicit neural representations with periodic activation functions,”Advances in Neural Information Processing Systems, vol. 33, pp. 7462–7473, 2020

work page 2020
[8]

Instant neural graphics primitives with a multiresolution hash encod- ing,

Thomas M ¨uller, Alex Evans, Christoph Schied, and Alexander Keller, “Instant neural graphics primitives with a multiresolution hash encod- ing,”ACM transactions on graphics (TOG), vol. 41, no. 4, pp. 1–15, 2022

work page 2022
[9]

Differentiable forward projector for x- ray computed tomography,

Hyojin Kim and Kyle Champley, “Differentiable forward projector for x- ray computed tomography,” inICML 2023 Workshop on Differentiable Almost Everything: Differentiable Relaxations, Algorithms, Operators, and Simulators

work page 2023
[10]

A cone- beam X-ray computed tomography data collection designed for machine learning,

Henri Der Sarkissian, Felix Lucka, Maureen Van Eijnatten, Giulia Colacicco, Sophia Bethany Coban, and Kees Joost Batenburg, “A cone- beam X-ray computed tomography data collection designed for machine learning,”Scientific data, vol. 6, no. 1, pp. 215, 2019

work page 2019

[1] [1]

CoIL: Coordinate-based internal learning for tomographic imaging,

Yu Sun, Jiaming Liu, Mingyang Xie, Brendt Wohlberg, and Ulugbek S Kamilov, “CoIL: Coordinate-based internal learning for tomographic imaging,”IEEE Transactions on Computational Imaging, vol. 7, pp. 1400–1412, 2021. FBP CGLS ground truth Hash Encoding SIREN FFNs LS w/ grad approx FLS w/ grad approx MSE = 2.80e-6 MSE = 1.18e-5 7.46e-6 MSE = 7.72e-6 3.69e-6 ...

work page 2021

[2] [2]

NAF: neural attenuation fields for sparse-view CBCT reconstruction,

Ruyi Zha, Yanhao Zhang, and Hongdong Li, “NAF: neural attenuation fields for sparse-view CBCT reconstruction,” inInternational Confer- ence on Medical Image Computing and Computer-Assisted Intervention. Springer, 2022, pp. 442–452

work page 2022

[3] [3]

NeRP: implicit neural rep- resentation learning with prior embedding for sparsely sampled image reconstruction,

Liyue Shen, John Pauly, and Lei Xing, “NeRP: implicit neural rep- resentation learning with prior embedding for sparsely sampled image reconstruction,”IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 1, pp. 770–782, 2022

work page 2022

[4] [4]

An attenuation field network for dedicated cone beam breast CT with short scan and offset detector geometry,

Zhiyang Fu, Hsin Wu Tseng, and Srinivasan Vedantham, “An attenuation field network for dedicated cone beam breast CT with short scan and offset detector geometry,”Scientific Reports, vol. 14, no. 1, pp. 319, 2024

work page 2024

[5] [5]

Accelerated optimization of implicit neural representations for CT reconstruction,

Mahrokh Najaf and Gregory Ongie, “Accelerated optimization of implicit neural representations for CT reconstruction,” in2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI). IEEE, 2025, pp. 1–5

work page 2025

[6] [6]

Fourier features let networks learn high frequency functions in low dimensional domains,

Matthew Tancik, Pratul Srinivasan, Ben Mildenhall, Sara Fridovich- Keil, Nithin Raghavan, Utkarsh Singhal, Ravi Ramamoorthi, Jonathan Barron, and Ren Ng, “Fourier features let networks learn high frequency functions in low dimensional domains,”Advances in Neural Information Processing Systems, vol. 33, pp. 7537–7547, 2020

work page 2020

[7] [7]

Implicit neural representations with periodic activation functions,

Vincent Sitzmann, Julien Martel, Alexander Bergman, David Lindell, and Gordon Wetzstein, “Implicit neural representations with periodic activation functions,”Advances in Neural Information Processing Systems, vol. 33, pp. 7462–7473, 2020

work page 2020

[8] [8]

Instant neural graphics primitives with a multiresolution hash encod- ing,

Thomas M ¨uller, Alex Evans, Christoph Schied, and Alexander Keller, “Instant neural graphics primitives with a multiresolution hash encod- ing,”ACM transactions on graphics (TOG), vol. 41, no. 4, pp. 1–15, 2022

work page 2022

[9] [9]

Differentiable forward projector for x- ray computed tomography,

Hyojin Kim and Kyle Champley, “Differentiable forward projector for x- ray computed tomography,” inICML 2023 Workshop on Differentiable Almost Everything: Differentiable Relaxations, Algorithms, Operators, and Simulators

work page 2023

[10] [10]

A cone- beam X-ray computed tomography data collection designed for machine learning,

Henri Der Sarkissian, Felix Lucka, Maureen Van Eijnatten, Giulia Colacicco, Sophia Bethany Coban, and Kees Joost Batenburg, “A cone- beam X-ray computed tomography data collection designed for machine learning,”Scientific data, vol. 6, no. 1, pp. 215, 2019

work page 2019