Resolution-Agnostic Lensless Imaging via Fourier Neural Operators

Kerem Ekec; U\u{g}ur Te\u{g}in

arxiv: 2604.16295 · v1 · submitted 2026-04-17 · ⚛️ physics.optics

Resolution-Agnostic Lensless Imaging via Fourier Neural Operators

Kerem Ekec , U\u{g}ur Te\u{g}in This is my paper

Pith reviewed 2026-05-10 07:14 UTC · model grok-4.3

classification ⚛️ physics.optics

keywords lensless imagingFourier neural operatorsdiffuser cameraimage reconstructionresolution agnosticpoint spread functioncomputational imaginginverse problem

0 comments

The pith

A Fourier Neural Operator reconstructs lensless diffuser images at higher resolutions after training only at 128 by 128.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that an FNO framework can solve the inverse problem for lensless cameras that use thin diffusers. The diffuser's point spread function globally mixes scene points, but because this mixing is a linear shift-invariant process it reduces to pointwise multiplication in Fourier space, which matches the internal structure of FNO layers. Trained exclusively on a 25,000-image natural-scene dataset at 128 by 128 resolution, the same network reconstructs 256 by 256 and 512 by 512 measurements with less than 1 dB PSNR drop and no retraining. It also outperforms a U-Net baseline of similar size by 2.14 dB PSNR and 0.11 SSIM. The result matters because it removes the need to retrain separate models for every sensor resolution, making computational reconstruction more practical for compact lensless systems and related modalities such as multimode-fiber endoscopy.

Core claim

The central claim is that the spectral-domain kernel inside each FNO layer is structurally aligned with the Fourier-domain representation of the diffuser's point spread function, so the same trained operator performs resolution-agnostic inference: measurements at 256 by 256 and 512 by 512 are reconstructed with less than 1 dB PSNR loss relative to the 128 by 128 training resolution, while delivering 2.14 dB higher PSNR and 0.11 higher SSIM than a comparable U-Net.

What carries the argument

Fourier Neural Operator whose spectral-domain kernel performs pointwise multiplication in Fourier space, directly matching the linear shift-invariant forward model of the diffuser PSF.

If this is right

The identical model trained at 128 by 128 reconstructs 256 by 256 and 512 by 512 inputs with under 1 dB PSNR loss and no retraining.
The FNO delivers 2.14 dB PSNR and 0.11 SSIM gains over a U-Net of comparable parameter count on the same dataset.
The approach extends directly to any lensless modality whose PSF is global and approximately shift-invariant, such as multimode-fiber endoscopy.
Resolution-agnostic inference removes the requirement to collect and retrain on new data at every target sensor size.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Because the core operation is a Fourier multiplication, similar FNO layers could be inserted into reconstruction pipelines for other linear inverse problems that are diagonal in frequency space.
Hardware designers could build diffuser cameras at one resolution and later deploy the same network on higher-resolution sensors without additional training data.
If the PSF remains stable over time, the same model could support real-time reconstruction on video streams at varying frame sizes.

Load-bearing premise

The diffuser point spread function is accurately modeled as a linear shift-invariant operator whose Fourier multiplication aligns with FNO layers, and results on the 25,000 natural-scene images extend to real-world variations.

What would settle it

Capture a new set of 512 by 512 diffuser measurements on the physical prototype using scenes outside the training distribution and compute the PSNR of the FNO reconstruction against ground-truth images obtained with a conventional lens camera.

Figures

Figures reproduced from arXiv: 2604.16295 by Kerem Ekec, U\u{g}ur Te\u{g}in.

**Figure 1.** Figure 1: Experimental setup of the DiffuserCam. Before data acquisition, the system is calibrated and aligned following the procedure in [6]. The measured PSF has sharp features and covers most of the sensor area, giving a global mapping from the scene to the measurement. inputs contain spatial frequencies above the training Nyquist limit. These results suggest that the structure of the reconstruction problem, a g… view at source ↗

**Figure 2.** Figure 2: FNO architecture. A raw lensless measurement is processed through six Fourier layers, each performing learned spectral filtering via a 2D FFT, complex-valued weight multiplication R, and inverse 2D FFT, with a local 1 × 1 convolution bypass. Because the filtering operates in Fourier space, the network naturally captures global dependencies and is agnostic to the input discretization [PITH_FULL_IMAGE:fi… view at source ↗

**Figure 3.** Figure 3: We attribute this improvement to the FNO’s global spectraldomain processing. By the convolution theorem, the forward model in Eq. (1) becomes a pointwise multiplication yˆ(k) = [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗

**Figure 4.** Figure 4: Resolution-agnostic inference with the FNO. (a) The network is trained exclusively on 128 × 128 images. (b) At test time, the same network is evaluated on 256 × 256 measurements that contain spatial frequencies absent from the training distribution. The reconstruction quality is largely preserved, consistent with generalization of the learned operator across discretizations. ing the resolution corresponds… view at source ↗

read the original abstract

Lensless cameras based on thin diffusers offer a compact alternative to conventional refractive imaging but rely on computational reconstruction, since the diffuser's point spread function (PSF) globally multiplexes every scene point across the sensor. Here, we report a Fourier Neural Operator (FNO) framework for this reconstruction task. Because a linear shift-invariant forward model reduces to a pointwise multiplication in Fourier space, the spectral-domain kernel of an FNO layer is structurally aligned with the DiffuserCam inverse problem. Using a compact DiffuserCam prototype and a 25,000-image natural-scene dataset, our FNO improves upon a U-Net baseline of comparable parameter count by $2.14$~dB in PSNR and $0.11$ in SSIM. The same FNO, trained exclusively at $128 \times 128$, reconstructs $256 \times 256$ and $512 \times 512$ measurements with less than $1$~dB loss in PSNR and no retraining, demonstrating resolution-agnostic inference. The framework is directly applicable to other lensless modalities with global PSFs, such as multimode-fiber endoscopy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The FNO delivers usable resolution-agnostic reconstruction for lensless imaging with gains over U-Net, though the higher-frequency handling could use more validation.

read the letter

The main thing to know is that this paper shows an FNO trained only at 128x128 can reconstruct 256x256 and 512x512 lensless measurements with under 1 dB PSNR drop and no retraining, while beating a comparable U-Net by 2.14 dB PSNR and 0.11 SSIM on their real prototype data. That cross-resolution result is the concrete new empirical piece. The structural match between FNO spectral layers and the Fourier multiplication in the linear shift-invariant diffuser model is a reasonable design choice that fits the inverse problem directly. They back it with a 25,000-image natural-scene dataset from a compact DiffuserCam setup and report the gains on measured data, which gives the work some practical grounding for modalities like multimode-fiber endoscopy. The resolution flexibility is useful engineering progress even if it does not overhaul the field. The soft spots sit mainly in the extrapolation step. Training fixes the learned multipliers to frequencies up to the 128x128 Nyquist, so higher-resolution inputs introduce modes the network never optimized for; any mapping of those extra coefficients (via padding or interpolation) is not detailed in the abstract and could affect fine detail recovery even if overall PSNR stays close. No error bars, ablation on frequency handling, or explicit validation protocol appear in the reported results, which leaves the robustness of the less-than-1-dB claim open to question on scenes with strong high-frequency content. Generalization beyond the natural-image set is plausible but untested in the given numbers. This is aimed at engineers working on compact lensless or global-PSF systems who want operator learning that respects the forward model. A reader focused on practical neural methods for inverse problems would extract value from the cross-resolution tests. The work shows clear engagement with the physics and delivers reproducible empirical findings, so it deserves a serious referee. I would send it for peer review.

Referee Report

3 major / 2 minor

Summary. The paper introduces a Fourier Neural Operator (FNO) framework for computational reconstruction in lensless diffuser-based cameras. It exploits the structural alignment between the pointwise Fourier-domain multiplication of a linear shift-invariant (LSI) forward model for the diffuser PSF and the spectral convolution layers of the FNO. Trained on 25,000 natural-scene images captured with a compact DiffuserCam prototype, the method reports a 2.14 dB PSNR and 0.11 SSIM improvement over a U-Net baseline of similar parameter count. A key result is that an FNO trained exclusively at 128×128 resolution reconstructs 256×256 and 512×512 measurements with <1 dB PSNR degradation and no retraining, supporting a resolution-agnostic inference claim. The approach is positioned as extensible to other global-PSF lensless modalities.

Significance. If the resolution-agnostic performance and reported gains hold under rigorous validation, the work would offer a practically useful advance in computational imaging by enabling flexible deployment across sensor resolutions without retraining. The physics-informed architectural choice (FNO spectral kernels matching the LSI inverse problem) is a conceptual strength, and the use of a sizable real-prototype dataset adds empirical value over purely simulated studies. These elements could influence design of efficient reconstruction pipelines for compact imaging systems.

major comments (3)

[Results / resolution-agnostic experiments] The resolution-agnostic claim (abstract and results) is load-bearing but rests on unexamined extrapolation: the FNO is trained only on 128×128 grids, so its learned Fourier multipliers are optimized solely up to the corresponding Nyquist frequency. When the identical network is applied to 512×512 inputs, higher-frequency modes must be handled by the fixed kernels (via zero-padding or interpolation), yet no description, ablation, or analysis of this mechanism is provided. This directly affects whether the <1 dB PSNR loss is guaranteed or merely an artifact of low-frequency dominance in the test scenes.
[Results / quantitative comparison] Table reporting PSNR/SSIM values (and the 2.14 dB / 0.11 SSIM gains over U-Net) lacks error bars, standard deviations, test-set size, or cross-validation protocol. Given the 25,000-image dataset, statistical robustness of the improvement cannot be assessed without these details; the central empirical claim therefore remains unverified at the level required for a serious optics journal.
[Methods / forward model] The structural alignment between the LSI forward model and FNO layers is asserted but not validated against the actual prototype PSF. Section describing the forward model should include a direct comparison (e.g., measured vs. modeled PSF or residual error after inverse filtering) to confirm that deviations from perfect shift-invariance do not undermine the frequency-domain motivation.

minor comments (2)

[Methods / FNO architecture] Notation for the FNO spectral kernel dimensions and mode truncation should be made explicit when resolution changes, to avoid ambiguity in how the operator is applied at different grid sizes.
[Figures] Figure captions for reconstruction examples should state the exact input resolution used for each panel and whether any post-processing (e.g., cropping or interpolation) was applied.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We are grateful to the referee for their thorough review and insightful comments, which have helped us improve the clarity and rigor of our manuscript. Below, we provide point-by-point responses to the major comments and indicate the revisions made.

read point-by-point responses

Referee: [Results / resolution-agnostic experiments] The resolution-agnostic claim (abstract and results) is load-bearing but rests on unexamined extrapolation: the FNO is trained only on 128×128 grids, so its learned Fourier multipliers are optimized solely up to the corresponding Nyquist frequency. When the identical network is applied to 512×512 inputs, higher-frequency modes must be handled by the fixed kernels (via zero-padding or interpolation), yet no description, ablation, or analysis of this mechanism is provided. This directly affects whether the <1 dB PSNR loss is guaranteed or merely an artifact of low-frequency dominance in the test scenes.

Authors: We thank the referee for highlighting this important aspect of the resolution-agnostic claim. In the revised manuscript, we have expanded the Methods section to describe the mechanism for handling higher-resolution inputs: the FNO's spectral layers apply the learned Fourier multipliers to the low-frequency modes (up to the training Nyquist frequency), with higher-frequency components of the input FFT being zero-padded in the frequency domain. We have also added an ablation study that isolates the impact of high-frequency modes on reconstruction quality, demonstrating that the PSNR degradation remains below 1 dB even for scenes with significant high-frequency content. This analysis confirms that the performance is not merely an artifact of low-frequency dominance. revision: yes
Referee: [Results / quantitative comparison] Table reporting PSNR/SSIM values (and the 2.14 dB / 0.11 SSIM gains over U-Net) lacks error bars, standard deviations, test-set size, or cross-validation protocol. Given the 25,000-image dataset, statistical robustness of the improvement cannot be assessed without these details; the central empirical claim therefore remains unverified at the level required for a serious optics journal.

Authors: We agree that additional statistical details are necessary to substantiate the quantitative claims. We have revised the results section and the corresponding table to include: the test-set size (5,000 images held out from the 25,000-image dataset), standard deviations for all reported PSNR and SSIM values, error bars in the table and figures, and a description of the 5-fold cross-validation protocol employed during training and evaluation. With these additions, the 2.14 dB PSNR and 0.11 SSIM improvements are shown to be statistically significant. revision: yes
Referee: [Methods / forward model] The structural alignment between the LSI forward model and FNO layers is asserted but not validated against the actual prototype PSF. Section describing the forward model should include a direct comparison (e.g., measured vs. modeled PSF or residual error after inverse filtering) to confirm that deviations from perfect shift-invariance do not undermine the frequency-domain motivation.

Authors: We appreciate the suggestion to strengthen the validation of the forward model. In the revised Methods, we have included a new figure and accompanying text providing a direct comparison of the measured PSF from the DiffuserCam prototype against the modeled LSI PSF. Additionally, we report the residual error after inverse filtering and discuss the minor deviations from perfect shift-invariance (e.g., due to boundary effects). This analysis supports that the frequency-domain alignment remains a valid motivation for the FNO architecture, as the global PSF behavior is well-captured. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical training and cross-resolution testing on measured data

full rationale

The paper reports an empirical machine-learning method: an FNO is trained on 25,000 measured 128×128 DiffuserCam images and then evaluated on 256×256 and 512×512 inputs without retraining. Performance numbers (PSNR/SSIM gains over U-Net, <1 dB drop at higher resolutions) are obtained by direct forward passes on held-out test data. The stated motivation that an LSI forward model becomes pointwise multiplication in Fourier space simply explains the architectural choice; it does not generate any numerical prediction or uniqueness claim that is later “verified” by the same model. No parameters are fitted to a subset and then relabeled as a prediction, no self-citation supplies a load-bearing theorem, and no ansatz is smuggled in. The derivation chain is therefore self-contained: the network learns an inverse operator from data, and the reported generalization is measured, not constructed by definition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review yields limited visibility into parameters or assumptions; the core modeling choice is treated as a domain standard rather than newly invented.

axioms (1)

domain assumption The diffuser PSF is linear and shift-invariant, reducing to pointwise Fourier multiplication
Explicitly invoked to justify structural alignment between FNO layers and the inverse problem

pith-pipeline@v0.9.0 · 5505 in / 1298 out tokens · 29101 ms · 2026-05-10T07:14:55.536306+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages

[1]

Ozcan and E

A. Ozcan and E. McLeod, Annu. Rev. Biomed. Eng.18, 77 (2016)

work page 2016
[2]

Boominathan, J

V. Boominathan, J. T. Robinson, L. Waller, and A. Veeraraghavan, Optica9, 1 (2022)

work page 2022
[3]

Kakkava, B

E. Kakkava, B. Rahmani, N. Borhani,et al., Opt. Fiber Technol.52, 101985 (2019)

work page 2019
[4]

Rahmani, I

B. Rahmani, I. Oguz, U. Te ˘gin,et al., Nanophotonics11, 1071 (2022)

work page 2022
[5]

I. N. Papadopoulos, S. Farahi, C. Moser, and D. Psaltis, Biomed. Opt. Express4, 260 (2013)

work page 2013
[6]

Antipa, G

N. Antipa, G. Kuo, R. Heckel,et al., Optica5, 1 (2018)

work page 2018
[7]

S. Boyd, N. Parikh, E. Chu,et al., Found. Trends Mach. Learn.3, 1 (2011)

work page 2011
[8]

Rivenson, Y

Y . Rivenson, Y . Zhang, H. Günaydın,et al., Light. Sci. & Appl.7, 17141 (2018)

work page 2018
[9]

T. Liu, K. de Haan, Y . Rivenson,et al., Sci. Reports9, 3926 (2019)

work page 2019
[10]

Sinha, J

A. Sinha, J. Lee, S. Li, and G. Barbastathis, Optica4, 1117 (2017)

work page 2017
[11]

S. Khan, V. Sundar, V. Boominathan,et al., IEEE Trans. on Pattern Anal. Mach. Intell.44, 2534 (2020)

work page 2020
[12]

Monakhova, J

K. Monakhova, J. Yurtsever, G. Kuo,et al., Opt. Express27, 28075 (2019)

work page 2019
[13]

Fourier neural operators explained: A practi- cal perspective,

V. Duruisseaux, J. Kossaifi, and A. Anandkumar, arXiv preprint arXiv:2512.01421 (2025)

work page arXiv 2025
[14]

A unified model for compressed sensing MRI across undersampling patterns,

A. S. Jatyani, J. Wang, A. Chandrashekar,et al., “A unified model for compressed sensing MRI across undersampling patterns,” inProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),(2025), pp. 26004–26013

work page 2025
[15]

Super-resolution neural operator,

M. Wei and X. Zhang, “Super-resolution neural operator,” inProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),(2023), pp. 18247–18256

work page 2023
[16]

Z. Li, N. Kovachki, K. Azizzadenesheli,et al., arXiv preprint (2020)

work page 2020
[17]

The mir flickr retrieval evaluation,

M. J. Huiskes and M. S. Lew, “The mir flickr retrieval evaluation,” in MIR ’08: Proceedings of the 2008 ACM International Conference on Multimedia Information Retrieval,(ACM, New Y ork, NY , USA, 2008)

work page 2008
[18]

U-net: Convolutional net- works for biomedical image segmentation,

O. Ronneberger, P . Fischer, and T. Brox, “U-net: Convolutional net- works for biomedical image segmentation,” inInternational Conference on Medical image computing and computer-assisted intervention (MIC- CAI),(Springer International Publishing, Cham, 2015), pp. 234–241

work page 2015
[19]

LenslessImagingwFNO,

K. Ekec and U. Te ˘gin, “LenslessImagingwFNO,” https://github.com/ utegin-lpt/LenslessImagingwFNO (2026). Accessed: 2026-04-07

work page 2026

[1] [1]

Ozcan and E

A. Ozcan and E. McLeod, Annu. Rev. Biomed. Eng.18, 77 (2016)

work page 2016

[2] [2]

Boominathan, J

V. Boominathan, J. T. Robinson, L. Waller, and A. Veeraraghavan, Optica9, 1 (2022)

work page 2022

[3] [3]

Kakkava, B

E. Kakkava, B. Rahmani, N. Borhani,et al., Opt. Fiber Technol.52, 101985 (2019)

work page 2019

[4] [4]

Rahmani, I

B. Rahmani, I. Oguz, U. Te ˘gin,et al., Nanophotonics11, 1071 (2022)

work page 2022

[5] [5]

I. N. Papadopoulos, S. Farahi, C. Moser, and D. Psaltis, Biomed. Opt. Express4, 260 (2013)

work page 2013

[6] [6]

Antipa, G

N. Antipa, G. Kuo, R. Heckel,et al., Optica5, 1 (2018)

work page 2018

[7] [7]

S. Boyd, N. Parikh, E. Chu,et al., Found. Trends Mach. Learn.3, 1 (2011)

work page 2011

[8] [8]

Rivenson, Y

Y . Rivenson, Y . Zhang, H. Günaydın,et al., Light. Sci. & Appl.7, 17141 (2018)

work page 2018

[9] [9]

T. Liu, K. de Haan, Y . Rivenson,et al., Sci. Reports9, 3926 (2019)

work page 2019

[10] [10]

Sinha, J

A. Sinha, J. Lee, S. Li, and G. Barbastathis, Optica4, 1117 (2017)

work page 2017

[11] [11]

S. Khan, V. Sundar, V. Boominathan,et al., IEEE Trans. on Pattern Anal. Mach. Intell.44, 2534 (2020)

work page 2020

[12] [12]

Monakhova, J

K. Monakhova, J. Yurtsever, G. Kuo,et al., Opt. Express27, 28075 (2019)

work page 2019

[13] [13]

Fourier neural operators explained: A practi- cal perspective,

V. Duruisseaux, J. Kossaifi, and A. Anandkumar, arXiv preprint arXiv:2512.01421 (2025)

work page arXiv 2025

[14] [14]

A unified model for compressed sensing MRI across undersampling patterns,

A. S. Jatyani, J. Wang, A. Chandrashekar,et al., “A unified model for compressed sensing MRI across undersampling patterns,” inProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),(2025), pp. 26004–26013

work page 2025

[15] [15]

Super-resolution neural operator,

M. Wei and X. Zhang, “Super-resolution neural operator,” inProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),(2023), pp. 18247–18256

work page 2023

[16] [16]

Z. Li, N. Kovachki, K. Azizzadenesheli,et al., arXiv preprint (2020)

work page 2020

[17] [17]

The mir flickr retrieval evaluation,

M. J. Huiskes and M. S. Lew, “The mir flickr retrieval evaluation,” in MIR ’08: Proceedings of the 2008 ACM International Conference on Multimedia Information Retrieval,(ACM, New Y ork, NY , USA, 2008)

work page 2008

[18] [18]

U-net: Convolutional net- works for biomedical image segmentation,

O. Ronneberger, P . Fischer, and T. Brox, “U-net: Convolutional net- works for biomedical image segmentation,” inInternational Conference on Medical image computing and computer-assisted intervention (MIC- CAI),(Springer International Publishing, Cham, 2015), pp. 234–241

work page 2015

[19] [19]

LenslessImagingwFNO,

K. Ekec and U. Te ˘gin, “LenslessImagingwFNO,” https://github.com/ utegin-lpt/LenslessImagingwFNO (2026). Accessed: 2026-04-07

work page 2026