pith. sign in

arxiv: 2604.16295 · v1 · submitted 2026-04-17 · ⚛️ physics.optics

Resolution-Agnostic Lensless Imaging via Fourier Neural Operators

Pith reviewed 2026-05-10 07:14 UTC · model grok-4.3

classification ⚛️ physics.optics
keywords lensless imagingFourier neural operatorsdiffuser cameraimage reconstructionresolution agnosticpoint spread functioncomputational imaginginverse problem
0
0 comments X

The pith

A Fourier Neural Operator reconstructs lensless diffuser images at higher resolutions after training only at 128 by 128.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that an FNO framework can solve the inverse problem for lensless cameras that use thin diffusers. The diffuser's point spread function globally mixes scene points, but because this mixing is a linear shift-invariant process it reduces to pointwise multiplication in Fourier space, which matches the internal structure of FNO layers. Trained exclusively on a 25,000-image natural-scene dataset at 128 by 128 resolution, the same network reconstructs 256 by 256 and 512 by 512 measurements with less than 1 dB PSNR drop and no retraining. It also outperforms a U-Net baseline of similar size by 2.14 dB PSNR and 0.11 SSIM. The result matters because it removes the need to retrain separate models for every sensor resolution, making computational reconstruction more practical for compact lensless systems and related modalities such as multimode-fiber endoscopy.

Core claim

The central claim is that the spectral-domain kernel inside each FNO layer is structurally aligned with the Fourier-domain representation of the diffuser's point spread function, so the same trained operator performs resolution-agnostic inference: measurements at 256 by 256 and 512 by 512 are reconstructed with less than 1 dB PSNR loss relative to the 128 by 128 training resolution, while delivering 2.14 dB higher PSNR and 0.11 higher SSIM than a comparable U-Net.

What carries the argument

Fourier Neural Operator whose spectral-domain kernel performs pointwise multiplication in Fourier space, directly matching the linear shift-invariant forward model of the diffuser PSF.

If this is right

  • The identical model trained at 128 by 128 reconstructs 256 by 256 and 512 by 512 inputs with under 1 dB PSNR loss and no retraining.
  • The FNO delivers 2.14 dB PSNR and 0.11 SSIM gains over a U-Net of comparable parameter count on the same dataset.
  • The approach extends directly to any lensless modality whose PSF is global and approximately shift-invariant, such as multimode-fiber endoscopy.
  • Resolution-agnostic inference removes the requirement to collect and retrain on new data at every target sensor size.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Because the core operation is a Fourier multiplication, similar FNO layers could be inserted into reconstruction pipelines for other linear inverse problems that are diagonal in frequency space.
  • Hardware designers could build diffuser cameras at one resolution and later deploy the same network on higher-resolution sensors without additional training data.
  • If the PSF remains stable over time, the same model could support real-time reconstruction on video streams at varying frame sizes.

Load-bearing premise

The diffuser point spread function is accurately modeled as a linear shift-invariant operator whose Fourier multiplication aligns with FNO layers, and results on the 25,000 natural-scene images extend to real-world variations.

What would settle it

Capture a new set of 512 by 512 diffuser measurements on the physical prototype using scenes outside the training distribution and compute the PSNR of the FNO reconstruction against ground-truth images obtained with a conventional lens camera.

Figures

Figures reproduced from arXiv: 2604.16295 by Kerem Ekec, U\u{g}ur Te\u{g}in.

Figure 1
Figure 1. Figure 1: Experimental setup of the DiffuserCam. Before data acquisition, the system is calibrated and aligned following the procedure in [6]. The measured PSF has sharp features and covers most of the sensor area, giving a global mapping from the scene to the measurement. inputs contain spatial frequencies above the training Nyquist limit. These results suggest that the structure of the reconstruc￾tion problem, a g… view at source ↗
Figure 2
Figure 2. Figure 2: FNO architecture. A raw lensless measurement is pro￾cessed through six Fourier layers, each performing learned spectral filtering via a 2D FFT, complex-valued weight multi￾plication R, and inverse 2D FFT, with a local 1 × 1 convolution bypass. Because the filtering operates in Fourier space, the net￾work naturally captures global dependencies and is agnostic to the input discretization [PITH_FULL_IMAGE:fi… view at source ↗
Figure 3
Figure 3. Figure 3: We attribute this improvement to the FNO’s global spectral￾domain processing. By the convolution theorem, the forward model in Eq. (1) becomes a pointwise multiplication yˆ(k) = [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Resolution-agnostic inference with the FNO. (a) The network is trained exclusively on 128 × 128 images. (b) At test time, the same network is evaluated on 256 × 256 measure￾ments that contain spatial frequencies absent from the training distribution. The reconstruction quality is largely preserved, consistent with generalization of the learned operator across discretizations. ing the resolution corresponds… view at source ↗
read the original abstract

Lensless cameras based on thin diffusers offer a compact alternative to conventional refractive imaging but rely on computational reconstruction, since the diffuser's point spread function (PSF) globally multiplexes every scene point across the sensor. Here, we report a Fourier Neural Operator (FNO) framework for this reconstruction task. Because a linear shift-invariant forward model reduces to a pointwise multiplication in Fourier space, the spectral-domain kernel of an FNO layer is structurally aligned with the DiffuserCam inverse problem. Using a compact DiffuserCam prototype and a 25,000-image natural-scene dataset, our FNO improves upon a U-Net baseline of comparable parameter count by $2.14$~dB in PSNR and $0.11$ in SSIM. The same FNO, trained exclusively at $128 \times 128$, reconstructs $256 \times 256$ and $512 \times 512$ measurements with less than $1$~dB loss in PSNR and no retraining, demonstrating resolution-agnostic inference. The framework is directly applicable to other lensless modalities with global PSFs, such as multimode-fiber endoscopy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces a Fourier Neural Operator (FNO) framework for computational reconstruction in lensless diffuser-based cameras. It exploits the structural alignment between the pointwise Fourier-domain multiplication of a linear shift-invariant (LSI) forward model for the diffuser PSF and the spectral convolution layers of the FNO. Trained on 25,000 natural-scene images captured with a compact DiffuserCam prototype, the method reports a 2.14 dB PSNR and 0.11 SSIM improvement over a U-Net baseline of similar parameter count. A key result is that an FNO trained exclusively at 128×128 resolution reconstructs 256×256 and 512×512 measurements with <1 dB PSNR degradation and no retraining, supporting a resolution-agnostic inference claim. The approach is positioned as extensible to other global-PSF lensless modalities.

Significance. If the resolution-agnostic performance and reported gains hold under rigorous validation, the work would offer a practically useful advance in computational imaging by enabling flexible deployment across sensor resolutions without retraining. The physics-informed architectural choice (FNO spectral kernels matching the LSI inverse problem) is a conceptual strength, and the use of a sizable real-prototype dataset adds empirical value over purely simulated studies. These elements could influence design of efficient reconstruction pipelines for compact imaging systems.

major comments (3)
  1. [Results / resolution-agnostic experiments] The resolution-agnostic claim (abstract and results) is load-bearing but rests on unexamined extrapolation: the FNO is trained only on 128×128 grids, so its learned Fourier multipliers are optimized solely up to the corresponding Nyquist frequency. When the identical network is applied to 512×512 inputs, higher-frequency modes must be handled by the fixed kernels (via zero-padding or interpolation), yet no description, ablation, or analysis of this mechanism is provided. This directly affects whether the <1 dB PSNR loss is guaranteed or merely an artifact of low-frequency dominance in the test scenes.
  2. [Results / quantitative comparison] Table reporting PSNR/SSIM values (and the 2.14 dB / 0.11 SSIM gains over U-Net) lacks error bars, standard deviations, test-set size, or cross-validation protocol. Given the 25,000-image dataset, statistical robustness of the improvement cannot be assessed without these details; the central empirical claim therefore remains unverified at the level required for a serious optics journal.
  3. [Methods / forward model] The structural alignment between the LSI forward model and FNO layers is asserted but not validated against the actual prototype PSF. Section describing the forward model should include a direct comparison (e.g., measured vs. modeled PSF or residual error after inverse filtering) to confirm that deviations from perfect shift-invariance do not undermine the frequency-domain motivation.
minor comments (2)
  1. [Methods / FNO architecture] Notation for the FNO spectral kernel dimensions and mode truncation should be made explicit when resolution changes, to avoid ambiguity in how the operator is applied at different grid sizes.
  2. [Figures] Figure captions for reconstruction examples should state the exact input resolution used for each panel and whether any post-processing (e.g., cropping or interpolation) was applied.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We are grateful to the referee for their thorough review and insightful comments, which have helped us improve the clarity and rigor of our manuscript. Below, we provide point-by-point responses to the major comments and indicate the revisions made.

read point-by-point responses
  1. Referee: [Results / resolution-agnostic experiments] The resolution-agnostic claim (abstract and results) is load-bearing but rests on unexamined extrapolation: the FNO is trained only on 128×128 grids, so its learned Fourier multipliers are optimized solely up to the corresponding Nyquist frequency. When the identical network is applied to 512×512 inputs, higher-frequency modes must be handled by the fixed kernels (via zero-padding or interpolation), yet no description, ablation, or analysis of this mechanism is provided. This directly affects whether the <1 dB PSNR loss is guaranteed or merely an artifact of low-frequency dominance in the test scenes.

    Authors: We thank the referee for highlighting this important aspect of the resolution-agnostic claim. In the revised manuscript, we have expanded the Methods section to describe the mechanism for handling higher-resolution inputs: the FNO's spectral layers apply the learned Fourier multipliers to the low-frequency modes (up to the training Nyquist frequency), with higher-frequency components of the input FFT being zero-padded in the frequency domain. We have also added an ablation study that isolates the impact of high-frequency modes on reconstruction quality, demonstrating that the PSNR degradation remains below 1 dB even for scenes with significant high-frequency content. This analysis confirms that the performance is not merely an artifact of low-frequency dominance. revision: yes

  2. Referee: [Results / quantitative comparison] Table reporting PSNR/SSIM values (and the 2.14 dB / 0.11 SSIM gains over U-Net) lacks error bars, standard deviations, test-set size, or cross-validation protocol. Given the 25,000-image dataset, statistical robustness of the improvement cannot be assessed without these details; the central empirical claim therefore remains unverified at the level required for a serious optics journal.

    Authors: We agree that additional statistical details are necessary to substantiate the quantitative claims. We have revised the results section and the corresponding table to include: the test-set size (5,000 images held out from the 25,000-image dataset), standard deviations for all reported PSNR and SSIM values, error bars in the table and figures, and a description of the 5-fold cross-validation protocol employed during training and evaluation. With these additions, the 2.14 dB PSNR and 0.11 SSIM improvements are shown to be statistically significant. revision: yes

  3. Referee: [Methods / forward model] The structural alignment between the LSI forward model and FNO layers is asserted but not validated against the actual prototype PSF. Section describing the forward model should include a direct comparison (e.g., measured vs. modeled PSF or residual error after inverse filtering) to confirm that deviations from perfect shift-invariance do not undermine the frequency-domain motivation.

    Authors: We appreciate the suggestion to strengthen the validation of the forward model. In the revised Methods, we have included a new figure and accompanying text providing a direct comparison of the measured PSF from the DiffuserCam prototype against the modeled LSI PSF. Additionally, we report the residual error after inverse filtering and discuss the minor deviations from perfect shift-invariance (e.g., due to boundary effects). This analysis supports that the frequency-domain alignment remains a valid motivation for the FNO architecture, as the global PSF behavior is well-captured. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical training and cross-resolution testing on measured data

full rationale

The paper reports an empirical machine-learning method: an FNO is trained on 25,000 measured 128×128 DiffuserCam images and then evaluated on 256×256 and 512×512 inputs without retraining. Performance numbers (PSNR/SSIM gains over U-Net, <1 dB drop at higher resolutions) are obtained by direct forward passes on held-out test data. The stated motivation that an LSI forward model becomes pointwise multiplication in Fourier space simply explains the architectural choice; it does not generate any numerical prediction or uniqueness claim that is later “verified” by the same model. No parameters are fitted to a subset and then relabeled as a prediction, no self-citation supplies a load-bearing theorem, and no ansatz is smuggled in. The derivation chain is therefore self-contained: the network learns an inverse operator from data, and the reported generalization is measured, not constructed by definition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review yields limited visibility into parameters or assumptions; the core modeling choice is treated as a domain standard rather than newly invented.

axioms (1)
  • domain assumption The diffuser PSF is linear and shift-invariant, reducing to pointwise Fourier multiplication
    Explicitly invoked to justify structural alignment between FNO layers and the inverse problem

pith-pipeline@v0.9.0 · 5505 in / 1298 out tokens · 29101 ms · 2026-05-10T07:14:55.536306+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages

  1. [1]

    Ozcan and E

    A. Ozcan and E. McLeod, Annu. Rev. Biomed. Eng.18, 77 (2016)

  2. [2]

    Boominathan, J

    V. Boominathan, J. T. Robinson, L. Waller, and A. Veeraraghavan, Optica9, 1 (2022)

  3. [3]

    Kakkava, B

    E. Kakkava, B. Rahmani, N. Borhani,et al., Opt. Fiber Technol.52, 101985 (2019)

  4. [4]

    Rahmani, I

    B. Rahmani, I. Oguz, U. Te ˘gin,et al., Nanophotonics11, 1071 (2022)

  5. [5]

    I. N. Papadopoulos, S. Farahi, C. Moser, and D. Psaltis, Biomed. Opt. Express4, 260 (2013)

  6. [6]

    Antipa, G

    N. Antipa, G. Kuo, R. Heckel,et al., Optica5, 1 (2018)

  7. [7]

    S. Boyd, N. Parikh, E. Chu,et al., Found. Trends Mach. Learn.3, 1 (2011)

  8. [8]

    Rivenson, Y

    Y . Rivenson, Y . Zhang, H. Günaydın,et al., Light. Sci. & Appl.7, 17141 (2018)

  9. [9]

    T. Liu, K. de Haan, Y . Rivenson,et al., Sci. Reports9, 3926 (2019)

  10. [10]

    Sinha, J

    A. Sinha, J. Lee, S. Li, and G. Barbastathis, Optica4, 1117 (2017)

  11. [11]

    S. Khan, V. Sundar, V. Boominathan,et al., IEEE Trans. on Pattern Anal. Mach. Intell.44, 2534 (2020)

  12. [12]

    Monakhova, J

    K. Monakhova, J. Yurtsever, G. Kuo,et al., Opt. Express27, 28075 (2019)

  13. [13]

    Fourier neural operators explained: A practi- cal perspective,

    V. Duruisseaux, J. Kossaifi, and A. Anandkumar, arXiv preprint arXiv:2512.01421 (2025)

  14. [14]

    A unified model for compressed sensing MRI across undersampling patterns,

    A. S. Jatyani, J. Wang, A. Chandrashekar,et al., “A unified model for compressed sensing MRI across undersampling patterns,” inProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),(2025), pp. 26004–26013

  15. [15]

    Super-resolution neural operator,

    M. Wei and X. Zhang, “Super-resolution neural operator,” inProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),(2023), pp. 18247–18256

  16. [16]

    Z. Li, N. Kovachki, K. Azizzadenesheli,et al., arXiv preprint (2020)

  17. [17]

    The mir flickr retrieval evaluation,

    M. J. Huiskes and M. S. Lew, “The mir flickr retrieval evaluation,” in MIR ’08: Proceedings of the 2008 ACM International Conference on Multimedia Information Retrieval,(ACM, New Y ork, NY , USA, 2008)

  18. [18]

    U-net: Convolutional net- works for biomedical image segmentation,

    O. Ronneberger, P . Fischer, and T. Brox, “U-net: Convolutional net- works for biomedical image segmentation,” inInternational Conference on Medical image computing and computer-assisted intervention (MIC- CAI),(Springer International Publishing, Cham, 2015), pp. 234–241

  19. [19]

    LenslessImagingwFNO,

    K. Ekec and U. Te ˘gin, “LenslessImagingwFNO,” https://github.com/ utegin-lpt/LenslessImagingwFNO (2026). Accessed: 2026-04-07