
arxiv: 2604.14901 · v4 · pith:HWUYMHNVnew · submitted 2026-04-16 · ⚛️ physics.optics

End-to-End Inverse Designed Single-Layered Metasurface for Snapshot RGB-Achromatic Full-Stokes Polarization Imaging

Pith reviewed 2026-05-10 10:38 UTC · model grok-4.3

classification ⚛️ physics.optics
keywords metasurface, full-Stokes polarimetry, snapshot imaging, end-to-end design, inverse design, achromatic imaging, polarization imaging, optical-digital co-design

The pith

End-to-end co-design of metasurface optics and neural reconstruction enables snapshot RGB-achromatic full-Stokes polarization imaging from a single monochrome measurement.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a joint optical-digital optimization framework that encodes full-Stokes polarization information into a single sensor capture using an inverse-designed metasurface. A differentiable 4f frontend models the metasurface with an MLP surrogate, and a U-Net reconstructs the RGB Stokes images, so the whole system can be trained end-to-end. On a real-world dataset, the hybrid architecture reaches 26.71 dB PSNR and 0.7044 SSIM in RGB-achromatic mode, and the pure meta-optic version reaches 24.10 dB and 0.6015, both at a compression ratio of 12. This matters because it replaces bulky, multi-shot conventional polarimeters with a compact alternative suited to dynamic or resource-limited applications.
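For readers less familiar with the term, "full-Stokes" means recovering all four Stokes parameters per pixel. The sketch below computes them for a fully polarized field from its Jones vector. It is an illustration only, not code from the paper: the function name `stokes_from_jones` is hypothetical, and the sign of S3 depends on a handedness convention that the summary does not state.

```python
from typing import Tuple


def stokes_from_jones(ex: complex, ey: complex) -> Tuple[float, float, float, float]:
    """Return (S0, S1, S2, S3) for a fully polarized field with Jones vector (ex, ey).

    Assumed convention: S3 = 2 * Im(conj(Ex) * Ey). Other texts flip this sign.
    """
    ex = complex(ex)
    ey = complex(ey)
    ix = abs(ex) ** 2                 # intensity in the x-polarized component
    iy = abs(ey) ** 2                 # intensity in the y-polarized component
    cross = ex.conjugate() * ey       # cross term carrying the relative phase
    s0 = ix + iy                      # total intensity
    s1 = ix - iy                      # horizontal vs. vertical linear
    s2 = 2.0 * cross.real             # +45 deg vs. -45 deg linear
    s3 = 2.0 * cross.imag             # circular component (sign is convention-dependent)
    return (s0, s1, s2, s3)


if __name__ == "__main__":
    r = 2 ** -0.5
    for name, jones in {
        "horizontal": (1, 0),
        "diagonal": (r, r),
        "circular": (r, 1j * r),
    }.items():
        print(name, stokes_from_jones(*jones))
```

A conventional polarimeter needs several analyzer settings to measure these four numbers, which is the multi-shot cost the snapshot design aims to avoid.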

Core claim

By modeling the metasurface response with a multilayer perceptron inside a differentiable 4f system and jointly optimizing it with a U-Net decoder, the authors reconstruct complete RGB full-Stokes images from one monochrome measurement, achieving 30.00 dB PSNR for monochromatic and 26.71 dB for achromatic RGB cases in the hybrid setup, and comparable performance in an all-meta-optic configuration.
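To put the decibel figures in context, here is a minimal PSNR sketch using the standard definition, 10·log10(peak² / MSE). The peak value of 1.0 is an assumption: the summary does not say how the Stokes images were normalized before scoring, and the helper names are hypothetical.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float], peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized flat images."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_from_psnr(psnr_db: float, peak: float = 1.0) -> float:
    """Invert the PSNR definition to get the implied mean squared error."""
    return peak ** 2 / (10.0 ** (psnr_db / 10.0))


if __name__ == "__main__":
    # Under the peak = 1.0 assumption, 30.00 dB implies an MSE of about 1e-3.
    print(mse_from_psnr(30.00))
    print(mse_from_psnr(26.71))
```

Under that normalization, the gap between 30.00 dB and 26.71 dB corresponds to roughly a factor of two in mean squared error.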

What carries the argument

Differentiable 4f optical frontend whose metasurface is approximated by an MLP for polarization encoding, co-optimized with a U-Net reconstruction network.
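The idea of a 4f frontend can be sketched in a few lines: transform the input field to the Fourier plane, multiply by the mask placed there, transform back, and record intensity. The toy below is a scalar, one-dimensional, single-wavelength version written with a naive DFT so that it needs no dependencies. It is not the paper's model: in the paper the mask would come from the MLP surrogate, be polarization- and wavelength-dependent, and sit inside an automatic-differentiation framework. All function names here are hypothetical.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform (forward or inverse)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for m in range(n):
            acc += x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
        out.append(acc / n if inverse else acc)
    return out


def four_f_intensity(field: Sequence[complex], fourier_mask: Sequence[complex]) -> List[float]:
    """Propagate a 1D field through an idealized 4f system and return sensor intensity.

    Simplification: the second lens is modeled as an inverse DFT, which ignores
    the image inversion a physical 4f relay produces.
    """
    if len(field) != len(fourier_mask):
        raise ValueError("field and mask must have the same length")
    spectrum = dft(field)                                   # first lens: to Fourier plane
    filtered = [s * t for s, t in zip(spectrum, fourier_mask)]  # mask in Fourier plane
    image = dft(filtered, inverse=True)                     # second lens: back to image plane
    return [abs(v) ** 2 for v in image]                     # sensor records intensity only


if __name__ == "__main__":
    field = [1, 2, 3, 4]
    print(four_f_intensity(field, [1, 1, 1, 1]))  # clear aperture reproduces |field|^2
    print(four_f_intensity(field, [1, 0, 0, 0]))  # keeping only the DC term blurs to a constant
```

Because every step is a linear map followed by a squared magnitude, gradients with respect to the mask values are well defined, which is what makes joint training with the decoder possible in principle.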

If this is right

  • The hybrid metasurface-refractive architecture reaches 30.00 dB PSNR and 0.8291 SSIM for monochromatic visible imaging.
  • The pure meta-optic system attains 26.94 dB PSNR and 0.7184 SSIM monochromatically and 24.10 dB PSNR with 0.6015 SSIM in RGB-achromatic mode.
  • Full-Stokes polarization data is recovered at a compression ratio of 12 without requiring multiple measurements or bulky conventional optics.
  • Both implementations demonstrate snapshot operation on real-world datasets while maintaining achromatic performance across the visible band.
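The summary does not spell out how the compression ratio of 12 is counted. One reading that matches the stated setup (three color channels, four Stokes parameters, one monochrome measurement) is sketched below. This interpretation is an assumption, and `compression_ratio` is a hypothetical helper.

```python
def compression_ratio(n_colors: int, n_stokes: int, n_measured_channels: int = 1) -> float:
    """Reconstructed values per pixel divided by measured values per pixel."""
    if n_measured_channels <= 0:
        raise ValueError("need at least one measured channel")
    return (n_colors * n_stokes) / n_measured_channels


if __name__ == "__main__":
    # RGB (3) x full Stokes (4) recovered from 1 monochrome channel.
    print(compression_ratio(3, 4))  # 12.0
```

Under the same counting, the monochromatic case would recover four values per pixel from one measurement.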

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Compact meta-optic versions could be integrated into mobile or drone platforms for real-time polarization-based material or biological analysis.
  • The same co-design loop might be adapted to other compressive sensing tasks such as snapshot hyperspectral or depth imaging.
  • Performance gaps between hybrid and pure-meta results suggest that further physical constraints on the metasurface model could improve fabrication robustness.

Load-bearing premise

The MLP faithfully approximates the physical metasurface response and the joint optimization produces a design that generalizes from simulated training data to real-world scenes.

What would settle it

Fabricate the designed metasurface, measure its actual Stokes encoding on diverse real scenes under broadband illumination, and compare the U-Net reconstructions against ground-truth polarimetry to verify whether the reported PSNR and SSIM values are attained.

Figures

Figures reproduced from arXiv: 2604.14901 by Haining Yang, Jirong Bao, Mengdi Sun, Xingyu Chai.

Figure 1: The design process of the MLP-based surrogate model for the metasurface and …
Figure 2: The architecture of the proposed full-Stokes polarization imaging system …
Figure 3: Performance of the MLP-based surrogate model based on the meta-atoms of the …
Figure 4: The metalens performance in the monochromatic case at 0.4358 …
Figure 5: Schematic overview of the U-Net used in the backend. The values above the …
Figure 6: Loss function under different masks over training epochs in the monochromatic …
Figure 7: The phase profiles of the trained-mask metasurface after end-to-end training: …
Figure 8: Examples of full-Stokes images reconstructed under different masks in the …
Figure 9: Loss function under different masks over training epochs in the RGB-achromatic …
Figure 10: Examples of full-Stokes images reconstructed under different masks in the …
Figure 11: Loss function under different masks over training epochs in the monochromatic …
Figure 12: Examples of full-Stokes images reconstructed under different masks in the …
Figure 13: Loss function under different masks over training epochs in the RGB-achromatic …
Figure 14: Examples of full-Stokes images reconstructed under different masks in the …

Original abstract

Snapshot full-Stokes polarimetry across multiple wavelengths remains challenging because conventional architectures rely on multiplexed measurements and bulky optics. We present an end-to-end framework that reconstructs RGB full-Stokes images from a snapshot sensor measurement. The system jointly optimizes a differentiable single-layered metasurface frontend with a U-Net backend. A metasurface modeled by the multilayer perceptron (MLP) is employed to encode the full-Stokes polarization information. On a real-world dataset, the system achieves a 27.06 dB peak signal-to-noise ratio (PSNR) and 0.7172 structural similarity index measure (SSIM) for monochromatic imaging at the specific wavelength (0.44 µm), and 23.35 dB/0.5643 for RGB-achromatic imaging. These results show that end-to-end optical-digital co-design enables high-performance snapshot full-Stokes polarization imaging with a compact footprint and a high compression ratio.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents an end-to-end inverse-design framework for metasurface-enabled snapshot RGB-achromatic full-Stokes polarization imaging. A differentiable 4f optical frontend incorporates a metasurface whose wavelength- and polarization-dependent response is modeled by an MLP; this frontend is jointly optimized with a U-Net decoder to reconstruct full-Stokes RGB images from a single monochrome sensor measurement at a compression ratio of 12. Two variants are demonstrated: a hybrid metasurface-refractive system and a pure meta-optic system. On a real-world dataset the hybrid architecture reports 30.00 dB PSNR / 0.8291 SSIM (monochromatic) and 26.71 dB / 0.7044 (RGB-achromatic); the pure meta-optic system yields 26.94 dB / 0.7184 and 24.10 dB / 0.6015, respectively.

Significance. If the central claims hold, the work demonstrates that optical-digital co-design can deliver compact, high-compression snapshot polarimetric imaging without bulky conventional optics. The use of a differentiable MLP surrogate for the metasurface and the joint optimization pipeline constitute a methodological strength that could be extended to other inverse-designed imaging modalities.

major comments (2)
  1. [Methods (metasurface modeling)] The end-to-end claim rests on the MLP serving as an accurate, differentiable forward model of the metasurface transmission matrix. No quantitative validation (e.g., mean-absolute or relative error between MLP predictions and full-wave RCWA/FDTD simulations) is reported for the final optimized geometry, leaving open the possibility that surrogate error is comparable to the polarization contrast being encoded.
  2. [Results and Abstract] Performance numbers are given without error bars, without comparison to any baseline reconstruction or optical architecture, and without specification of training/validation/test splits or cross-validation procedure. These omissions make it impossible to judge whether the reported PSNR/SSIM values reflect robust generalization or are specific to the particular optimization run.
minor comments (2)
  1. [Abstract] Define all acronyms (PSNR, SSIM, MLP, etc.) at first use in the abstract and main text.
  2. [Results] Clarify whether the 'real-world dataset' consists of simulated measurements on real captured images or actual experimental sensor data.
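The surrogate validation requested in major comment 1 could be as simple as a mean absolute error between surrogate predictions and full-wave reference values at the same sample points. The sketch below shows one way to compute it, with phase differences wrapped into (-π, π] so that values on either side of the branch cut are not counted as large errors. The function names and inputs are hypothetical; no data from the paper is used.

```python
import math
from typing import Sequence


def mean_abs_error(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Plain mean absolute error, suitable for normalized amplitudes."""
    if len(pred) != len(ref) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def wrapped_phase_error(pred: float, ref: float) -> float:
    """Absolute phase difference in radians after wrapping to one period."""
    diff = (pred - ref + math.pi) % (2.0 * math.pi) - math.pi
    return abs(diff)


def mean_abs_phase_error(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Mean absolute error for phases, insensitive to 2*pi jumps."""
    if len(pred) != len(ref) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(wrapped_phase_error(p, r) for p, r in zip(pred, ref)) / len(pred)


if __name__ == "__main__":
    # Two phases that look 6.2 rad apart are actually close across the branch cut.
    print(wrapped_phase_error(3.1, -3.1))
```

Comparing the resulting error against the polarization contrast being encoded is the check the referee asks for.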

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of validation and reporting that strengthen the presentation of our end-to-end inverse design framework. We address each major comment point by point below.

Point-by-point responses
  1. Referee: [Methods (metasurface modeling)] The end-to-end claim rests on the MLP serving as an accurate, differentiable forward model of the metasurface transmission matrix. No quantitative validation (e.g., mean-absolute or relative error between MLP predictions and full-wave RCWA/FDTD simulations) is reported for the final optimized geometry, leaving open the possibility that surrogate error is comparable to the polarization contrast being encoded.

    Authors: We agree that explicit quantitative validation of the MLP surrogate against full-wave simulations for the final optimized metasurface is necessary to fully support the end-to-end claims. While the MLP was trained on a comprehensive RCWA-generated dataset with reported training/validation losses, the original manuscript did not include a dedicated post-optimization error analysis for the converged geometry. In the revision, we have added this validation: for the optimized metasurface, we compare MLP predictions directly to independent RCWA and FDTD runs, obtaining mean absolute errors below 0.03 in normalized amplitude and 0.04 rad in phase across wavelengths and Stokes parameters. These errors are substantially smaller than the designed polarization contrast (minimum 0.25 modulation depth). The updated Methods section and a new supplementary figure now report these metrics. revision: yes

  2. Referee: [Results and Abstract] Performance numbers are given without error bars, without comparison to any baseline reconstruction or optical architecture, and without specification of training/validation/test splits or cross-validation procedure. These omissions make it impossible to judge whether the reported PSNR/SSIM values reflect robust generalization or are specific to the particular optimization run.

    Authors: We concur that the absence of error bars, baseline comparisons, and explicit data-split details limits evaluation of result robustness. The revised manuscript now specifies the dataset partitioning (70/15/15 train/validation/test split on the real-world dataset, with no overlap between splits) and employs 5-fold cross-validation to compute the reported metrics with standard deviations (e.g., hybrid monochromatic PSNR of 30.00 ± 0.42 dB). We have also added comparisons in the Results section to three baselines: (i) a random metasurface with the same U-Net, (ii) a conventional division-of-focal-plane polarizer array followed by computational reconstruction, and (iii) an end-to-end optimized refractive-only system. Our hybrid design outperforms these by 4–8 dB PSNR and 0.1–0.2 SSIM. The Abstract has been updated to reference these additions. revision: yes
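The reporting described in the second response (a 70/15/15 split with no overlap, plus k-fold cross-validation summarized as mean ± standard deviation) can be sketched as follows. How the simulated rebuttal combines the fixed split with the five folds is not specified, so the two helpers are shown independently. All names are hypothetical and no metric values are taken from the paper.

```python
import random
import statistics
from typing import Iterator, List, Sequence, Tuple


def split_indices(n: int, fractions: Tuple[float, float, float] = (0.70, 0.15, 0.15),
                  seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle 0..n-1 and cut into disjoint train/validation/test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def k_fold(indices: Sequence[int], k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, held_out) index lists; each index is held out exactly once."""
    folds = [list(indices[i::k]) for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def mean_std(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation, e.g. of per-fold PSNR values."""
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    train, val, test = split_indices(100)
    print(len(train), len(val), len(test))  # 70 15 15
    for fold_train, fold_held in k_fold(train, k=5):
        print(len(fold_train), len(fold_held))
```

Fixing the seed makes the partition reproducible, which is what lets a reader check that a reported spread reflects fold-to-fold variation and not a different split.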

Circularity Check

0 steps flagged

No significant circularity in derivation chain

Full rationale

The paper describes an end-to-end differentiable framework that models the metasurface response via MLP, jointly optimizes the optical frontend and U-Net decoder, and evaluates the resulting system on a real-world dataset to obtain PSNR/SSIM values. These metrics are standard post-optimization image quality measures applied to the reconstructed outputs; they do not reduce by construction to the training loss, to any fitted parameter, or to a self-referential definition within the abstract or described method. No equations, self-citations, or ansatzes are shown that would make the central performance claim tautological to its inputs. The approach is a conventional simulation-based inverse-design pipeline whose results remain independent of the reported numbers.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that a differentiable optical model plus neural reconstruction can jointly optimize a physical metasurface for the stated task; no new physical entities are postulated.

free parameters (2)
  • MLP weights for metasurface
    Parameters of the multilayer perceptron that represent the metasurface transmission function are optimized during design.
  • U-Net weights
    Neural network parameters trained on the dataset to decode the encoded polarization information.
axioms (2)
  • domain assumption: The 4f optical system can be accurately modeled as a differentiable forward operator.
    Invoked when the paper states the frontend is differentiable for joint optimization.
  • domain assumption: Metasurface response can be represented by an MLP without significant physical mismatch.
    Core modeling choice for the metasurface encoder.

pith-pipeline@v0.9.0 · 5540 in / 1426 out tokens · 38758 ms · 2026-05-10T10:38:43.297103+00:00 · methodology
