VLM-Aware Meta-Optic Front-End Design for Frozen Vision-Language Models

Chanik Kang; Haejun Chung; Raphaël Pestourie

arxiv: 2606.27646 · v1 · pith:EFID2BBMnew · submitted 2026-06-26 · 💻 cs.CV · physics.optics

Pith reviewed 2026-06-29 00:26 UTC · model grok-4.3

classification 💻 cs.CV physics.optics

keywords: meta-optics, co-design, vision-language models, CLIP, differentiable optics, adjoint gradients, optical front-end, frozen models

The pith

Optimizing meta-optics directly against a frozen CLIP loss raises zero-shot accuracy from 53.75% to 65.41% on ImageNet-100.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that meta-optic front-ends for compact cameras can be designed by directly minimizing the cross-entropy loss of a fixed zero-shot vision-language model instead of maximizing conventional image quality metrics. It does so through a differentiable Maxwell solver that propagates light through a continuous-density optic and updates the optic parameters via adjoint gradients. In two-dimensional simulations this yields higher downstream recognition accuracy, and the same optic works across multiple frozen models and datasets without retraining. A reader would care because it suggests that optics tailored to human viewing are not optimal for machine recognition under physical size and efficiency limits.
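The adjoint-gradient idea mentioned above can be illustrated on a generic linear system. This is a minimal sketch, not the paper's Maxwell solver: it assumes a toy real-valued operator `A(theta) = A0 + diag(theta)`, a fixed source `b`, and a hypothetical least-squares objective against `x_target`. All of these names are illustrative. One forward solve plus one adjoint solve yields the gradient with respect to every parameter, which is checked here against finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A0 = rng.normal(size=(n, n)) + n * np.eye(n)   # well-conditioned toy operator
b = rng.normal(size=n)                          # fixed source term
x_target = rng.normal(size=n)                   # hypothetical desired field

def solve_forward(theta):
    """Forward solve of A(theta) x = b with A(theta) = A0 + diag(theta)."""
    A = A0 + np.diag(theta)
    return A, np.linalg.solve(A, b)

def objective(theta):
    _, x = solve_forward(theta)
    return 0.5 * np.sum((x - x_target) ** 2)

def adjoint_gradient(theta):
    """One forward solve plus one adjoint solve gives the full gradient."""
    A, x = solve_forward(theta)
    dg_dx = x - x_target
    lam = np.linalg.solve(A.T, dg_dx)           # adjoint system A^T lam = dg/dx
    # dg/dtheta_i = -lam^T (dA/dtheta_i) x, and dA/dtheta_i has a single 1 at (i, i).
    return -lam * x

theta = rng.uniform(0.0, 1.0, size=n)
grad_adj = adjoint_gradient(theta)

# Central finite differences as an independent check (needs 2n extra solves).
eps = 1e-6
grad_fd = np.array([
    (objective(theta + eps * e) - objective(theta - eps * e)) / (2 * eps)
    for e in np.eye(n)
])
max_abs_err = np.max(np.abs(grad_adj - grad_fd))
```

The cost of the adjoint route does not grow with the number of design parameters, which is why it is the usual choice when an optic has many density variables.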

Core claim

CODA optimizes a continuous-density meta-optic front-end for frozen-model recognition using differentiable image formation and adjoint-gradient updates of Maxwell-based simulations. It directly optimizes the cross-entropy loss of a fixed zero-shot CLIP classifier without learned reconstruction, image signal processing, or image-fidelity auxiliary objectives, improving CLIP ViT-L/14 zero-shot accuracy from 53.75 ± 3.57% to 65.41 ± 3.99% on ImageNet-100. The resulting optics transfer without re-optimization to SigLIP and DINOv2 on ImageNet-100, CIFAR-100, and Food-101.
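The loss being minimized is the cross-entropy of a zero-shot classifier. A rough sketch of how such a loss is typically computed from image and class-prompt embeddings follows. The embeddings here are random placeholders rather than real CLIP outputs, and the function name, shapes, and `logit_scale=100.0` are assumptions for illustration.

```python
import numpy as np

def l2_normalize(v, axis=-1):
    return v / np.linalg.norm(v, axis=axis, keepdims=True)

def zero_shot_cross_entropy(image_emb, text_emb, labels, logit_scale=100.0):
    """Mean cross-entropy of cosine-similarity logits against class-prompt embeddings.

    image_emb: (batch, dim), text_emb: (num_classes, dim), labels: (batch,) ints.
    """
    logits = logit_scale * l2_normalize(image_emb) @ l2_normalize(text_emb).T
    logits = logits - logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
text_emb = rng.normal(size=(100, 64))        # placeholder for 100 class prompts
labels = rng.integers(0, 100, size=8)
# Placeholder image embeddings: the correct class embedding plus noise.
image_emb = text_emb[labels] + 0.5 * rng.normal(size=(8, 64))

loss = zero_shot_cross_entropy(image_emb, text_emb, labels)
predictions = (l2_normalize(image_emb) @ l2_normalize(text_emb).T).argmax(axis=1)
accuracy = np.mean(predictions == labels)
```

In the setting the paper describes, the image embedding would come from the frozen encoder applied to the simulated sensor image, so the only trainable quantities upstream of this loss are the optic parameters.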

What carries the argument

The CODA co-design loop that treats the meta-optic parameters as differentiable variables updated by backpropagating the frozen classifier's cross-entropy loss through a Maxwell-based image formation model.
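The structure of such a loop can be sketched with heavy simplifications. In the toy below, the Maxwell-based image formation is replaced by a circular 1D blur whose point-spread function is `softmax(theta)`, and the frozen vision-language model is replaced by a fixed linear classifier `W`. Both substitutions, and every name in the code, are assumptions made for illustration and are not taken from the paper. Only `theta` is updated; the classifier is never touched.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, dim, taps, n_samples = 5, 32, 7, 200

# Frozen stand-in classifier: fixed linear logits W @ y (never updated below).
W = rng.normal(size=(num_classes, dim))
labels = rng.integers(0, num_classes, size=n_samples)
scenes = W[labels] + 0.3 * rng.normal(size=(n_samples, dim))

# Toy image formation: circular 1D blur with a PSF h = softmax(theta).
shifted = np.stack([np.roll(scenes, k, axis=1) for k in range(taps)])  # (taps, N, dim)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def loss_and_grad(theta):
    h = softmax(theta)                                   # non-negative, sums to 1
    sensor = np.einsum("k,kni->ni", h, shifted)          # blurred measurements
    probs = softmax(sensor @ W.T)
    loss = -np.log(probs[np.arange(n_samples), labels]).mean()
    # Backpropagate through the classifier, the image formation, then the softmax.
    dlogits = probs.copy()
    dlogits[np.arange(n_samples), labels] -= 1.0
    dsensor = dlogits @ W / n_samples
    dh = np.einsum("ni,kni->k", dsensor, shifted)
    dtheta = h * (dh - np.dot(h, dh))
    return loss, dtheta

theta = np.zeros(taps)            # start from a uniform (blurry) PSF
initial_loss, _ = loss_and_grad(theta)
for _ in range(300):
    loss, grad = loss_and_grad(theta)
    theta -= 0.2 * grad           # only the optic parameters change
final_loss, _ = loss_and_grad(theta)
```

In the actual framework the hand-written backward pass through the blur would be replaced by adjoint gradients of the electromagnetic simulation, but the division of roles is the same: a fixed downstream loss and a trainable physical front-end.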

If this is right

The same optic design works across CLIP, SigLIP, and DINOv2 without further optimization.
Gains appear on CIFAR-100 and Food-101 as well as ImageNet-100.
Recognition under meta-optic constraints improves when optical design is aligned with the model loss rather than image fidelity.
No auxiliary reconstruction or perceptual losses are required for the improvement.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Optics optimized for human-interpretable images may be systematically suboptimal for downstream machine perception.
Similar end-to-end optical co-design could be applied to other constrained sensors such as thermal or event cameras.
Task-specific optics might become practical if the simulation-to-hardware gap can be closed.

Load-bearing premise

The Maxwell solver and adjoint gradients produce designs whose performance will match real fabricated hardware.

What would settle it

Fabricate the optimized meta-optic, capture real images of ImageNet objects through it, and measure whether zero-shot CLIP accuracy matches or exceeds the simulated 65.41%.
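One way to make "matches" concrete is to put a confidence interval around the measured hardware accuracy and see whether the simulated figure falls inside it. The sketch below uses the Wilson score interval for a binomial proportion. The measurement counts are hypothetical placeholders, not data from the paper, and the 95% level is an arbitrary choice.

```python
import math

def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95% coverage)."""
    p_hat = correct / total
    denom = 1.0 + z**2 / total
    center = (p_hat + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half

simulated_accuracy = 0.6541            # from the abstract
# Hypothetical hardware measurement, NOT data from the paper.
measured_correct, measured_total = 3100, 5000

low, high = wilson_interval(measured_correct, measured_total)
consistent_with_simulation = low <= simulated_accuracy <= high
```

A fuller comparison would also account for the spread of the simulated result itself (the reported ± 3.99%), which this sketch ignores.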

Figures

Figures reproduced from arXiv: 2606.27646 by Chanik Kang, Haejun Chung, Raphaël Pestourie.

**Figure 2.** Representative optics-adaptation formulations for optics–AI co…

**Figure 3.** Simulation and line-scan image formation.

**Figure 4.** Sensor-space gradients induced by the frozen visual classifier.

**Figure 5.** (a) Representative optical designs and fields for the…

**Figure 6.** Optimization trajectories on ImageNet-100 with frozen CLIP ViT…

Original abstract

Conventional machine-vision pipelines typically rely on high-quality optics that produce clean, human-interpretable images, and optical design has therefore been driven by image-level criteria such as resolution, aberration correction, and pixel fidelity. However, such optics are often impractical for size-, cost-, or form-factor-constrained applications, where compact meta-optics offer an attractive alternative but operate under strict physical efficiency limits. We propose CODA, a co-design framework that optimizes a continuous-density meta-optic front-end for frozen-model recognition using differentiable image formation and adjoint-gradient updates of Maxwell-based simulations. CODA directly optimizes the cross-entropy loss of a fixed zero-shot CLIP classifier without learned reconstruction, image signal processing, or image-fidelity auxiliary objectives. In a two-dimensional simulated imaging benchmark on ImageNet-100, CODA improves CLIP ViT-L/14 zero-shot accuracy from 53.75 ± 3.57% with a focal-concentration baseline to 65.41 ± 3.99%. The optimized optics further transfer without re-optimization across CLIP, SigLIP, and DINOv2 on ImageNet-100, CIFAR-100, and Food-101. These results demonstrate that, under constrained meta-optic imaging, downstream recognition can be improved by aligning optical design with frozen vision-model objectives rather than conventional image-formation criteria.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CODA gets an 11.66-point simulated accuracy lift on ImageNet-100 by optimizing meta-optics directly on frozen CLIP cross-entropy via 2D Maxwell adjoint gradients, but the entire result stays inside that simulator with no fabricated hardware check.

The central result is that they replace conventional image-quality objectives with direct minimization of a frozen zero-shot CLIP classifier's cross-entropy loss. They do this by running adjoint-gradient updates through a differentiable 2D Maxwell image-formation model to shape a continuous-density meta-optic. On ImageNet-100 the optimized optic raises ViT-L/14 accuracy from 53.75% to 65.41% relative to a focal-concentration baseline, and the same optic works without re-optimization on SigLIP and DINOv2 across three datasets.

That framing is the actual novelty: task-specific optics for downstream recognition rather than human-interpretable images. The simulation numbers are reported with standard deviations, and the transfer claim is at least internally consistent within the 2D setup.

The obvious limitation is that every number comes from the 2D Maxwell simulator. The abstract gives no fabricated devices, no 3D vectorial validation, no measured fabrication error, and no sensor-noise model. The stress-test note is therefore on target: if the simulator does not capture real 3D behavior, material dispersion, or process variation, the reported gains will not appear in hardware. The paper also does not show how the continuous density is discretized for actual lithography or whether the adjoint updates remain stable under that discretization.

This work is aimed at the computational-imaging and meta-optics community that already uses differentiable simulators. Readers who care about co-design for machine-vision pipelines will find the loss-alignment idea useful even if they treat the numbers as simulation-only. The approach is coherent on its own terms and the citation pattern is not circular, so it clears the bar for a serious referee. I would send it out, but the review will need to focus on whether the simulator fidelity claim can be substantiated or bounded.

Referee Report

2 major / 2 minor

Summary. The paper proposes CODA, a co-design framework that uses differentiable Maxwell-based image formation and adjoint-gradient optimization to directly tune continuous-density meta-optic front-ends for frozen zero-shot VLMs (CLIP, SigLIP, DINOv2) by minimizing cross-entropy loss without reconstruction or image-fidelity auxiliaries. In 2D simulated imaging on ImageNet-100 it reports lifting CLIP ViT-L/14 accuracy from 53.75 ± 3.57% (focal baseline) to 65.41 ± 3.99%; the same optics are shown to transfer across models and to CIFAR-100/Food-101 without re-optimization.

Significance. If the 2D simulator faithfully predicts fabricated 3D meta-optic behavior, the result would demonstrate that task-specific optical co-design can outperform conventional image-quality criteria under severe physical constraints, opening a route to compact, model-aware front-ends for edge vision. The absence of learned reconstruction or auxiliary losses is a methodological strength.

major comments (2)

[Abstract and §4, experimental results] All reported accuracy gains and cross-model transfer are obtained inside a 2D Maxwell simulator; the manuscript supplies no quantitative validation of simulator fidelity against fabricated devices, 3D vectorial effects, material dispersion, or sensor noise, which directly undermines the central claim that the designs “transfer without re-optimization” to real constrained meta-optics.
[§3.2, image-formation model] The adjoint-gradient updates rest on the assumption that the 2D continuous-density parameterization accurately captures the physical degrees of freedom of a real 3D meta-optic; no sensitivity analysis or fabrication-error model is provided to bound the expected performance drop.
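As an illustration of what a simple fabrication-error model could look like, the sketch below uses the smoothed threshold projection that is common in topology optimization and shifts its threshold to produce eroded and dilated variants of one design. Whether the paper uses this projection is not stated in the abstract. The density profile, `beta`, and the threshold offsets are hypothetical values chosen for the example.

```python
import numpy as np

def tanh_projection(rho, beta, eta):
    """Smoothed threshold pushing densities in [0, 1] toward 0 or 1 around eta."""
    num = np.tanh(beta * eta) + np.tanh(beta * (rho - eta))
    den = np.tanh(beta * eta) + np.tanh(beta * (1.0 - eta))
    return num / den

rng = np.random.default_rng(0)
# Hypothetical smooth density profile standing in for an optimized design.
raw = rng.uniform(size=256)
kernel = np.hanning(15)
rho = np.convolve(raw, kernel / kernel.sum(), mode="same")

beta = 32.0
nominal = tanh_projection(rho, beta, eta=0.5)
eroded = tanh_projection(rho, beta, eta=0.6)    # higher threshold, less material
dilated = tanh_projection(rho, beta, eta=0.4)   # lower threshold, more material

fill = {name: float(d.mean()) for name, d in
        [("eroded", eroded), ("nominal", nominal), ("dilated", dilated)]}
```

Re-evaluating the recognition loss on the eroded and dilated variants, rather than on the nominal design alone, would give a first bound on how sensitive the reported accuracy is to over- or under-etching.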

minor comments (2)

[Table 1, or equivalent results table] Report the number of random seeds and exact data splits used for the ±3.57% / ±3.99% intervals so that statistical significance of the 11.66-point gain can be assessed.
[§3.1, notation] The continuous-density variable and its projection onto the binary fabrication constraint should be defined with an equation number in §3.1.
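The statistical question raised in the Table 1 comment can be made concrete once the number of runs is known. The sketch below computes the gain and a Welch-style test statistic under two assumptions that the abstract does not confirm: that the ± values are sample standard deviations over independent runs, and that there are `n_runs = 5` runs per condition (a placeholder).

```python
import math

baseline_mean, baseline_sd = 53.75, 3.57   # focal-concentration baseline (%)
coda_mean, coda_sd = 65.41, 3.99           # CODA-optimized optic (%)
n_runs = 5                                  # ASSUMPTION: not stated in the abstract

gain = coda_mean - baseline_mean
std_err = math.sqrt(baseline_sd**2 / n_runs + coda_sd**2 / n_runs)
welch_t = gain / std_err

# Welch-Satterthwaite effective degrees of freedom.
va, vb = baseline_sd**2 / n_runs, coda_sd**2 / n_runs
dof = (va + vb) ** 2 / (va**2 / (n_runs - 1) + vb**2 / (n_runs - 1))
```

Because the standard error shrinks with the square root of `n_runs`, the same 11.66-point gain can look marginal or decisive depending on a number the manuscript has yet to report.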

Simulated Authors' Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive comments highlighting the simulation-only nature of our study. We address each major point below and will revise the manuscript to clarify scope and limitations without overstating applicability to physical devices.

Referee: [Abstract and §4, experimental results] All reported accuracy gains and cross-model transfer are obtained inside a 2D Maxwell simulator; the manuscript supplies no quantitative validation of simulator fidelity against fabricated devices, 3D vectorial effects, material dispersion, or sensor noise, which directly undermines the central claim that the designs “transfer without re-optimization” to real constrained meta-optics.

Authors: We agree that the reported gains (53.75% to 65.41% on ImageNet-100) and cross-model transfers are obtained exclusively within the 2D Maxwell simulator, with no fabricated-device validation, 3D effects, dispersion, or noise modeling included. The work is a simulation study; we will revise the abstract and §4 to explicitly qualify all claims as applying to the 2D simulated environment and remove language implying direct transfer to physical meta-optics. This is a scope clarification.

Revision: yes

Referee: [§3.2, image-formation model] The adjoint-gradient updates rest on the assumption that the 2D continuous-density parameterization accurately captures the physical degrees of freedom of a real 3D meta-optic; no sensitivity analysis or fabrication-error model is provided to bound the expected performance drop.

Authors: The 2D continuous-density model is a standard approximation in the meta-optics literature for adjoint optimization. We will add a limitations paragraph in §3.2 noting this assumption, citing 2D-to-3D discrepancy studies, and stating, without quantitative bounds, that real-device performance may degrade. A full fabrication-error model lies outside the current simulation-focused scope.

Revision: partial

standing simulated objections not resolved

Quantitative validation of simulator fidelity against fabricated 3D meta-optic devices, including 3D vectorial effects, material dispersion, and sensor noise

Circularity Check

0 steps flagged

No circularity: direct simulation-based optimization on fixed model loss

The paper's central derivation optimizes meta-optic density parameters via adjoint gradients on a differentiable 2D Maxwell image-formation model to minimize the cross-entropy of a frozen CLIP classifier. Reported accuracy gains (53.75% to 65.41% on ImageNet-100) and cross-model transfer are produced entirely inside this simulation; no parameter is fitted to a data subset and then renamed as a prediction, no quantity is defined in terms of itself, and no load-bearing uniqueness theorem or ansatz is imported via self-citation. The derivation chain therefore remains self-contained against external benchmarks and does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract alone supplies no information on free parameters, axioms, or invented entities; full text required for audit.

pith-pipeline@v0.9.1-grok · 5789 in / 1100 out tokens · 30703 ms · 2026-06-29T00:26:32.499260+00:00 · methodology


[1] [1]

In: European conference on computer vision

Bossard, L., Guillaumin, M., Van Gool, L.: Food-101–mining discriminative com- ponents with random forests. In: European conference on computer vision. pp. 446–461. Springer (2014)

2014

[2] [2]

Scientific reports8(1), 12324 (2018)

Chang, J., Sitzmann, V., Dun, X., Heidrich, W., Wetzstein, G.: Hybrid optical- electronic convolutional neural networks with optimized diffractive optics for image classification. Scientific reports8(1), 12324 (2018)

2018

[3] [3]

Nature Communications16(1), 363 (2025)

Chen, J., Huang, S.X., Chan, K.F., Wu, G.B., Chan, C.H.: 3d-printed aberration- freeterahertzmetalensforultra-broadbandachromaticsuper-resolutionwide-angle imaging with high numerical aperture. Nature Communications16(1), 363 (2025)

2025

[4] [4]

Light: Advanced Man- ufacturing7, 1–12 (2026).https://doi.org/10.37188/lam.2026.045

Chi, C., Hou, Q., Zhao, G., Song, Q., Xu, S., Piao, Y., Qin, M., Hu, Y., Chen, C., Cai, W., Chen, Y., Yuan, X., Duan, H.: Ultracompact wide-fov near-infrared camera with a wafer-level manufactured meta-aspheric lens. Light: Advanced Man- ufacturing7, 1–12 (2026).https://doi.org/10.37188/lam.2026.045

work page doi:10.37188/lam.2026.045 2026

[5] [5]

Journal of the Optical Society of America B38(2), 496–509 (2021)

Christiansen, R.E., Sigmund, O.: Inverse design in photonics by topology opti- mization: tutorial. Journal of the Optical Society of America B38(2), 496–509 (2021)

2021

[6] [6]

Optics express28(5), 6945–6965 (2020)

Chung, H., Miller, O.D.: High-na achromatic metalenses by inverse design. Optics express28(5), 6945–6965 (2020)

2020

[7] [7]

Applied optics58(12), 3179–3186 (2019)

Colburn, S., Chu, Y., Shilzerman, E., Majumdar, A.: Optical frontend for a con- volutional neural network. Applied optics58(12), 3179–3186 (2019)

2019

[8] [8]

Science advances4(2), eaar2114 (2018)

Colburn, S., Zhan, A., Majumdar, A.: Metasurface optics for full-color computa- tional imaging. Science advances4(2), eaar2114 (2018)

2018

[9] [9]

In: 2009 IEEE conference on computer vision and pattern recognition

Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: A large- scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. pp. 248–255. Ieee (2009)

2009

[10] [10]

Acs Pho- tonics6(8), 2161–2167 (2019)

Faraji-Dana, M., Arbabi, E., Kwon, H., Kamali, S.M., Arbabi, A., Bartholomew, J.G., Faraon, A.: Hyperspectral imager with folded metasurface optics. Acs Pho- tonics6(8), 2161–2167 (2019)

2019

[11] [11]

Fu, W., Zhao, D., Li, Z., Liu, S., Tian, C., Huang, K.: Ultracompact meta-imagers forarbitraryall-opticalconvolution.Light:Science&Applications11(1), 62(2022)

2022

[12] [12]

Optics Express30(3), 4467–4491 (2022)

Hammond, A.M., Oskooi, A., Chen, M., Lin, Z., Johnson, S.G., Ralph, S.E.: High- performance hybrid time/frequency-domain topology optimization for large-scale photonics inverse design. Optics Express30(3), 4467–4491 (2022)

2022

[13] [13]

Laser & Photonics Reviews20(5), e00803 (2026) 16 Kang et al

Hao, C., Wu, Y., Yuan, Z., Zhou, Z.W., Wang, Y., Li, M., Feng, C., Wang, K., Zhang, Z., Chen, J.: Compact meta-camera for intelligent wide-angle and low-light imaging. Laser & Photonics Reviews20(5), e00803 (2026) 16 Kang et al

2026

[14] [14]

ACS Photonics5(12), 4781–4787 (2018)

Hughes, T.W., Minkov, M., Williamson, I.A., Fan, S.: Adjoint method and inverse design for nonlinear nanophotonic devices. ACS Photonics5(12), 4781–4787 (2018)

2018

[15] [15]

In: International conference on machine learning

Jia, C., Yang, Y., Xia, Y., Chen, Y.T., Parekh, Z., Pham, H., Le, Q., Sung, Y.H., Li, Z., Duerig, T.: Scaling up visual and vision-language representation learning with noisy text supervision. In: International conference on machine learning. pp. 4904–4916. PMLR (2021)

2021

[16] [16]

arXiv preprint arXiv:2606.16724 (2026)

Kienesberger, L., Kuang, Z., Liu, Y., Miller, O.D.: End-to-end meta-imagers: Information-theoretic objectives and generalized focusing optima. arXiv preprint arXiv:2606.16724 (2026)

arXiv 2026

[17] [17]

Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)

2009

[18] [18]

Optica6(12), 1461–1470 (2019)

Liang, H., Martins, A., Borges, B.H.V., Zhou, J., Martins, E.R., Li, J., Krauss, T.F.: High performance metalenses: numerical aperture, aberrations, chromaticity, and trade-offs. Optica6(12), 1461–1470 (2019)

2019

[19] [19]

Science 361(6406), 1004–1008 (2018)

Lin, X., Rivenson, Y., Yardimci, N.T., Veli, M., Luo, Y., Jarrahi, M., Ozcan, A.: All-optical machine learning using diffractive deep neural networks. Science 361(6406), 1004–1008 (2018)

2018

[20] [20]

Optics express30(16), 28358–28370 (2022)

Lin, Z., Pestourie, R., Roques-Carmes, C., Li, Z., Capasso, F., Soljačić, M., John- son, S.G.: End-to-end metasurface inverse design for single-shot multi-channel imaging. Optics express30(16), 28358–28370 (2022)

2022

[21] [21]

Nanophotonics10(3), 1177–1187 (2021)

Lin, Z., Roques-Carmes, C., Pestourie, R., Soljačić, M., Majumdar, A., John- son, S.G.: End-to-end nanophotonic inverse design for imaging and polarimetry. Nanophotonics10(3), 1177–1187 (2021)

2021

[22] [22]

Advanced Photonics6(5), 056001– 056001 (2024)

Liu, Y., Li, W.D., Xin, K.Y., Chen, Z.M., Chen, Z.Y., Chen, R., Chen, X.D., Zhao, F.L., Zheng, W.S., Dong, J.W.: Ultra-wide fov meta-camera with transformer- neural-network color imaging methodology. Advanced Photonics6(5), 056001– 056001 (2024)

2024

[23]

Ma, W., Pestourie, R., Lin, Z., Johnson, S.G.: Inverse design for robust inference in integrated computational spectrometry. Nanophotonics 15(7), e70054 (2026)

[24]

Martins, A., Li, K., Li, J., Liang, H., Conteduca, D., Borges, B.H.V., Krauss, T.F., Martins, E.R.: On metalenses with arbitrarily wide field of view. ACS Photonics 7(8), 2073–2079 (2020)

[25]

Meem, M., Majumder, A., Banerji, S., Garcia, J.C., Kigner, O.B., Hon, P.W., Sensale-Rodriguez, B., Menon, R.: Imaging from the visible to the longwave infrared wavelengths via an inverse-designed flat lens. Optics Express 29(13), 20715–20723 (2021)

[26]

Miller, O.: Photonic Design: From Fundamental Solar Cell Physics to Computational Inverse Design. Ph.D. thesis, EECS Department, University of California, Berkeley (May 2012), http://www2.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-115.html

[27]

Molesky, S., Lin, Z., Piggott, A.Y., Jin, W., Vučković, J., Rodriguez, A.W.: Inverse design in nanophotonics. Nature Photonics 12(11), 659–670 (2018)

[28]

Mouroulis, P., Green, R.O., Chrien, T.G.: Design of pushbroom imaging spectrometers for optimum recovery of spectroscopic and spatial information. Applied Optics 39(13), 2210–2220 (2000)

[29]

Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., Assran, M., Ballas, N., Galuba, W., Howes, R., Huang, P.Y., Li, S.W., Misra, I., Rabbat, M., Sharma, V., Synnaeve, G., Xu, H., Jegou, H., Mairal, J., Labatut, P., Joulin, A., Bojanowski, P.: Dinov2: Learning robust visual features without su...

[30]

Oskooi, A.F., Roundy, D., Ibanescu, M., Bermel, P., Joannopoulos, J.D., Johnson, S.G.: Meep: A flexible free-software package for electromagnetic simulations by the fdtd method. Computer Physics Communications 181(3), 687–702 (2010)

[31]

Peng, J., Luo, M., Han, Y., Wu, S., Li, H., Shastri, B.J., Shu, C., Dou, Q., Chai, Y., Huang, C.: Optical metasurfaces for general vision processing on the edge. Nature 654, 917–925 (2026). https://doi.org/10.1038/s41586-026-10635-z

[32]

Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., Sutskever, I.: Learning transferable visual models from natural language supervision. In: Meila, M., Zhang, T. (eds.) Proceedings of the 38th International Conference on Machine Learning. Proceedings of Machine Learning Res...

[33]

Rodionov, S., Burguete-Lopez, A., Makarenko, M., Wang, Q., Getman, F., Fratalocchi, A.: Moclip: A foundation model for large-scale nanophotonic inverse design. arXiv preprint arXiv:2511.18980 (2025)

[34]

Seo, J., Jo, J., Kim, J., Kang, J., Kang, C., Moon, S.W., Lee, E., Hong, J., Rho, J., Chung, H.: Deep-learning-driven end-to-end metalens imaging. Advanced Photonics 6(6), 066002 (2024)

[35]

Shen, Z., Zhao, F., Jin, C., Wang, S., Cao, L., Yang, Y.: Monocular metasurface camera for passive single-shot 4d imaging. Nature Communications 14(1), 1035 (2023)

[36]

Sitzmann, V., Diamond, S., Peng, Y., Dun, X., Boyd, S., Heidrich, W., Heide, F., Wetzstein, G.: End-to-end optimization of optics and image processing for achromatic extended depth of field and super-resolution imaging. ACM Transactions on Graphics (TOG) 37(4), 1–13 (2018)

[37]

Tian, Y., Krishnan, D., Isola, P.: Contrastive representation distillation. arXiv preprint arXiv:1910.10699 (2019)

[38]

Tseng, E., Colburn, S., Whitehead, J., Huang, L., Baek, S.H., Majumdar, A., Heide, F.: Neural nano-optics for high-quality thin lens imaging. Nature Communications 12(1), 6493 (2021)

[39]

Tseng, E., Mosleh, A., Mannan, F., St-Arnaud, K., Sharma, A., Peng, Y., Braun, A., Nowrouzezahrai, D., Lalonde, J.F., Heide, F.: Differentiable compound optics and processing pipeline optimization for end-to-end camera design. ACM Transactions on Graphics (TOG) 40(2), 1–19 (2021)

[40]

Wang, J., Yu, R., Ye, X., Sun, J., Li, J., Huang, C., Xiao, X., Ji, J., Shen, W., Tie, Z., Chen, C., Zhu, S., Li, T.: Quantitative phase imaging with a compact meta-microscope. npj Nanophotonics 1(1), 4 (Apr 2024). https://doi.org/10.1038/s44310-024-00007-8

[41]

Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13(4), 600–612 (2004)

[42]

Wirth-Singh, A., Fröch, J.E., Yang, F., Martin, L., Zheng, H., Zhang, H., Tanguy, Q.T., Zhou, Z., Huang, L., John, D.D., et al.: Wide field of view large aperture meta-doublet eyepiece. Light: Science & Applications 14(1), 17 (2025)

[43]

Wirth-Singh, A., Xiang, J., Choi, M., Fröch, J.E., Huang, L., Colburn, S., Shlizerman, E., Majumdar, A.: Compressed meta-optical encoder for image classification. Advanced Photonics Nexus 4(2), 026009 (2025)

[44]

Zhai, X., Mustafa, B., Kolesnikov, A., Beyer, L.: Sigmoid loss for language image pre-training. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 11975–11986 (2023)

[45]

Zhang, Q., Lin, P., Wang, C., Zhang, Y., Yu, Z., Liu, X., Lu, Y., Xu, T., Zheng, Z.: Neural-optic co-designed polarization-multiplexed metalens for compact computational spectral imaging. Laser & Photonics Reviews 18(8), 2400187 (2024)

[46]

Zhou, C., Nayar, S.K.: Computational cameras: convergence of optics and processing. IEEE Transactions on Image Processing 20(12), 3322–3340 (2011)