arxiv: 2605.29043 · v1 · pith:7LHPUMEVnew · submitted 2026-05-27 · ⚛️ physics.optics

Multimodal Optical Feature Extraction with a Free-Space Photonic Extreme Learning Machine

Pith reviewed 2026-06-29 10:09 UTC · model grok-4.3

classification ⚛️ physics.optics
keywords photonic extreme learning machine · free-space optics · multimodal feature extraction · optical computing · MNIST classification · spoken digit recognition · extreme learning machine

The pith

A single free-space optical apparatus extracts features for image classification, audio-spectrogram classification, tabular classification, and regression without hardware changes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that phase-only SLM encoding, free-space Fourier-like propagation, and intensity detection can produce a high-dimensional feature map usable across structurally different inputs in one fixed setup. It reports 96.56% accuracy on MNIST, 95.67% on spoken digits encoded as log-Mel spectrograms, 100% on Mushroom classification, and an NRMSE of 0.0699 on Abalone regression. This matters because only the final linear readout is trained digitally while the optical stage stays unchanged, which suggests optical systems could serve as general-purpose feature generators rather than task-specific processors. The diagnostics indicate two regimes: the map preserves input geometry for the image and regression tasks, and relies on distributed class-mean accumulation for the audio-derived spectrograms.

Core claim

A fixed free-space photonic extreme learning machine using phase-only SLM encoding followed by Fourier-like propagation and camera detection generates a task-agnostic feature map that supports high-accuracy performance on image classification, spectrogram-based audio classification, binary tabular classification, and regression within the same physical pipeline.

What carries the argument

The fixed optical transformation of phase-only SLM encoding, free-space Fourier-like propagation, and intensity detection that maps inputs to a high-dimensional feature space for downstream linear readout.
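
The pipeline named above can be mimicked numerically. What follows is a minimal sketch under stated assumptions, not the authors' implementation: an ideal 2D FFT stands in for the "Fourier-like" propagation, a random fixed phase mask stands in for the embedding, and the readout is closed-form ridge regression. The names `optical_features` and `ridge_fit`, the 8×8 grid, the toy labels, and the value of λ are all hypothetical.

```python
import numpy as np

def optical_features(x, mask):
    """Simulate phase-only encoding, Fourier-like propagation, intensity detection.

    x    : (n, h, w) inputs scaled to [0, 1]
    mask : (h, w) fixed phase mask in radians (assumed embedding)
    """
    field = np.exp(1j * (np.pi * x + mask))   # phase-only modulation, unit amplitude
    far = np.fft.fft2(field, norm="ortho")    # idealized stand-in for propagation
    intensity = np.abs(far) ** 2              # the camera records intensity only
    return intensity.reshape(len(x), -1)      # one feature vector per input

def ridge_fit(H, Y, lam):
    """Closed-form ridge readout: W = (H^T H + lam I)^-1 H^T Y."""
    d = H.shape[1]
    return np.linalg.solve(H.T @ H + lam * np.eye(d), H.T @ Y)

rng = np.random.default_rng(0)
mask = rng.uniform(0, 2 * np.pi, size=(8, 8))    # fixed, never trained
x = rng.uniform(0, 1, size=(200, 8, 8))          # synthetic inputs
y = (x.mean(axis=(1, 2)) > 0.5).astype(float)    # toy binary labels

H = optical_features(x, mask)                    # fixed "optical" stage
W = ridge_fit(H, y[:, None], lam=1e-3)           # only this step is learned
pred = (H @ W).ravel()
```

In this sketch the only fitted quantity is `W`; `mask` and the FFT play the role of the unchanged optical stage, which is the division of labour the paper describes.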

If this is right

  • The same apparatus handles image, audio-derived, tabular, and regression tasks without optical reconfiguration.
  • Empirical diagnostics separate geometry-preserving behavior for images and regression from class-mean accumulation for spectrograms.
  • Multimodal PELMs become a practical route to general-purpose optical feature extraction.
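
The geometry-preserving regime can be illustrated with a rank correlation between pairwise distances before and after a feature map. This is a minimal sketch under assumptions, not the paper's code: it uses Euclidean distances and a Spearman-style rank correlation, and the arrays `X`, `F_scaled`, and `F_random` are synthetic stand-ins for inputs and measured camera features.

```python
import numpy as np

def pairwise_dists(X):
    """Euclidean distances between all distinct pairs of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices(len(X), k=1)   # upper triangle, no diagonal
    return D[iu]

def spearman(a, b):
    """Rank correlation; assumes no ties, which holds for generic real-valued data."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 16))            # stand-in for original inputs
F_scaled = 2.0 * X                       # a map that trivially preserves geometry
F_random = rng.normal(size=(40, 64))     # features unrelated to the inputs

rho_same = spearman(pairwise_dists(X), pairwise_dists(F_scaled))
rho_rand = spearman(pairwise_dists(X), pairwise_dists(F_random))
```

A value near 1 corresponds to strong monotonic distance preservation; a value near 0 means the feature-space distances carry little of the input geometry, which by itself does not rule out high classification accuracy.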

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the optical map remains effective across more modalities, it could reduce the need for separate digital feature extractors in mixed-sensor systems.
  • Encoding additional data types such as time-series directly onto the SLM might test whether the current propagation distance suffices or whether multiple planes become necessary.
  • The reported kernel alignment values suggest the optical stage approximates certain kernel functions; measuring explicit kernel matrices on held-out data would quantify how closely it matches standard random feature maps.
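
One way such a comparison could be set up is centered kernel alignment between an empirical Gram matrix and a reference kernel. This is a hedged sketch: the linear kernel on features, the RBF reference, its bandwidth, and the synthetic arrays are all assumptions for illustration, and the paper's exact diagnostic may differ.

```python
import numpy as np

def center(K):
    """Double-center a Gram matrix: Hc K Hc with Hc = I - 11^T / n."""
    n = K.shape[0]
    Hc = np.eye(n) - np.ones((n, n)) / n
    return Hc @ K @ Hc

def kernel_alignment(K1, K2):
    """Centered alignment <K1c, K2c>_F / (||K1c||_F ||K2c||_F)."""
    K1c, K2c = center(K1), center(K2)
    return float(np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c)))

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 10))    # stand-in for held-out inputs
F = rng.normal(size=(50, 30))    # stand-in for measured camera features

K_feat = F @ F.T                                    # empirical linear kernel
sq = np.sum((X[:, None] - X[None]) ** 2, axis=-1)   # squared input distances
K_rbf = np.exp(-sq / (2 * 10.0))                    # reference RBF, bandwidth assumed
score = kernel_alignment(K_feat, K_rbf)
```

The score is invariant to rescaling either kernel and equals 1 when the centered matrices are proportional, so it compares kernel shape rather than magnitude.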

Load-bearing premise

A single fixed optical transformation produces a sufficiently rich feature map for structurally different inputs without any optical reconfiguration or extra modality-specific preprocessing.

What would settle it

Running a new data type such as raw video frames or text token sequences through the identical fixed optical apparatus and checking whether linear readout accuracy stays comparable to the reported benchmarks without any change to the SLM or propagation path.

Figures

Figures reproduced from arXiv: 2605.29043 by Abhinav Choube, Anshuman Kumar, Anushka Kumari, Anushree Khisti, Devansh Satra, Srivatsa Murali.

Figure 1: Experimental free-space PELM. A 532 nm laser is polarisation-controlled and phase modulated by a spatial light modulator. The optical path implements Fourier like free-space mixing, an iris selects the informative first diffracted order and suppresses the zero order background, and a camera records intensity features for the ridge regression readout.
Figure 2: Overview of the multimodal PELM pipeline used in this work. Image data, audio derived spectrograms, and tabular inputs are first converted into phase patterns for SLM display and then combined with fixed embedding mask. The encoded optical field undergoes free space propagation, after which the resulting intensity patterns are captured by the camera and converted into feature vectors. Only the final readou…
Figure 3: Comparison of the two optical encoding approaches used in the experiments: (a) noise embedding and (b) Fourier embedding. Each row illustrates a representative stage of the pipeline, including the original input pattern, the corresponding phase mask displayed on the SLM, and the resulting camera recorded intensity distribution after optical propagation.
Figure 4: MNIST confusion matrices obtained after optimizing the ridge regularization parameter λ. Both embeddings produce a strongly diagonal classification structure, showing reliable digit recognition across classes. The Fourier embedding achieves slightly higher overall accuracy and shows fewer off diagonal misclassifications compared to the noise embedding, suggesting improved class separation in the optical fe…
Figure 5: FSDD spoken digit confusion matrices after optimizing the ridge parameter. The PELM performs ten class audio derived classification using log-Mel spectrograms encoded as optical phase patterns. The Fourier embedding improves the accuracy to 95.67%. The Mushroom dataset provides a binary tabular classification benchmark. Since the task is nearly linearly separable after suitable encoding, both embeddings pe…
Figure 6: Mushroom binary classification confusion matrices after optimizing the ridge parameter λ. Both embeddings achieve 100.00% classification accuracy, indicating that the optical feature map preserves the separability of the tabular features extremely well. Across all three classifications, the Fourier embedding provides performance comparable to or slightly better than the noise embedding after optimizing λ. …
Figure 7: Experimental kernel characterization for the MNIST dataset using the Fourier embedding. The empirical PELM kernel is binned by input angle θ and compared to exact double centered theoretical predictions. The high Pearson correlation confirms that the optical hardware, aided by the structured spatial carrier of the Fourier mask, physically computes an angular RBF and Kphase kernel.
Figure 8: Empirical relative distance structure preservation for the Fourier embedding. Pairwise distances in the original representation are compared with distances in the experimentally measured optical feature space. MNIST shows moderate positive distance preservation, Abalone shows strong monotonic preservation, while FSDD shows weak global distance preservation despite high classification accuracy.
Figure 9: t-SNE graphic representations of optical characteristics that were empirically recorded for the Fourier embedding. MNIST and Mushroom demonstrate clearer class organization, whereas FSDD appears more globally blended in two dimensions, even with a strong ten class accuracy. Abalone exhibits continuous structures that depend on the target, rather than separate class clusters, aligning with its regression or…
Figure 10: Ridge parameter optimization for MNIST classification. The Fourier embedding gives slightly improved performance, reaching 96.56% accuracy at λ = 7.278 × 10⁻⁴.
Figure 11: Ridge parameter optimization for FSDD spoken digit classification. The Fourier embedding reaches 95.67% accuracy at λ = 1.887 × 10⁻³.
Figure 12: Ridge parameter optimization for Mushroom binary classification. Both embeddings operate at saturation and achieve 100.00% classification accuracy.
Figure 13: Ridge parameter optimization for Abalone regression. The metric is NRMSE, so lower values correspond to better performance. The optimized values are reported in …
Figure 14: Relative distance preservation diagnostics for Noise Embedding.
Figure 15: Experimental kernel characterizations for the MNIST dataset.
Figure 16: Experimental kernel characterizations for the FSDD dataset. Note the flat angular distribution due to the high dimensionality of the audio spectrograms causing distance concentration.
Figure 17: Experimental kernel characterizations for the Mushroom dataset.
Figure 18: Experimental kernel characterizations for the Abalone dataset.
Figure 19: Per class or per target separation diagnostics for the Fourier embedding. MNIST and Mushroom show stronger visible class organization, while FSDD has weaker global separation despite high classification accuracy. Abalone is excluded from these bar plots as categorical class separation metrics (Cohen’s d) are inapplicable to continuous regression variables.
Figure 20: Per class or per target separation diagnostics for the Noise embedding.
Figure 21: t-SNE visualizations for MNIST and FSDD optical features under both embeddings. The plots are qualitative projections of the measured high dimensional camera features and should be interpreted alongside the quantitative accuracy, distance, and separation diagnostics.
Figure 22: t-SNE visualizations for Mushroom and Abalone optical features under both embeddings. Mushroom forms a more directly separable binary structure, while Abalone forms continuous structures associated with regression targets. C.5 Empirical kernel matrices: The value labeled “Sep” in the heatmap titles of figs. 23–26 is the ratio of mean between class to within class kernel similarity derived from the empirica…
Figure 23: Empirical optical kernel matrices for MNIST using (a) Noise Embedding and (b) Fourier embedding.
Figure 24: Empirical optical kernel matrices for FSDD using (a) Noise Embedding and (b) Fourier embedding.
Figure 25: Empirical optical kernel matrices for Mushroom using (a) Noise Embedding and (b) Fourier Embedding.
Figure 26: Empirical optical kernel matrices for Abalone using (a) Noise Embedding and (b) Fourier Embedding.
Original abstract

Photonic extreme learning machines (PELMs) replace a digitally trained hidden layer by a fixed optical transformation, allowing a high dimensional feature map to be generated by physical propagation while only the final readout is learned. Existing free-space PELM demonstrations have established this principle for image and tabular benchmarks, but a unified multimodal optical feature extractor spanning structurally different data types has remained largely undeveloped. Here we demonstrate a single free-space PELM platform for image, audio derived, binary tabular, and regression tasks using phase only SLM encoding, Fourier like free space propagation, and camera intensity detection. The same optical apparatus achieves 96.56% accuracy on MNIST, 95.67% on spoken digit audio from log-Mel spectrograms, 100.00% on Mushroom classification, and 0.0699 NRMSE on Abalone regression. To our knowledge, this is the first free space PELM spanning image, audio derived, and tabular tasks in one physical pipeline, and the first PELM implementation of spectrogram based spoken digit classification. Empirical distance preservation and kernel alignment diagnostics reveal two operating regimes: geometry preserving for image and regression tasks, and distributed class mean accumulation for audio derived spectrograms. These results establish multimodal PELMs as a practical route toward general purpose optical machine learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper demonstrates a single free-space photonic extreme learning machine (PELM) using phase-only SLM encoding, Fourier-like free-space propagation, and intensity detection that performs image classification on MNIST (96.56% accuracy), spoken-digit classification from log-Mel spectrograms (95.67% accuracy), binary tabular classification on the Mushroom dataset (100% accuracy), and regression on the Abalone dataset (0.0699 NRMSE) without optical reconfiguration between tasks. It reports two operating regimes (geometry-preserving for images/regression and class-mean accumulation for spectrograms) via distance-preservation and kernel-alignment diagnostics and claims this is the first such multimodal free-space PELM.

Significance. If the experimental claims hold, the work provides the first experimental demonstration of a free-space PELM spanning image, audio-derived spectrogram, and tabular modalities in one fixed physical pipeline, together with falsifiable diagnostics that distinguish the geometry-preserving regime from the class-mean regime. This strengthens the case for practical optical feature extraction beyond single-modality benchmarks.

major comments (2)
  1. [abstract and operating-regimes discussion] The central claim that a single fixed optical transformation generates a task-agnostic feature map is load-bearing for the multimodal result, yet the manuscript does not specify the exact 2D encoding procedure that maps the 22 categorical features of Mushroom and the 8 continuous features of Abalone onto the SLM phase pattern (normalization, possible one-hot or binning steps, reshaping/padding). Without this detail it is impossible to verify that the optical stage is equivalent to the direct pixel-to-phase mapping used for MNIST and log-Mel spectrograms (abstract and § on operating regimes).
  2. [abstract] Performance numbers (96.56%, 95.67%, 100%, 0.0699 NRMSE) are reported without any information on the number of independent experimental runs, error bars, training/test splits, or statistical significance tests. This absence directly undermines the reliability of the cross-modality comparison (abstract).
minor comments (1)
  1. [diagnostics section] Notation for the two operating regimes should be introduced with explicit equations or a table rather than only descriptive labels.
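
Major comment 1 turns on how tabular rows reach the SLM. The sketch below shows one plausible version of the steps the referee lists (one-hot encoding, min-max normalization, zero-padding, reshaping). It is an illustration under assumptions, not the paper's procedure: the 64×64 grid, the [0, π] phase range, the row-major layout, and the helper names `one_hot` and `to_phase_grids` are hypothetical, and the example rows are invented.

```python
import numpy as np

def one_hot(rows):
    """One-hot encode categorical rows; returns an (n, total_categories) array."""
    n_cols = len(rows[0])
    vocab = [sorted({r[j] for r in rows}) for j in range(n_cols)]
    return np.array([
        [1.0 if r[j] == c else 0.0 for j in range(n_cols) for c in vocab[j]]
        for r in rows
    ])

def to_phase_grids(V, side=64, max_phase=np.pi):
    """Min-max normalize each column, zero-pad, reshape to (n, side, side) phases."""
    n, d = V.shape
    if d > side * side:
        raise ValueError("feature vector does not fit on the grid")
    lo, hi = V.min(axis=0), V.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # constant columns map to zero
    padded = np.zeros((n, side * side))
    padded[:, :d] = (V - lo) / span          # normalized values in [0, 1]
    return max_phase * padded.reshape(n, side, side)

rows = [("x", "n"), ("b", "y"), ("x", "y")]  # hypothetical categorical rows
grids = to_phase_grids(one_hot(rows))        # one phase pattern per row
```

With so few active pixels on a 64×64 grid, most of the pattern is padding; whether the paper tiles, upsamples, or pads the features is exactly the detail the referee asks for.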

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the manuscript. We respond to each major comment below and will revise the manuscript to address the identified gaps in detail and statistical reporting.

  1. Referee: [abstract and operating-regimes discussion] The central claim that a single fixed optical transformation generates a task-agnostic feature map is load-bearing for the multimodal result, yet the manuscript does not specify the exact 2D encoding procedure that maps the 22 categorical features of Mushroom and the 8 continuous features of Abalone onto the SLM phase pattern (normalization, possible one-hot or binning steps, reshaping/padding). Without this detail it is impossible to verify that the optical stage is equivalent to the direct pixel-to-phase mapping used for MNIST and log-Mel spectrograms (abstract and § on operating regimes).

    Authors: We agree that the encoding procedure for the tabular datasets requires explicit description to support the multimodal claim. In the revised manuscript we will insert a dedicated methods subsection that specifies: (i) one-hot encoding and min-max normalization of the 22 Mushroom features, followed by zero-padding and reshaping to a 64×64 phase grid; (ii) direct normalization and binning of the 8 Abalone features into the same grid format. These steps will be shown to be mathematically equivalent to the pixel-to-phase mapping used for MNIST and log-Mel spectrograms, thereby confirming that the same fixed optical pipeline is applied across modalities. revision: yes

  2. Referee: [abstract] Performance numbers (96.56%, 95.67%, 100%, 0.0699 NRMSE) are reported without any information on the number of independent experimental runs, error bars, training/test splits, or statistical significance tests. This absence directly undermines the reliability of the cross-modality comparison (abstract).

    Authors: We acknowledge that the absence of run statistics and error bars limits the strength of the reported figures. In the revised version we will augment the abstract and results section with: the number of independent experimental repetitions performed for each task, the corresponding standard deviations or error bars, the exact train/test split ratios employed, and any statistical significance tests (e.g., McNemar or paired t-tests) used to compare modalities. Where only single-run data exist due to experimental constraints, we will state this explicitly and discuss its implications for the cross-modality comparison. revision: yes
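
Where two readouts are evaluated on the same test samples (for example noise versus Fourier embedding, which is an assumption about how a comparison would be paired), an exact McNemar test needs only the two discordant counts. A minimal sketch with hypothetical counts follows; it does not reproduce any number from the paper.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b : test samples classifier A got right and classifier B got wrong
    c : test samples classifier B got right and classifier A got wrong
    Under the null hypothesis the discordant pairs split as Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0                      # no discordant pairs, no evidence
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)           # double the smaller tail, cap at 1

# Hypothetical discordant counts, for illustration only.
p_value = mcnemar_exact(2, 10)
```

Samples that both readouts classify correctly, or both misclassify, do not enter the statistic, so two systems with similar headline accuracy can still differ significantly, or not, depending on how their errors overlap.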

Circularity Check

0 steps flagged

No circularity; experimental results from physical measurements

The paper reports measured classification accuracies and regression error from a physical free-space optical setup (phase-only SLM encoding, propagation, intensity detection) followed by digital linear readout training. No derivation chain, equations, or first-principles predictions are presented that reduce by construction to fitted inputs, self-citations, or ansatzes; the central claims rest on external benchmark performance obtained via hardware experiment rather than any self-referential mathematical step. The two operating regimes noted are diagnosed empirically from the same physical data, not imposed by definition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The demonstration relies on standard Fourier optics for propagation and the ELM principle of fixed random features; no new entities or ad-hoc parameters are introduced beyond the standard readout training.

axioms (1)
  • [standard math] Fourier optics governs the free-space propagation between SLM and camera.
    The paper describes the transformation as 'Fourier like free space propagation'.


