Single-Shot Matrix-Matrix Multiplication Optical Tensor Processor for Deep Learning

Chao Luan; Dirk Englund; Ronald Davis III; Ryan Hamerly; Zaijun Chen

arxiv: 2503.24356 · v1 · submitted 2025-03-31 · ⚛️ physics.optics

Pith reviewed 2026-05-22 22:13 UTC · model grok-4.3

keywords: optical neural networks · matrix-matrix multiplication · tensor processor · deep learning · energy-efficient computing · hyper-multiplexing · convolutional neural networks

The pith

A spatial-wavelength-temporal hyper-multiplexed architecture performs three-dimensional matrix-matrix multiplication in one optical time step.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents an optical neural network processor that uses spatial, wavelength, and temporal multiplexing to achieve high data dimensionality and parallelism. It demonstrates this by executing a full three-dimensional matrix-matrix multiplication in a single time step. The hardware accelerates convolutional and deep neural networks, running a benchmark image-recognition task with 292616 weight parameters at 20 attojoules per multiply-accumulate while reaching 96.4 percent accuracy. The design is positioned as scalable for large implementations where earlier high-parallelism optical approaches hit roadblocks.

Core claim

A spatial-wavelength-temporal hyper-multiplexed optical tensor processor performs a three-dimensional matrix-matrix multiplication in a single time step. The architecture supports high computing parallelism and is presented as feasible for large-scale implementation. It accelerates CNNs and DNNs at an optical energy of 20 attojoules per MAC, reaching 96.4 percent classification accuracy with 292616 weight parameters.

What carries the argument

The spatial-wavelength-temporal hyper-multiplexed ONN processor architecture that encodes and processes high-dimensional data across space, spectrum, and time to perform parallel matrix multiplications.
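One way to read "three-dimensional" here is that a product of an M×K matrix with a K×N matrix involves M·K·N multiply-accumulates, one for each point of the (i, k, j) index cube. The sketch below counts those MACs and shows how many time steps a processor would need at a given parallelism. The function names and the `parallel_macs_per_step` parameter are hypothetical, and the code makes no claim about how the paper assigns indices to space, wavelength, or time.

```python
from math import ceil

def matmul_with_mac_count(a, b):
    """Plain matrix product that also counts multiply-accumulate operations."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    macs = 0
    for i in range(rows):          # first index of the cube
        for j in range(cols):      # second index of the cube
            for k in range(inner): # summed (contracted) index
                out[i][j] += a[i][k] * b[k][j]
                macs += 1
    return out, macs

def time_steps(macs, parallel_macs_per_step):
    """Steps needed if the hardware performs this many MACs at once (hypothetical)."""
    return ceil(macs / parallel_macs_per_step)

a = [[1, 2], [3, 4], [5, 6]]        # 3 x 2
b = [[1, 0, 2, 1], [0, 1, 1, 3]]    # 2 x 4
product, macs = matmul_with_mac_count(a, b)

print(product)
print("MACs:", macs)
print("sequential steps:", time_steps(macs, 1))
print("single-shot steps:", time_steps(macs, macs))
```

In this reading, "single-shot" corresponds to the case where the available parallelism is at least M·K·N, so the whole index cube is evaluated in one step.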

If this is right

CNNs and DNNs can be accelerated directly through parallel optical matrix multiplication.
Image recognition runs at 96.4 percent accuracy using 292616 optical weights.
Energy consumption reaches 20 attojoules per multiply-accumulate operation.
Broad spectral and spatial bandwidths become available for larger demonstrations.
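To put the 20 aJ/MAC figure in perspective, the following back-of-the-envelope sketch converts it to photons per MAC and to a total optical energy. Two inputs are assumptions rather than values from the paper: the 1550 nm wavelength (chosen as a typical telecom wavelength) and the simplification that each of the 292616 weights takes part in exactly one MAC per inference. Convolutional weights are reused across image positions, so the real MAC count is likely higher, and the figure covers optical energy only.

```python
PLANCK_J_S = 6.62607015e-34        # Planck constant (exact SI value)
LIGHT_SPEED_M_S = 299_792_458.0    # speed of light (exact SI value)

ENERGY_PER_MAC_J = 20e-18          # 20 aJ per MAC, as reported
WEIGHT_COUNT = 292_616             # weight parameters, as reported
ASSUMED_WAVELENGTH_M = 1550e-9     # ASSUMPTION: not stated in this summary

def photon_energy_j(wavelength_m):
    """Energy of one photon at the given wavelength."""
    return PLANCK_J_S * LIGHT_SPEED_M_S / wavelength_m

photons_per_mac = ENERGY_PER_MAC_J / photon_energy_j(ASSUMED_WAVELENGTH_M)

# ASSUMPTION: one MAC per weight per inference (a lower bound for a CNN).
optical_energy_one_pass_j = ENERGY_PER_MAC_J * WEIGHT_COUNT

print(f"photons per MAC: {photons_per_mac:.0f}")
print(f"optical energy for one pass: {optical_energy_one_pass_j * 1e12:.2f} pJ")
```

Under these assumptions the numbers come to roughly 156 photons per MAC and about 5.85 pJ of optical energy per pass.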

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the single-shot property holds at scale, inference latency could drop to the physical propagation time of light through the device.
The approach may extend to other tensor contractions beyond matrix multiplication by reusing the same multiplexing dimensions.
Hybrid systems could combine this optical front end with electronic backpropagation for end-to-end training at reduced energy cost.

Load-bearing premise

The hyper-multiplexed design avoids the scaling roadblocks that limited earlier high-parallelism optical neural networks when built at large size.

What would settle it

A working demonstration of the same architecture at substantially higher parameter count or dimensionality while preserving the reported energy per MAC and accuracy would support the scalability claim; inability to maintain performance at larger scales would refute it.

Figures

Figures reproduced from arXiv: 2503.24356 by Chao Luan, Dirk Englund, Ronald Davis III, Ryan Hamerly, Zaijun Chen.

**Figure 1.** Overview and working principle of the single-shot MMM optical tensor processor. (a) Overview of the proposed high …

**Figure 2.** Architecture of the ONN processor and characterization results of the parallel dispersive grating beam routing. (a) …

**Figure 3.** Experimental setup and verification of the single-shot …

**Figure 4.** Parallel convolution operation using the architecture. (a)-(d) A 3D convolution of a colored image with four different …

**Figure 5.** Using parallel MMM optical tensor processor to perform benchmark MNIST image classification. (a) The architecture of …

**Figure 6.** Comparison of SoA computing systems [13, 21, 23, …

**Figure 7.** Bandwidth of the modulator. The figure illustrates the frequency response of the modulator, demonstrating E/O …

**Figure 8.** Power transfer function of the modulator. The green line illustrates the D.C. response of the modulator output power …

**Figure 9.** Power transfer function of different modulators. All these measured E/O modulator transfer functions are different and …

**Figure 10.** Perform pulse amplitude modulation of the AWG pattern, the top figure shows the pattern after the pulse amplitude …

**Figure 11.** Pulse train of the calculated and measured signal at …

**Figure 12.** Experiment-theory error standard-deviation at different data sampling rates.

**Figure 13.** E/O modulator quadrature-point drift measurements. The output power is very stable when the D.C. bias voltage …

**Figure 14.** Parallelism measurement of the grating beam routing architecture, the white dot represents the light incident spatial …

**Figure 15.** Crosstalk measurement of the grating beam routing system, the crosstalk between different channels is -50 dB.

**Figure 16.** Architecture setup when the grating beam routing system works as Mux.

**Figure 17.** Experimental results of the grating beam routing Mux.

**Figure 18.** Architecture setup when the grating beam routing system works as DeMux.

**Figure 19.** Experimental results of the grating beam routing DeMux.

**Figure 20.** In this figure, the shorter wavelength is focused in the center of the receiver fiber array, where another wavelength …

**Figure 21.** Focal lens shift of different wavelength lights.

**Figure 22.** Spot size change caused by the chromatic aberration.

**Figure 23.** Schematic setup of the grating beam routing system using commercial WDM.

**Figure 24.** Concept of using chip scale modulators and fiber arrays.

**Figure 25.** Effect of grating coupler angle and beam travel on the position of the return beam.

**Figure 26.** Convolution setup of mapping two 2 × 2 kernels into the optical hardware, eight kernel modulators and eight data modulators are used to encode the kernel and the data. The image data matrix needs to be processed before it could be mapped to the data modulators, …

**Figure 27.** Processing the data matrix for 2 × 2 kernel convolution, the data matrix is processed and mapped to four data modulators. Using the analog time integrator, we can map larger kernels into the optical hardware …

**Figure 28.** Convolution setup of mapping two 4 × 4 kernels into the optical hardware, 4 modulators encode the 4 × 4 kernel in 4 time steps …

**Figure 29.** Processing the data matrix for 4 × 4 kernel convolution, the data matrix is processed and mapped to four data modulators in 4 time steps …

**Figure 30.** The convolution results for the 4 × 4 kernel with image matrix using analog time integrator, the top 4 figures are the convolution images using the optical hardware, where the bottom figures are digital convolution results …

**Figure 31.** The convolution setup for the 2 × 2 kernel with image matrix using waveshaper.

**Figure 32.** The convolution results for the 2 × 2 kernel with image matrix using waveshaper.

**Figure 33.** The convolution results for the 2 × 2 kernel with image matrix using waveshaper.

Paper text captured with the Figure 33 caption (Section VI.A, "Digital training of the network"): The CNN and FC NN were trained on a standard digital computer in Python with the PyTorch library on 50,000 training images for the MNIST and Fashion-MNIST datasets. 10,000 images were reserved for validation sets to finetune the netwo…

**Figure 34.** Figure of the built grating optical neural network setup.
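The captions for Figures 26 to 29 say the image data matrix is processed before it is mapped to the data modulators. A common way to turn a convolution into a matrix multiplication is the im2col rearrangement, sketched below for 2 × 2 kernels. This is a generic illustration, not the authors' documented preprocessing; the function names `im2col` and `conv_as_matmul` are hypothetical, and the operation shown is the unflipped sliding-window product (cross-correlation) that CNN layers normally use.

```python
def im2col(image, k):
    """Flatten every k x k patch of the image into one row (stride 1, no padding)."""
    height, width = len(image), len(image[0])
    patches = []
    for r in range(height - k + 1):
        for c in range(width - k + 1):
            patches.append([image[r + dr][c + dc]
                            for dr in range(k) for dc in range(k)])
    return patches

def conv_as_matmul(image, kernels):
    """Apply several k x k kernels at once as a (kernels x patches) matrix product."""
    k = len(kernels[0])
    patches = im2col(image, k)
    flat_kernels = [[v for row in kernel for v in row] for kernel in kernels]
    return [[sum(p * w for p, w in zip(patch, flat))
             for patch in patches]
            for flat in flat_kernels]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernels = [[[1, 0], [0, 1]],   # sums the main diagonal of each patch
           [[1, 1], [1, 1]]]   # sums all four pixels of each patch

# One output row per kernel, one column per patch position.
print(conv_as_matmul(image, kernels))
```

Once the patches are laid out as rows, several kernels can be applied in a single matrix-matrix product, which is the kind of operation the processor is described as parallelizing.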

Original abstract

The ever-increasing data demand craves advancements in high-speed and energy-efficient computing hardware. Analog optical neural network (ONN) processors have emerged as a promising solution, offering benefits in bandwidth and energy consumption. However, existing ONN processors exhibit limited computational parallelism, and while certain architectures achieve high parallelism, they encounter serious scaling roadblocks for large-scale implementation. This restricts the throughput, latency, and energy efficiency advantages of ONN processors. Here, we introduce a spatial-wavelength-temporal hyper-multiplexed ONN processor that supports high data dimensionality, high computing parallelism and is feasible for large-scale implementation, and in a single time step, a three-dimensional matrix-matrix multiplication (MMM) optical tensor processor is demonstrated. Our hardware accelerates convolutional neural networks (CNNs) and deep neural networks (DNNs) through parallel matrix multiplication. We demonstrate benchmark image recognition using a CNN and a subsequently fully connected DNN in the optical domain. The network works with 292,616 weight parameters under ultra-low optical energy of 20 attojoules (aJ) per multiply and accumulate (MAC) at 96.4% classification accuracy. The system supports broad spectral and spatial bandwidths and is capable for large-scale demonstration, paving the way for highly efficient large-scale optical computing for next-generation deep learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

They show a hyper-multiplexed optical setup doing single-shot 3D matrix-matrix multiplies on a CNN+DNN at 20 aJ/MAC and 96.4% accuracy, but the scaling story rests on unshown crosstalk and loss data.

The main point is that this paper demonstrates an optical tensor processor using spatial-wavelength-temporal multiplexing to compute a full 3D matrix-matrix product in one time step, then runs it on image classification with a CNN followed by a DNN using 292k weights. They report 96.4% accuracy at 20 attojoules per MAC. That combination of single-shot 3D operation and the energy number is what stands out from the abstract and the stress-test note.

The architecture is presented as a way around the scaling limits that hit earlier high-parallelism ONN designs, and they frame the result as an experimental hardware demonstration rather than a simulation or derivation. The fact that they close the loop on a real benchmark task gives it more weight than pure linear-algebra optics papers. The numbers themselves look aggressive but are stated plainly, which makes them easy to check against future work.

The soft spot is exactly the one the stress-test flags: the claim that the hyper-multiplexed channels stay clean enough for the reported accuracy depends on low crosstalk, spectral overlap, and insertion loss across the modes used. The abstract gives no count of wavelength or spatial channels and no fidelity or error-rate curves versus channel number, so it is not possible to judge whether the parallelism actually scales without precision loss. If the full paper includes those measurements and shows they remain acceptable at the demonstrated size, the central result holds; if not, the energy-accuracy claim is harder to trust for larger systems.

This work is aimed at people building or evaluating optical accelerators for deep learning. A reader already following ONN hardware will get concrete architecture details and benchmark numbers to compare against other approaches.

It is worth sending to peer review because it ships a new multiplexing scheme plus end-to-end task results with specific performance figures; the referee process can sort out the missing scaling data and any experimental methods details.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces a spatial-wavelength-temporal hyper-multiplexed optical neural network processor that performs three-dimensional matrix-matrix multiplication in a single time step. It reports acceleration of CNNs and DNNs for image recognition using 292616 weight parameters at 20 aJ/MAC optical energy and 96.4% classification accuracy, claiming feasibility for large-scale implementation unlike prior high-parallelism ONN designs.

Significance. If the experimental demonstration and scaling claims hold, the work would represent a notable advance in analog optical computing hardware by enabling high-dimensionality, single-shot tensor operations with ultra-low energy per MAC while addressing parallelism scaling barriers.

major comments (2)

[Abstract] The central claim of single-shot 3D MMM with 292616 weights at 96.4% accuracy requires that inter-channel crosstalk, spectral overlap, and insertion loss remain low enough to preserve effective MAC precision. No number of wavelength or spatial modes is stated, nor is any measured fidelity or error-rate data versus channel count supplied, leaving the scaling assumption for the hyper-multiplexed architecture unsupported.

[Abstract] The 20 aJ/MAC figure and 96.4% accuracy are presented as experimental results, yet the text supplies neither experimental methods, error bars, raw data, nor verification details for these quantities, preventing evaluation of the hardware claim.
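The Figure 15 caption listed on this page quotes -50 dB crosstalk between channels. A rough sketch of why channel count matters to the first comment: if every other channel leaked into a given channel at that level and the leaked powers added incoherently, the aggregate crosstalk would grow with the number of channels. Both conditions are pessimistic assumptions made here for illustration, and the function name `aggregate_crosstalk_db` is hypothetical; the paper's measured behavior may differ.

```python
import math

def aggregate_crosstalk_db(pairwise_db, channels):
    """Worst-case total crosstalk into one channel from all the others.

    ASSUMPTION: each of the other channels leaks at the same pairwise level
    and the leaked powers add incoherently (linearly in power).
    """
    if channels < 2:
        raise ValueError("need at least two channels for crosstalk")
    pairwise_linear = 10 ** (pairwise_db / 10)
    total_linear = (channels - 1) * pairwise_linear
    return 10 * math.log10(total_linear)

for n in (2, 11, 101, 1001):
    print(n, "channels:", round(aggregate_crosstalk_db(-50.0, n), 1), "dB")
```

Under these assumptions, 101 channels would see about -30 dB of aggregate crosstalk, which is why fidelity data versus channel count would help assess the scaling claim.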

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their comments and the opportunity to clarify aspects of our work. We respond point-by-point to the major comments below.

Referee: [Abstract] The central claim of single-shot 3D MMM with 292616 weights at 96.4% accuracy requires that inter-channel crosstalk, spectral overlap, and insertion loss remain low enough to preserve effective MAC precision. No number of wavelength or spatial modes is stated, nor is any measured fidelity or error-rate data versus channel count supplied, leaving the scaling assumption for the hyper-multiplexed architecture unsupported.

Authors: The abstract summarizes the demonstrated single-shot 3D MMM but does not enumerate the specific wavelength and spatial mode counts or include channel-count-dependent fidelity metrics. The full manuscript provides these details in the architecture description and experimental characterization sections, where measured crosstalk, spectral overlap, and insertion loss values are reported for the implemented channel count and shown to support the observed MAC precision. We will revise the abstract to state the number of modes employed and note that fidelity data versus channel count appear in the main text. revision: yes

Referee: [Abstract] The 20 aJ/MAC figure and 96.4% accuracy are presented as experimental results, yet the text supplies neither experimental methods, error bars, raw data, nor verification details for these quantities, preventing evaluation of the hardware claim.

Authors: The 20 aJ/MAC and 96.4% accuracy values are obtained from the experimental demonstration described in the manuscript. The methods section, supplementary information, and figure captions supply the measurement procedures, verification approach, and any associated uncertainty quantification. The abstract condenses these results; we will revise it to explicitly reference the supporting experimental details in the main text. revision: yes

Circularity Check

0 steps flagged

No circularity: experimental hardware demonstration with no derivation chain

The paper reports an experimental demonstration of a spatial-wavelength-temporal hyper-multiplexed optical neural network processor performing single-shot 3D matrix-matrix multiplication. Central claims rest on measured classification accuracy (96.4%), energy per MAC (20 aJ), and parameter count (292616) from hardware benchmarks on CNN+DNN tasks. No mathematical derivation, fitted parameters renamed as predictions, or self-citation load-bearing uniqueness theorems are present in the provided text. The architecture description and performance metrics are independent experimental results, not reductions to their own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no free parameters, axioms, or invented entities; all ledger fields left empty.

pith-pipeline@v0.9.0 · 5771 in / 1098 out tokens · 57218 ms · 2026-05-22T22:13:17.257981+00:00 · methodology

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages

[1] Y. LeCun, Y. Bengio, and G. Hinton, Nature 521, 436 (2015)
[2] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, Advances in neural information processing systems 30 (2017)
[3] A. Krizhevsky, I. Sutskever, and G. E. Hinton, in Advances in Neural Information Processing Systems, Vol. 25, edited by F. Pereira, C. Burges, L. Bottou, and K. Weinberger (Curran Associates, Inc., 2012)
[4] K. He, X. Zhang, S. Ren, and J. Sun, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
[5] I. H. Sarker, Deep learning: A comprehensive overview on techniques, taxonomy, applications and research directions (2021)
[6] W. He, J. Li, X. Kong, and L. Deng, Communications Engineering 3, 10.1038/s44172-024-00303-3 (2024)
[7] G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, and L. Zdeborová, Rev. Mod. Phys. 91, 045002 (2019)
[8] T. Ching, D. S. Himmelstein, B. K. Beaulieu-Jones, A. A. Kalinin, B. T. Do, G. P. Way, E. Ferrero, P. M. Agapow, M. Zietz, M. M. Hoffman, W. Xie, G. L. Rosen, B. J. Lengerich, J. Israeli, J. Lanchantin, S. Woloszynek, A. E. Carpenter, A. Shrikumar, J. Xu, E. M. Cofer, C. A. Lavender, S. C. Turaga, A. M. Alexandari, Z. Lu, D. J. Harris, D. Decaprio, Y. Qi,... doi:10.1098/rsif.2017.0387 (2017)
[9] J. Goecks, V. Jalili, L. M. Heiser, and J. W. Gray, Cell 181, 92 (2020)
[10] Integrated silicon photonics: Harnessing the data explosion (2011)
[11] T. Heuser, M. Pflüger, I. Fischer, J. A. Lott, D. Brunner, and S. Reitzenstein, Journal of Physics: Photonics 2, 044002 (2020)
[12] A. Yazdanbakhsh, K. Seshadri, B. Akin, J. Laudon, and R. Narayanaswami, arXiv preprint arXiv:2102.10423 (2021)
[13] S. Kumar, V. Bitorff, D. Chen, C. Chou, B. Hechtman, H. Lee, N. Kumar, P. Mattson, S. Wang, T. Wang, et al., arXiv preprint arXiv:1909.09756 (2019)
[14] A. Shawahna, S. M. Sait, and A. El-Maleh, IEEE Access 7, 7823 (2019)
[15] C. Wang, L. Gong, Q. Yu, X. Li, Y. Xie, and X. Zhou, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 36, 513 (2017)
[16] S. W. Keckler, W. J. Dally, B. Khailany, M. Garland, and D. Glasco, IEEE Micro 31, 7 (2011)
[17] M. Pandey, M. Fernandez, F. Gentile, O. Isayev, A. Tropsha, A. C. Stern, and A. Cherkasov, The transformational role of gpu computing and deep learning in drug discovery (2022)
[18] A. Ankit, I. E. Hajj, S. R. Chalamalasetti, G. Ndu, M. Foltin, R. S. Williams, P. Faraboschi, W.-m. W. Hwu, J. P. Strachan, K. Roy, and D. S. Milojicic, in Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’19 (Association for Computing Machinery, New York, NY, USA,
[19] P. Yao, H. Wu, B. Gao, J. Tang, Q. Zhang, W. Zhang, J. J. Yang, and H. Qian, Nature 577, 641 (2020)
[20] D. A. Miller, Journal of Lightwave Technology 35, 346 (2017)
[21] Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, et al., Nature Photonics 11, 441 (2017)
[22] R. Hamerly, L. Bernstein, A. Sludds, M. Soljačić, and D. Englund, Physical Review X 9, 021032 (2019)
[23] X. Xu, M. Tan, B. Corcoran, J. Wu, A. Boes, T. G. Nguyen, S. T. Chu, B. E. Little, D. G. Hicks, R. Morandotti, et al., Nature 589, 44 (2021)
[24] T. Wang, S.-Y. Ma, L. G. Wright, T. Onodera, B. C. Richard, and P. L. McMahon, Nature Communications 13, 1 (2022)
[25] R. Hamerly, A. Sludds, S. Bandyopadhyay, L. Bernstein, Z. Chen, M. Ghobadi, and D. Englund, in Emerging Topics in Artificial Intelligence (ETAI) 2021, Vol. 11804 (International Society for Optics and Photonics, 2021) p. 118041R
[26] H. Zhu, J. Zou, H. Zhang, Y. Shi, S. Luo, N. Wang, H. Cai, L. Wan, B. Wang, X. Jiang, et al., Nature Communications 13, 1 (2022)
[27] J. Feldmann, N. Youngblood, M. Karpov, H. Gehring, X. Li, M. Stappers, M. Le Gallo, X. Fu, A. Lukashchuk, A. S. Raja, et al., Nature 589, 52 (2021)
[28] J. Feldmann, N. Youngblood, C. D. Wright, H. Bhaskaran, and W. H. Pernice, Nature 569, 208 (2019)
[29] N. H. Farhat, D. Psaltis, A. Prata, and E. Paek, Applied optics 24, 1469 (1985)
[30] F. Ashtiani, A. J. Geers, and F. Aflatouni, Nature 606, 501 (2022)
[31] Y. Zuo, B. Li, Y. Zhao, Y. Jiang, Y.-C. Chen, P. Chen, G.-B. Jo, J. Liu, and S. Du, Optica 6, 1132 (2019)
[32] A. N. Tait, T. F. De Lima, E. Zhou, A. X. Wu, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, Scientific reports 7, 1 (2017)
[33] B. Shi, N. Calabretta, and R. Stabile, IEEE Journal of Selected Topics in Quantum Electronics 26, 1 (2019)
[34] C. Pappas, T. Moschos, A. Prapas, M. Kirtas, M. Moralis-Pegios, A. Tsakyridis, O. Asimopoulos, N. Passalis, A. Tefas, and N. Pleros, Journal of Lightwave Technology (2025)
[35] A. N. Tait, T. F. De Lima, M. A. Nahmias, H. B. Miller, H.-T. Peng, B. J. Shastri, and P. R. Prucnal, Physical Review Applied 11, 064043 (2019)
[36] C. Huang, A. Jha, T. F. De Lima, A. N. Tait, B. J. Shastri, and P. R. Prucnal, IEEE Journal of Selected Topics in Quantum Electronics 27, 1 (2020)
[37] B. Dong, S. Aggarwal, W. Zhou, U. E. Ali, N. Farmakidis, J. S. Lee, Y. He, X. Li, D.-L. Kwong, C. Wright, et al., Nature Photonics 17, 1080 (2023)
[38] A. Sludds, S. Bandyopadhyay, Z. Chen, Z. Zhong, J. Cochrane, L. Bernstein, D. Bunandar, P. B. Dixon, S. A. Hamilton, M. Streshinsky, A. Novack, T. Baehr-Jones, M. Hochberg, M. Ghobadi, R. Hamerly, and D. Englund, Science 378, 270 (2022)
[39] R. Hamerly, A. Sludds, S. Bandyopadhyay, Z. Chen, Z. Zhong, L. Bernstein, and D. Englund, Journal of Lightwave Technology 42, 7795 (2024)
[40] Z. Chen, A. Sludds, R. Davis III, I. Christen, L. Bernstein, L. Ateshian, T. Heuser, N. Heermeier, J. A. Lott, S. Reitzenstein, et al., Nature Photonics 17, 723 (2023)
[41] N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers, et al., in Proceedings of the 44th annual international symposium on computer architecture (2017) pp. 1–12
[42] S. Xu, J. Wang, S. Yi, and W. Zou, Nature communications 13, 7970 (2022)
[43] A. Novikov, D. Podoprikhin, A. Osokin, and D. P. Vetrov, Advances in neural information processing systems 28 (2015)
[44] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., Advances in neural information processing systems 32 (2019)
[45] V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, Proceedings of the IEEE 105, 2295 (2017)
[46] V. Sze, Y.-H. Chen, J. Emer, A. Suleiman, and Z. Zhang, in 2017 IEEE Custom Integrated Circuits Conference (CICC) (2017) pp. 1–8
[47] S. Moazeni, S. Lin, M. Wade, L. Alloatti, R. J. Ram, M. Popović, and V. Stojanović, IEEE Journal of Solid-State Circuits 52, 3503 (2017)
[48] P.-I. Dietrich, M. Blaicher, I. Reuter, M. Billah, T. Hoose, A. Hofmann, C. Caer, R. Dangel, B. Offrein, U. Troppenz, et al., Nature Photonics 12, 241 (2018)
[49] T. Aalto, M. Cherchi, M. Harjanne, S. Bhat, P. Heimala, F. Sun, M. Kapulainen, T. Hassinen, and T. Vehmas, IEEE Journal of selected topics in quantum electronics 25, 1 (2019)

[1] [1]

LeCun, Y

Y. LeCun, Y. Bengio, and G. Hinton, Nature521, 436 (2015)

work page 2015

[2] [2]

Vaswani, N

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, Advances in neural information processing systems30 (2017)

work page 2017

[3] [3]

Krizhevsky, I

A. Krizhevsky, I. Sutskever, and G. E. Hinton, inAdvances in Neural Information Processing Systems, Vol. 25, edited by F. Pereira, C. Burges, L. Bottou, and K. Weinberger (Curran Associates, Inc., 2012)

work page 2012

[4] [4]

K. He, X. Zhang, S. Ren, and J. Sun, inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)

work page 2016

[5] [5]

I. H. Sarker, Deep learning: A comprehensive overview on techniques, taxonomy, applications and research directions (2021)

work page 2021

[6] [6]

W. He, J. Li, X. Kong, and L. Deng, Communications Engineering3, 10.1038/s44172-024-00303-3 (2024)

work page doi:10.1038/s44172-024-00303-3 2024

[7] [7]

Carleo, I

G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, and L. Zdeborová, Rev. Mod. Phys.91, 045002 (2019)

work page 2019

[8] [8]

Ching, D

T. Ching, D. S. Himmelstein, B. K. Beaulieu-Jones, A. A. Kalinin, B. T. Do, G. P. Way, E. Ferrero, P. M. Agapow, M. Zietz, M. M. Hoffman, W. Xie, G. L. Rosen, B. J. Lengerich, J. Israeli, J. Lanchantin, S. Woloszynek, A. E. Carpenter, A. Shrikumar, J. Xu, E. M. Cofer, C. A. Lavender, S. C. Turaga, A. M. Alexandari, Z. Lu, D. J. Harris, D. Decaprio, Y. Qi,...

work page doi:10.1098/rsif.2017.0387 2017

[9] [9]

Goecks, V

J. Goecks, V. Jalili, L. M. Heiser, and J. W. Gray, Cell181, 92 (2020)

work page 2020

[10] [10]

Integrated silicon photonics: Harnessing the data explosion (2011)

work page 2011

[11] [11]

Heuser, M

T. Heuser, M. Pflüger, I. Fischer, J. A. Lott, D. Brunner, and S. Reitzenstein, Journal of Physics: Photonics2, 044002 (2020)

work page 2020

[12] [12]

Yazdanbakhsh, K

A. Yazdanbakhsh, K. Seshadri, B. Akin, J. Laudon, and R. Narayanaswami, arXiv preprint arXiv:2102.10423 (2021). 34

work page arXiv 2021

[13] [13]

Kumar, V

S. Kumar, V. Bitorff, D. Chen, C. Chou, B. Hechtman, H. Lee, N. Kumar, P. Mattson, S. Wang, T. Wang,et al., arXiv preprint arXiv:1909.09756 (2019)

work page arXiv 1909

[14] [14]

Shawahna, S

A. Shawahna, S. M. Sait, and A. El-Maleh, IEEE Access7, 7823 (2019)

work page 2019

[15] [15]

C. Wang, L. Gong, Q. Yu, X. Li, Y. Xie, and X. Zhou, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems36, 513 (2017)

work page 2017

[16] [16]

S. W. Keckler, W. J. Dally, B. Khailany, M. Garland, and D. Glasco, IEEE Micro31, 7 (2011)

work page 2011

[17] [17]

Pandey, M

M. Pandey, M. Fernandez, F. Gentile, O. Isayev, A. Tropsha, A. C. Stern, and A. Cherkasov, The transformational role of gpu computing and deep learning in drug discovery (2022)

work page 2022

[18] [18]

Ankit, I

A. Ankit, I. E. Hajj, S. R. Chalamalasetti, G. Ndu, M. Foltin, R. S. Williams, P. Faraboschi, W.-m. W. Hwu, J. P. Strachan, K. Roy, and D. S. Milojicic, in Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’19 (Association for Computing Machinery, New York, NY, USA,

work page

[19] [19]

P. Yao, H. Wu, B. Gao, J. Tang, Q. Zhang, W. Zhang, J. J. Yang, and H. Qian, Nature577, 641 (2020)

work page 2020

[20] [20]

D. A. Miller, Journal of Lightwave Technology35, 346 (2017)

work page 2017

[21] [21]

Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund,et al., Nature Photonics11, 441 (2017)

work page 2017

[22] [22]

Hamerly, L

R. Hamerly, L. Bernstein, A. Sludds, M. Soljačić, and D. Englund, Physical Review X9, 021032 (2019)

work page 2019

[23] [23]

X. Xu, M. Tan, B. Corcoran, J. Wu, A. Boes, T. G. Nguyen, S. T. Chu, B. E. Little, D. G. Hicks, R. Morandotti,et al., Nature 589, 44 (2021)

work page 2021

[24] [24]

Wang, S.-Y

T. Wang, S.-Y. Ma, L. G. Wright, T. Onodera, B. C. Richard, and P. L. McMahon, Nature Communications13, 1 (2022)

work page 2022

[25] R. Hamerly, A. Sludds, S. Bandyopadhyay, L. Bernstein, Z. Chen, M. Ghobadi, and D. Englund, in Emerging Topics in Artificial Intelligence (ETAI) 2021, Vol. 11804 (International Society for Optics and Photonics, 2021) p. 118041R

[26] H. Zhu, J. Zou, H. Zhang, Y. Shi, S. Luo, N. Wang, H. Cai, L. Wan, B. Wang, X. Jiang, et al., Nature Communications 13, 1 (2022)

[27] J. Feldmann, N. Youngblood, M. Karpov, H. Gehring, X. Li, M. Stappers, M. Le Gallo, X. Fu, A. Lukashchuk, A. S. Raja, et al., Nature 589, 52 (2021)

[28] J. Feldmann, N. Youngblood, C. D. Wright, H. Bhaskaran, and W. H. Pernice, Nature 569, 208 (2019)

[29] N. H. Farhat, D. Psaltis, A. Prata, and E. Paek, Applied Optics 24, 1469 (1985)

[30] F. Ashtiani, A. J. Geers, and F. Aflatouni, Nature 606, 501 (2022)

[31] Y. Zuo, B. Li, Y. Zhao, Y. Jiang, Y.-C. Chen, P. Chen, G.-B. Jo, J. Liu, and S. Du, Optica 6, 1132 (2019)

[32] A. N. Tait, T. F. De Lima, E. Zhou, A. X. Wu, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, Scientific Reports 7, 1 (2017)

[33] B. Shi, N. Calabretta, and R. Stabile, IEEE Journal of Selected Topics in Quantum Electronics 26, 1 (2019)

[34] C. Pappas, T. Moschos, A. Prapas, M. Kirtas, M. Moralis-Pegios, A. Tsakyridis, O. Asimopoulos, N. Passalis, A. Tefas, and N. Pleros, Journal of Lightwave Technology (2025)

[35] A. N. Tait, T. F. De Lima, M. A. Nahmias, H. B. Miller, H.-T. Peng, B. J. Shastri, and P. R. Prucnal, Physical Review Applied 11, 064043 (2019)

[36] C. Huang, A. Jha, T. F. De Lima, A. N. Tait, B. J. Shastri, and P. R. Prucnal, IEEE Journal of Selected Topics in Quantum Electronics 27, 1 (2020)

[37] B. Dong, S. Aggarwal, W. Zhou, U. E. Ali, N. Farmakidis, J. S. Lee, Y. He, X. Li, D.-L. Kwong, C. Wright, et al., Nature Photonics 17, 1080 (2023)

[38] A. Sludds, S. Bandyopadhyay, Z. Chen, Z. Zhong, J. Cochrane, L. Bernstein, D. Bunandar, P. B. Dixon, S. A. Hamilton, M. Streshinsky, A. Novack, T. Baehr-Jones, M. Hochberg, M. Ghobadi, R. Hamerly, and D. Englund, Science 378, 270 (2022)

[39] R. Hamerly, A. Sludds, S. Bandyopadhyay, Z. Chen, Z. Zhong, L. Bernstein, and D. Englund, Journal of Lightwave Technology 42, 7795 (2024)

[40] Z. Chen, A. Sludds, R. Davis III, I. Christen, L. Bernstein, L. Ateshian, T. Heuser, N. Heermeier, J. A. Lott, S. Reitzenstein, et al., Nature Photonics 17, 723 (2023)

[41] N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers, et al., in Proceedings of the 44th Annual International Symposium on Computer Architecture (2017) pp. 1–12

[42] S. Xu, J. Wang, S. Yi, and W. Zou, Nature Communications 13, 7970 (2022)

[43] A. Novikov, D. Podoprikhin, A. Osokin, and D. P. Vetrov, Advances in Neural Information Processing Systems 28 (2015)

[44] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., Advances in Neural Information Processing Systems 32 (2019)

[45] V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, Proceedings of the IEEE 105, 2295 (2017)

[46] V. Sze, Y.-H. Chen, J. Emer, A. Suleiman, and Z. Zhang, in 2017 IEEE Custom Integrated Circuits Conference (CICC) (2017) pp. 1–8

[47] S. Moazeni, S. Lin, M. Wade, L. Alloatti, R. J. Ram, M. Popović, and V. Stojanović, IEEE Journal of Solid-State Circuits 52, 3503 (2017)

[48] P.-I. Dietrich, M. Blaicher, I. Reuter, M. Billah, T. Hoose, A. Hofmann, C. Caer, R. Dangel, B. Offrein, U. Troppenz, et al., Nature Photonics 12, 241 (2018)

[49] T. Aalto, M. Cherchi, M. Harjanne, S. Bhat, P. Heimala, F. Sun, M. Kapulainen, T. Hassinen, and T. Vehmas, IEEE Journal of Selected Topics in Quantum Electronics 25, 1 (2019)