
arxiv: 2503.24356 · v1 · submitted 2025-03-31 · ⚛️ physics.optics

Single-Shot Matrix-Matrix Multiplication Optical Tensor Processor for Deep Learning

Pith reviewed 2026-05-22 22:13 UTC · model grok-4.3

classification ⚛️ physics.optics
keywords optical neural networks · matrix-matrix multiplication · tensor processor · deep learning · energy-efficient computing · hyper-multiplexing · convolutional neural networks

The pith

A spatial-wavelength-temporal hyper-multiplexed architecture performs three-dimensional matrix-matrix multiplication in one optical time step.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents an optical neural network processor that uses spatial, wavelength, and temporal multiplexing to achieve high data dimensionality and parallelism. It demonstrates this by executing a full three-dimensional matrix-matrix multiplication in a single time step. The hardware accelerates convolutional and deep neural networks, running a benchmark image-recognition task with 292,616 weight parameters at 20 attojoules of optical energy per multiply-accumulate while reaching 96.4 percent classification accuracy. The design is positioned as scalable to large implementations, where earlier high-parallelism optical approaches hit roadblocks.

Core claim

A spatial-wavelength-temporal hyper-multiplexed optical tensor processor performs a three-dimensional matrix-matrix multiplication in a single time step. The architecture supports high computing parallelism and is presented as feasible for large-scale implementation. It accelerates CNNs and DNNs at an ultra-low optical energy of 20 attojoules per MAC, reaching 96.4 percent classification accuracy with 292,616 weight parameters.

What carries the argument

The spatial-wavelength-temporal hyper-multiplexed ONN processor architecture that encodes and processes high-dimensional data across space, spectrum, and time to perform parallel matrix multiplications.
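
To make the parallelism claim concrete, the sketch below counts the multiply-accumulate (MAC) operations inside one matrix-matrix product. A product of an M × K matrix with a K × N matrix contains M·N·K independent multiplications, which is the work a single-shot processor would have to carry out at once. The function name is hypothetical, and the loop labels are placeholders: this summary does not say which index rides on the spatial, wavelength, or temporal dimension.

```python
def matmul_with_mac_count(x, w):
    """Multiply x (M x K) by w (K x N); return (product, number of MACs)."""
    m_rows, k_inner, n_cols = len(x), len(w), len(w[0])
    if any(len(row) != k_inner for row in x):
        raise ValueError("inner dimensions do not match")

    y = [[0.0] * n_cols for _ in range(m_rows)]
    macs = 0
    # Each loop stands for one multiplexing axis. Which physical dimension
    # (space, wavelength, time) carries which index is NOT given in the
    # summary; the three loops are only placeholders for three axes.
    for i in range(m_rows):
        for j in range(n_cols):
            for k in range(k_inner):
                y[i][j] += x[i][k] * w[k][j]
                macs += 1
    return y, macs


x = [[1.0, 2.0], [3.0, 4.0]]
w = [[5.0, 6.0], [7.0, 8.0]]
y, macs = matmul_with_mac_count(x, w)
print(y)     # [[19.0, 22.0], [43.0, 50.0]]
print(macs)  # 8, i.e. M * N * K = 2 * 2 * 2
```

An electronic loop visits these MACs one after another; the claim under review is that the optical hardware performs them together in one time step.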

If this is right

  • CNNs and DNNs can be accelerated directly through parallel optical matrix multiplication.
  • Image recognition runs at 96.4 percent classification accuracy using 292,616 weight parameters.
  • Optical energy consumption reaches 20 attojoules per multiply-accumulate operation.
  • Broad spectral and spatial bandwidths become available for larger demonstrations.
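
For a sense of scale, the reported figures can be turned into two back-of-envelope numbers: photons per MAC and a floor on optical energy per inference. The sketch rests on two assumptions that are not in the summary. The operating wavelength is taken as 1550 nm, and each weight is counted as exactly one MAC per inference. Convolutional layers reuse weights, so the true MAC count is higher and the second number is a floor rather than an estimate.

```python
PLANCK_J_S = 6.62607015e-34       # Planck constant, exact SI value
LIGHT_SPEED_M_S = 299_792_458.0   # speed of light, exact SI value

ENERGY_PER_MAC_J = 20e-18         # 20 aJ per MAC, as reported
NUM_WEIGHTS = 292_616             # weight parameters, as reported
WAVELENGTH_M = 1550e-9            # ASSUMPTION: not stated in the summary


def photon_energy_j(wavelength_m):
    """Energy of one photon at the given vacuum wavelength."""
    return PLANCK_J_S * LIGHT_SPEED_M_S / wavelength_m


photons_per_mac = ENERGY_PER_MAC_J / photon_energy_j(WAVELENGTH_M)

# ASSUMPTION: one MAC per weight per inference. Weight reuse in the
# convolutional layers makes the real count larger, so this is a floor.
min_optical_energy_j = NUM_WEIGHTS * ENERGY_PER_MAC_J

print(f"photons per MAC (at 1550 nm): {photons_per_mac:.0f}")
print(f"optical energy floor per inference: {min_optical_energy_j * 1e12:.2f} pJ")
```

Under these assumptions the figures work out to roughly 156 photons per MAC and about 5.85 pJ of optical energy per inference. Both numbers cover optical energy only and say nothing about the electrical cost of modulators, detectors, or data conversion.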

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the single-shot property holds at scale, inference latency could drop to the physical propagation time of light through the device.
  • The approach may extend to other tensor contractions beyond matrix multiplication by reusing the same multiplexing dimensions.
  • Hybrid systems could combine this optical front end with electronic backpropagation for end-to-end training at reduced energy cost.

Load-bearing premise

The hyper-multiplexed design avoids the scaling roadblocks that limited earlier high-parallelism optical neural networks when built at large size.

What would settle it

A working demonstration of the same architecture at substantially higher parameter count or dimensionality while preserving the reported energy per MAC and accuracy would support the scalability claim; inability to maintain performance at larger scales would refute it.

Figures

Figures reproduced from arXiv: 2503.24356 by Chao Luan, Dirk Englund, Ronald Davis III, Ryan Hamerly, Zaijun Chen.

Figure 1: Overview and working principle of the single-shot MMM optical tensor processor. (a) Overview of the proposed high …
Figure 2: Architecture of the ONN processor and characterization results of the parallel dispersive grating beam routing. (a) …
Figure 3: Experimental setup and verification of the single-shot …
Figure 4: Parallel convolution operation using the architecture. (a)-(d) A 3D convolution of a colored image with four different …
Figure 5: Using parallel MMM optical tensor processor to perform benchmark MNIST image classification. (a) The architecture of …
Figure 6: Comparison of SoA computing systems [13, 21, 23, …
Figure 7: Bandwidth of the modulator. The figure illustrates the frequency response of the modulator, demonstrating E/O …
Figure 8: Power transfer function of the modulator. The green line illustrates the D.C. response of the modulator output power …
Figure 9: Power transfer function of different modulators. All these measured E/O modulator transfer functions are different and …
Figure 10: Performing pulse amplitude modulation of the AWG pattern; the top figure shows the pattern after the pulse amplitude …
Figure 11: Pulse train of the calculated and measured signal at …
Figure 12: Experiment-theory error standard-deviation at different data sampling rates.
Figure 13: E/O modulator quadrature-point drift measurements. The output power is very stable when the D.C. bias voltage …
Figure 14: Parallelism measurement of the grating beam routing architecture; the white dot represents the light incident spatial …
Figure 15: Crosstalk measurement of the grating beam routing system; the crosstalk between different channels is -50 dB.
Figure 16: Architecture setup when the grating beam routing system works as Mux.
Figure 17: Experimental results of the grating beam routing Mux.
Figure 18: Architecture setup when the grating beam routing system works as DeMux.
Figure 19: Experimental results of the grating beam routing DeMux.
Figure 20: In this figure, the shorter wavelength is focused in the center of the receiver fiber array, where another wavelength …
Figure 21: Focal lens shift of different wavelength lights.
Figure 22: Spot size change caused by the chromatic aberration.
Figure 23: Schematic setup of the grating beam routing system using commercial WDM.
Figure 24: Concept of using chip scale modulators and fiber arrays.
Figure 25: Effect of grating coupler angle and beam travel on the position of the return beam.
Figure 26: Convolution setup of mapping two 2 ∗ 2 kernels into the optical hardware; eight kernel modulators and eight data modulators are used to encode the kernel and the data. The image data matrix needs to be processed before it can be mapped to the data modulators, …
Figure 27: Processing the data matrix for 2 ∗ 2 kernel convolution; the data matrix is processed and mapped to four data modulators. Using the analog time integrator, we can map larger kernels into the optical hardware.
Figure 28: Convolution setup of mapping two 4 ∗ 4 kernels into the optical hardware; 4 modulators encode the 4 ∗ 4 kernel in 4 time steps.
Figure 29: Processing the data matrix for 4 ∗ 4 kernel convolution; the data matrix is processed and mapped to four data modulators in 4 time steps.
Figure 30: The convolution results for the 4 ∗ 4 kernel with image matrix using analog time integrator; the top 4 figures are the convolution images using the optical hardware, whereas the bottom figures are digital convolution results.
Figure 31: The convolution setup for the 2 ∗ 2 kernel with image matrix using waveshaper.
Figure 32: The convolution results for the 2 ∗ 2 kernel with image matrix using waveshaper.
Figure 33: The convolution results for the 2 ∗ 2 kernel with image matrix using waveshaper. VI. CNNS AND DNNS FOR IMAGE CLASSIFICATION A. Subsection 1: Digital training of the network. The CNN and FC NN were trained on a standard digital computer in Python with the PyTorch library on 50,000 training images for the MNIST and Fashion-MNIST datasets. 10,000 images were reserved for validation sets to fine-tune the netwo…
Figure 34: Figure of the built grating optical neural network setup.
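
The captions of Figures 26 and 27 say the image data matrix is processed before a 2 ∗ 2 kernel convolution is mapped onto four data modulators. One common way to do such preprocessing is an im2col-style rearrangement: each kernel tap gets its own stream of shifted pixels, and every output pixel becomes a dot product across the streams. The sketch below illustrates that idea only. Whether the paper uses this exact layout is an assumption, the function names are hypothetical, and stride 1 with no padding is assumed.

```python
def to_tap_streams(image, kh=2, kw=2):
    """Rearrange an image into kh*kw streams, one per kernel tap."""
    rows, cols = len(image), len(image[0])
    out_r, out_c = rows - kh + 1, cols - kw + 1
    streams = []
    for di in range(kh):
        for dj in range(kw):
            # Stream for tap (di, dj): the pixel under that tap at every
            # output position, in row-major order.
            streams.append([image[i + di][j + dj]
                            for i in range(out_r) for j in range(out_c)])
    return streams, (out_r, out_c)


def convolve_from_streams(streams, kernel, out_shape):
    """Weight each stream by its kernel tap and sum across streams."""
    taps = [value for row in kernel for value in row]
    out_r, out_c = out_shape
    flat = [sum(t * s[p] for t, s in zip(taps, streams))
            for p in range(out_r * out_c)]
    return [flat[r * out_c:(r + 1) * out_c] for r in range(out_r)]


def direct_convolve(image, kernel):
    """Reference sliding-window result (no kernel flip, as in CNN layers)."""
    kh, kw = len(kernel), len(kernel[0])
    out_r, out_c = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[a][b] * image[i + a][j + b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_c)] for i in range(out_r)]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]
streams, out_shape = to_tap_streams(image)
result = convolve_from_streams(streams, kernel, out_shape)
print(len(streams))  # 4 streams, one per tap of the 2 x 2 kernel
print(result)        # [[-4, -4], [-4, -4]]
```

In this picture the four streams would play the role of the four data-modulator inputs, and the per-position sum is the step that hardware would perform in the analog domain.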
Original abstract

The ever-increasing data demand craves advancements in high-speed and energy-efficient computing hardware. Analog optical neural network (ONN) processors have emerged as a promising solution, offering benefits in bandwidth and energy consumption. However, existing ONN processors exhibit limited computational parallelism, and while certain architectures achieve high parallelism, they encounter serious scaling roadblocks for large-scale implementation. This restricts the throughput, latency, and energy efficiency advantages of ONN processors. Here, we introduce a spatial-wavelength-temporal hyper-multiplexed ONN processor that supports high data dimensionality, high computing parallelism and is feasible for large-scale implementation, and in a single time step, a three-dimensional matrix-matrix multiplication (MMM) optical tensor processor is demonstrated. Our hardware accelerates convolutional neural networks (CNNs) and deep neural networks (DNNs) through parallel matrix multiplication. We demonstrate benchmark image recognition using a CNN and a subsequently fully connected DNN in the optical domain. The network works with 292,616 weight parameters under ultra-low optical energy of 20 attojoules (aJ) per multiply and accumulate (MAC) at 96.4% classification accuracy. The system supports broad spectral and spatial bandwidths and is capable for large-scale demonstration, paving the way for highly efficient large-scale optical computing for next-generation deep learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces a spatial-wavelength-temporal hyper-multiplexed optical neural network processor that performs three-dimensional matrix-matrix multiplication in a single time step. It reports acceleration of CNNs and DNNs for image recognition using 292616 weight parameters at 20 aJ/MAC optical energy and 96.4% classification accuracy, claiming feasibility for large-scale implementation unlike prior high-parallelism ONN designs.

Significance. If the experimental demonstration and scaling claims hold, the work would represent a notable advance in analog optical computing hardware by enabling high-dimensionality, single-shot tensor operations with ultra-low energy per MAC while addressing parallelism scaling barriers.

major comments (2)
  1. [Abstract] The central claim of single-shot 3D MMM with 292616 weights at 96.4% accuracy requires that inter-channel crosstalk, spectral overlap, and insertion loss remain low enough to preserve effective MAC precision. No number of wavelength or spatial modes is stated, nor is any measured fidelity or error-rate data versus channel count supplied, leaving the scaling assumption for the hyper-multiplexed architecture unsupported.
  2. [Abstract] The 20 aJ/MAC figure and 96.4% accuracy are presented as experimental results, yet the text supplies neither experimental methods, error bars, raw data, nor verification details for these quantities, preventing evaluation of the hardware claim.
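
The scaling concern in the first comment can be put in rough numbers. The Figure 15 caption reports -50 dB crosstalk between channels. If every other channel leaks that much power into a given channel, and the leaked powers add incoherently, the aggregate crosstalk grows by 10·log10(N − 1) dB for N channels. Both conditions are worst-case assumptions made here for illustration, not measurements from the paper, and the function name is hypothetical.

```python
import math


def aggregate_crosstalk_db(per_channel_db, num_channels):
    """Total crosstalk into one channel from all others, in dB.

    ASSUMPTIONS: every interfering channel contributes the same leakage
    and the contributions add in power (incoherently).
    """
    if num_channels < 2:
        raise ValueError("need at least two channels for crosstalk")
    return per_channel_db + 10.0 * math.log10(num_channels - 1)


for n in (2, 11, 101, 1001):
    total = aggregate_crosstalk_db(-50.0, n)
    print(f"{n:5d} channels: {total:6.1f} dB")
```

Under these assumptions each tenfold increase in interfering channels costs 10 dB: -50 dB for 2 channels, -40 dB for 11, -30 dB for 101, and -20 dB for 1001. This is why measured fidelity versus channel count would bear directly on the scalability claim.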

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their comments and the opportunity to clarify aspects of our work. We respond point-by-point to the major comments below.

  1. Referee: [Abstract] The central claim of single-shot 3D MMM with 292616 weights at 96.4% accuracy requires that inter-channel crosstalk, spectral overlap, and insertion loss remain low enough to preserve effective MAC precision. No number of wavelength or spatial modes is stated, nor is any measured fidelity or error-rate data versus channel count supplied, leaving the scaling assumption for the hyper-multiplexed architecture unsupported.

    Authors: The abstract summarizes the demonstrated single-shot 3D MMM but does not enumerate the specific wavelength and spatial mode counts or include channel-count-dependent fidelity metrics. The full manuscript provides these details in the architecture description and experimental characterization sections, where measured crosstalk, spectral overlap, and insertion loss values are reported for the implemented channel count and shown to support the observed MAC precision. We will revise the abstract to state the number of modes employed and note that fidelity data versus channel count appear in the main text. revision: yes

  2. Referee: [Abstract] The 20 aJ/MAC figure and 96.4% accuracy are presented as experimental results, yet the text supplies neither experimental methods, error bars, raw data, nor verification details for these quantities, preventing evaluation of the hardware claim.

    Authors: The 20 aJ/MAC and 96.4% accuracy values are obtained from the experimental demonstration described in the manuscript. The methods section, supplementary information, and figure captions supply the measurement procedures, verification approach, and any associated uncertainty quantification. The abstract condenses these results; we will revise it to explicitly reference the supporting experimental details in the main text. revision: yes

Circularity Check

0 steps flagged

No circularity: experimental hardware demonstration with no derivation chain

The paper reports an experimental demonstration of a spatial-wavelength-temporal hyper-multiplexed optical neural network processor performing single-shot 3D matrix-matrix multiplication. Its central claims rest on measured classification accuracy (96.4%), optical energy per MAC (20 aJ), and parameter count (292,616) from hardware benchmarks on CNN and DNN tasks. The provided text contains no mathematical derivation, no fitted parameters renamed as predictions, and no load-bearing uniqueness theorem that rests on self-citation. The architecture description and performance metrics are independent experimental results, not reductions to their own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no free parameters, axioms, or invented entities; all ledger fields left empty.

pith-pipeline@v0.9.0 · 5771 in / 1098 out tokens · 57218 ms · 2026-05-22T22:13:17.257981+00:00 · methodology


Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages

  [1] Y. LeCun, Y. Bengio, and G. Hinton, Nature 521, 436 (2015)
  [2] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, Advances in neural information processing systems 30 (2017)
  [3] A. Krizhevsky, I. Sutskever, and G. E. Hinton, in Advances in Neural Information Processing Systems, Vol. 25, edited by F. Pereira, C. Burges, L. Bottou, and K. Weinberger (Curran Associates, Inc., 2012)
  [4] K. He, X. Zhang, S. Ren, and J. Sun, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
  [5] I. H. Sarker, Deep learning: A comprehensive overview on techniques, taxonomy, applications and research directions (2021)
  [6] W. He, J. Li, X. Kong, and L. Deng, Communications Engineering 3, 10.1038/s44172-024-00303-3 (2024)
  [7] G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, and L. Zdeborová, Rev. Mod. Phys. 91, 045002 (2019)
  [8] T. Ching, D. S. Himmelstein, B. K. Beaulieu-Jones, A. A. Kalinin, B. T. Do, G. P. Way, E. Ferrero, P. M. Agapow, M. Zietz, M. M. Hoffman, W. Xie, G. L. Rosen, B. J. Lengerich, J. Israeli, J. Lanchantin, S. Woloszynek, A. E. Carpenter, A. Shrikumar, J. Xu, E. M. Cofer, C. A. Lavender, S. C. Turaga, A. M. Alexandari, Z. Lu, D. J. Harris, D. Decaprio, Y. Qi,...
  [9] J. Goecks, V. Jalili, L. M. Heiser, and J. W. Gray, Cell 181, 92 (2020)
  [10] Integrated silicon photonics: Harnessing the data explosion (2011)
  [11] T. Heuser, M. Pflüger, I. Fischer, J. A. Lott, D. Brunner, and S. Reitzenstein, Journal of Physics: Photonics 2, 044002 (2020)
  [12] A. Yazdanbakhsh, K. Seshadri, B. Akin, J. Laudon, and R. Narayanaswami, arXiv preprint arXiv:2102.10423 (2021)
  [13] S. Kumar, V. Bitorff, D. Chen, C. Chou, B. Hechtman, H. Lee, N. Kumar, P. Mattson, S. Wang, T. Wang, et al., arXiv preprint arXiv:1909.09756 (2019)
  [14] A. Shawahna, S. M. Sait, and A. El-Maleh, IEEE Access 7, 7823 (2019)
  [15] C. Wang, L. Gong, Q. Yu, X. Li, Y. Xie, and X. Zhou, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 36, 513 (2017)
  [16] S. W. Keckler, W. J. Dally, B. Khailany, M. Garland, and D. Glasco, IEEE Micro 31, 7 (2011)
  [17] M. Pandey, M. Fernandez, F. Gentile, O. Isayev, A. Tropsha, A. C. Stern, and A. Cherkasov, The transformational role of gpu computing and deep learning in drug discovery (2022)
  [18] A. Ankit, I. E. Hajj, S. R. Chalamalasetti, G. Ndu, M. Foltin, R. S. Williams, P. Faraboschi, W.-m. W. Hwu, J. P. Strachan, K. Roy, and D. S. Milojicic, in Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’19 (Association for Computing Machinery, New York, NY, USA,
  [19] P. Yao, H. Wu, B. Gao, J. Tang, Q. Zhang, W. Zhang, J. J. Yang, and H. Qian, Nature 577, 641 (2020)
  [20] D. A. Miller, Journal of Lightwave Technology 35, 346 (2017)
  [21] Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, et al., Nature Photonics 11, 441 (2017)
  [22] R. Hamerly, L. Bernstein, A. Sludds, M. Soljačić, and D. Englund, Physical Review X 9, 021032 (2019)
  [23] X. Xu, M. Tan, B. Corcoran, J. Wu, A. Boes, T. G. Nguyen, S. T. Chu, B. E. Little, D. G. Hicks, R. Morandotti, et al., Nature 589, 44 (2021)
  [24] T. Wang, S.-Y. Ma, L. G. Wright, T. Onodera, B. C. Richard, and P. L. McMahon, Nature Communications 13, 1 (2022)
  [25] R. Hamerly, A. Sludds, S. Bandyopadhyay, L. Bernstein, Z. Chen, M. Ghobadi, and D. Englund, in Emerging Topics in Artificial Intelligence (ETAI) 2021, Vol. 11804 (International Society for Optics and Photonics, 2021) p. 118041R
  [26] H. Zhu, J. Zou, H. Zhang, Y. Shi, S. Luo, N. Wang, H. Cai, L. Wan, B. Wang, X. Jiang, et al., Nature Communications 13, 1 (2022)
  [27] J. Feldmann, N. Youngblood, M. Karpov, H. Gehring, X. Li, M. Stappers, M. Le Gallo, X. Fu, A. Lukashchuk, A. S. Raja, et al., Nature 589, 52 (2021)
  [28] J. Feldmann, N. Youngblood, C. D. Wright, H. Bhaskaran, and W. H. Pernice, Nature 569, 208 (2019)
  [29] N. H. Farhat, D. Psaltis, A. Prata, and E. Paek, Applied optics 24, 1469 (1985)
  [30] F. Ashtiani, A. J. Geers, and F. Aflatouni, Nature 606, 501 (2022)
  [31] Y. Zuo, B. Li, Y. Zhao, Y. Jiang, Y.-C. Chen, P. Chen, G.-B. Jo, J. Liu, and S. Du, Optica 6, 1132 (2019)
  [32] A. N. Tait, T. F. De Lima, E. Zhou, A. X. Wu, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, Scientific reports 7, 1 (2017)
  [33] B. Shi, N. Calabretta, and R. Stabile, IEEE Journal of Selected Topics in Quantum Electronics 26, 1 (2019)
  [34] C. Pappas, T. Moschos, A. Prapas, M. Kirtas, M. Moralis-Pegios, A. Tsakyridis, O. Asimopoulos, N. Passalis, A. Tefas, and N. Pleros, Journal of Lightwave Technology (2025)
  [35] A. N. Tait, T. F. De Lima, M. A. Nahmias, H. B. Miller, H.-T. Peng, B. J. Shastri, and P. R. Prucnal, Physical Review Applied 11, 064043 (2019)
  [36] C. Huang, A. Jha, T. F. De Lima, A. N. Tait, B. J. Shastri, and P. R. Prucnal, IEEE Journal of Selected Topics in Quantum Electronics 27, 1 (2020)
  [37] B. Dong, S. Aggarwal, W. Zhou, U. E. Ali, N. Farmakidis, J. S. Lee, Y. He, X. Li, D.-L. Kwong, C. Wright, et al., Nature Photonics 17, 1080 (2023)
  [38] A. Sludds, S. Bandyopadhyay, Z. Chen, Z. Zhong, J. Cochrane, L. Bernstein, D. Bunandar, P. B. Dixon, S. A. Hamilton, M. Streshinsky, A. Novack, T. Baehr-Jones, M. Hochberg, M. Ghobadi, R. Hamerly, and D. Englund, Science 378, 270 (2022)
  [39] R. Hamerly, A. Sludds, S. Bandyopadhyay, Z. Chen, Z. Zhong, L. Bernstein, and D. Englund, Journal of Lightwave Technology 42, 7795 (2024)
  [40] Z. Chen, A. Sludds, R. Davis III, I. Christen, L. Bernstein, L. Ateshian, T. Heuser, N. Heermeier, J. A. Lott, S. Reitzenstein, et al., Nature Photonics 17, 723 (2023)
  [41] N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers, et al., in Proceedings of the 44th annual international symposium on computer architecture (2017) pp. 1–12
  [42] S. Xu, J. Wang, S. Yi, and W. Zou, Nature communications 13, 7970 (2022)
  [43] A. Novikov, D. Podoprikhin, A. Osokin, and D. P. Vetrov, Advances in neural information processing systems 28 (2015)
  [44] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., Advances in neural information processing systems 32 (2019)
  [45] V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, Proceedings of the IEEE 105, 2295 (2017)
  [46] V. Sze, Y.-H. Chen, J. Emer, A. Suleiman, and Z. Zhang, in 2017 IEEE Custom Integrated Circuits Conference (CICC) (2017) pp. 1–8
  [47] S. Moazeni, S. Lin, M. Wade, L. Alloatti, R. J. Ram, M. Popović, and V. Stojanović, IEEE Journal of Solid-State Circuits 52, 3503 (2017)
  [48] P.-I. Dietrich, M. Blaicher, I. Reuter, M. Billah, T. Hoose, A. Hofmann, C. Caer, R. Dangel, B. Offrein, U. Troppenz, et al., Nature Photonics 12, 241 (2018)
  [49] T. Aalto, M. Cherchi, M. Harjanne, S. Bhat, P. Heimala, F. Sun, M. Kapulainen, T. Hassinen, and T. Vehmas, IEEE Journal of selected topics in quantum electronics 25, 1 (2019)