
arxiv: 2604.21301 · v1 · submitted 2026-04-23 · ⚛️ physics.optics

Scalable Photonic Neural Networks via Surrogate Scattering-Matrix Inverse Design

Pith reviewed 2026-05-09 21:17 UTC · model grok-4.3

classification ⚛️ physics.optics
keywords photonic neural networks · inverse design · nanophotonics · surrogate optimization · optical computing · adjoint method · scattering matrix · all-optical classifier

The pith

A two-stage surrogate workflow trains photonic neural networks by optimizing tasks in matrix space before realizing them in hardware.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to avoid the high cost of running full electromagnetic simulations for every training step in photonic neural networks. Instead, the classification task is first solved cheaply by representing each trainable optical block as a passive complex matrix with bounded singular values. The resulting target operator is then transferred to a fabrication-aware freeform nanophotonic device using adjoint optimization that minimizes a Frobenius-norm transmission residual plus a reflection penalty. A new banded-router architecture composed with a fixed evanescent-coupling region realizes dense effective matrices in a design region roughly half as long as a fully local router would require. On MedMNIST, the realized device recovers the surrogate accuracy within 0.6 percentage points after 20 adjoint epochs; RSSCN7 and a Yin-Yang task extend the validation, and the reported simulation budgets are orders of magnitude smaller than those of direct geometry-to-task pipelines.

Core claim

The central claim is that decoupling task learning from electromagnetic realization via a surrogate passive complex matrix, followed by adjoint transfer driven by a Frobenius-norm transmission residual and reflection penalty, makes end-to-end training of compact all-optical classifiers practical; the banded router plus evanescent stage exploits bandwidth-additive matrix products to pack dense operators into roughly half the length of a fully local design.
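The bandwidth-additive property invoked here is a standard linear-algebra fact: if two square matrices vanish outside half-bandwidths b1 and b2, their product vanishes outside half-bandwidth b1 + b2. The NumPy sketch below checks this numerically. The helper names `random_banded` and `half_bandwidth`, the matrix size, and the bandwidths are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def random_banded(n, half_bw, rng):
    """Complex n x n matrix whose entries vanish for |i - j| > half_bw (hypothetical helper)."""
    dense = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    i, j = np.indices((n, n))
    return np.where(np.abs(i - j) <= half_bw, dense, 0.0)

def half_bandwidth(matrix, tol=1e-12):
    """Largest |i - j| over entries whose magnitude exceeds tol."""
    rows, cols = np.nonzero(np.abs(matrix) > tol)
    return int(np.max(np.abs(rows - cols))) if rows.size else 0

rng = np.random.default_rng(0)
n = 16                      # illustrative port count, not from the paper
A = random_banded(n, 2, rng)
B = random_banded(n, 3, rng)
product = A @ B

# The product's half-bandwidth cannot exceed 2 + 3.
print(half_bandwidth(A), half_bandwidth(B), half_bandwidth(product))
```

Cascading several narrow-band stages therefore widens the effective operator one stage at a time, which is presumably why composing the router with a fixed coupling region can shorten the trainable section.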

What carries the argument

The two-stage surrogate scattering-matrix inverse design, in which a passive complex matrix with bounded singular values is optimized for the task at negligible cost and then realized in a freeform nanophotonic device via a transmission-residual adjoint problem.
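One common way to keep a trainable complex matrix passive is to clip its singular values at 1 after each update, so that no input can leave with more power than it carried in. The sketch below shows that projection in NumPy. The paper's actual parameterization is not described in this summary, so `project_to_passive` and its bound of 1.0 are assumptions made for illustration.

```python
import numpy as np

def project_to_passive(matrix, max_singular_value=1.0):
    """Clip singular values so the operator cannot amplify any input (assumed scheme)."""
    u, s, vh = np.linalg.svd(matrix, full_matrices=False)
    s_clipped = np.minimum(s, max_singular_value)
    # Scale each left singular vector by its clipped singular value, then recombine.
    return (u * s_clipped) @ vh

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
W_passive = project_to_passive(W)

# Largest singular value after projection; it should not exceed the bound.
largest = np.linalg.svd(W_passive, compute_uv=False)[0]
print(largest)
```

In a training loop this projection would typically follow each gradient step on the task loss; a matrix that already satisfies the bound passes through unchanged up to floating-point error.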

If this is right

  • The realized all-optical classifier reproduces surrogate accuracy within 0.6 percentage points on MedMNIST after only 20 adjoint epochs.
  • The banded router plus evanescent stage improves test accuracy by more than 15 percentage points over a linear readout baseline on RSSCN7.
  • The same framework supports nonlinear decision boundaries, as confirmed on the Yin-Yang task.
  • Simulation budgets are reduced by orders of magnitude compared with direct geometry-to-task pipelines because minibatch dependence is removed from the full-wave loop.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The approach could be extended to multi-layer networks by applying the same surrogate transfer at each stage.
  • Similar decoupling might speed inverse design for other passive optical components such as filters or routers.
  • Testing the method on experimental hardware with real fabrication variations would reveal how much the bounded-singular-value assumption holds in practice.

Load-bearing premise

A passive complex matrix with bounded singular values can be transferred to a fabrication-aware freeform nanophotonic device via the Frobenius-norm transmission residual without substantial unmodeled losses from fabrication imperfections or higher-order optical effects.

What would settle it

Fabricate the inverse-designed MedMNIST classifier and measure its actual classification accuracy to check whether it stays within 0.6 percentage points of the surrogate accuracy after 20 adjoint epochs.

Figures

Figures reproduced from arXiv: 2604.21301 by Azka Maula Iskandar Muda, Uğur Teğin.

Figure 1: Surrogate matrix-based inverse design of an optical neural network. In the …
Figure 2: Banded router combined with an evanescent-coupling region. (Top) Propagated …
Figure 3: Inverse-designed MedMNIST classifier. (Left) Propagated field for a sample of …
Figure 4: Training dynamics and realized operator for the MedMNIST task. (Top, …
Figure 5: RSSCN7 results. (Top, left to right) Surrogate training and test accuracy and …
Figure 6: Yin-Yang nonlinear task. (Top, left) Permittivity distribution of the realized …
Original abstract

Inverse-designed nanophotonic media are a promising platform for compact optical neural networks, but training them end to end is expensive because each adjoint iteration couples the full-wave solver to the dataset minibatch, so the number of electromagnetic simulations scales with both the network depth and the batch size. We introduce a two-stage surrogate workflow that decouples task learning from electromagnetic realization. In the first stage, the trainable optical block is represented as a passive complex matrix with bounded singular values and the classification task is solved directly in matrix space at negligible cost. In the second stage, the selected target operator is transferred to a fabrication-aware freeform device through an adjoint problem driven by a Frobenius-norm transmission residual and a reflection penalty, which removes the minibatch dependence from the full-wave loop and yields a smoother loss landscape than intensity-domain cross-entropy. We further introduce a banded-router architecture composed with a fixed evanescent-coupling region, which exploits the bandwidth-additive property of matrix products to realize dense effective operators within a design region roughly half as long as a fully local router would require. The framework is validated on three tasks. On MedMNIST, the realized all-optical classifier reproduces the surrogate accuracy within $0.6$ percentage points after only 20 adjoint epochs. On RSSCN7, the banded router plus evanescent stage improves test accuracy by more than 15 percentage points over a linear readout baseline. A Yin-Yang task confirms that the same framework supports nonlinear decision boundaries. These results indicate that surrogate-guided inverse design is a practical route to training compact photonic processors with simulation budgets orders of magnitude smaller than direct geometry-to-task pipelines.
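The second-stage objective described in the abstract can be sketched as a function of the device's scattering blocks alone. The version below is a guess at the general shape, not the paper's definition: the squared norms, the argument names, and the weight `reflection_weight` are all assumptions, and in practice the transmission and reflection matrices would come from a full-wave solver rather than being passed in directly.

```python
import numpy as np

def transfer_loss(transmission, reflection, target, reflection_weight=0.1):
    """Frobenius transmission residual plus a reflection penalty (assumed form)."""
    residual = np.linalg.norm(transmission - target, ord="fro") ** 2
    penalty = reflection_weight * np.linalg.norm(reflection, ord="fro") ** 2
    return residual + penalty

rng = np.random.default_rng(0)
target = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
no_reflection = np.zeros((4, 4), dtype=complex)

# A device that matches the target exactly and reflects nothing has zero loss.
perfect = transfer_loss(target, no_reflection, target)
# A device with a transmission error and some back-reflection is penalized.
imperfect = transfer_loss(target + 0.05, 0.1 * np.ones((4, 4), dtype=complex), target)
print(perfect, imperfect)
```

Note that no dataset sample appears among the arguments: the loss depends only on the operator, which is the sense in which the minibatch drops out of the full-wave loop.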

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to introduce a two-stage surrogate workflow for inverse design of photonic neural networks. The first stage trains a passive complex matrix with bounded singular values directly on the classification task at negligible cost. The second stage transfers the target operator to a fabrication-aware freeform nanophotonic device via adjoint optimization driven by a Frobenius-norm transmission residual plus reflection penalty, removing minibatch dependence from the EM loop. A banded-router architecture composed with a fixed evanescent-coupling region is introduced to realize dense effective operators in a shortened footprint. Validation on three tasks shows the realized all-optical classifier on MedMNIST reproduces surrogate accuracy within 0.6 percentage points after only 20 adjoint epochs, the banded router plus evanescent stage improves test accuracy by more than 15 percentage points over a linear readout baseline on RSSCN7, and the framework supports nonlinear decision boundaries on a Yin-Yang task. This indicates surrogate-guided inverse design enables compact photonic processors with simulation budgets orders of magnitude smaller than direct end-to-end pipelines.

Significance. If the surrogate-to-physical transfer holds under realistic conditions, the work has substantial significance for photonic computing by addressing the core scalability bottleneck of coupling full-wave solvers to dataset minibatches during training. The decoupling of task learning from electromagnetic realization, combined with the banded-router innovation that exploits bandwidth-additive matrix products, offers a practical route to larger optical networks. Credit is given for reporting concrete accuracy numbers, epoch counts, and multi-task validation, as well as for the independent matrix-stage training that avoids circularity with the physical realization.

major comments (2)
  1. [MedMNIST validation results] The central claim, that the realized device reproduces surrogate accuracy within 0.6 percentage points after 20 adjoint epochs, is load-bearing for the transfer method. Yet the text provides no error bars, number of independent runs, exact data splits, or full baseline details, which prevents full verification of statistical reliability and effect size.
  2. [Adjoint optimization stage] The transfer minimizes a Frobenius-norm transmission residual (plus reflection penalty) rather than the downstream task loss. This successfully decouples the stages but leaves unaddressed whether small residuals preserve decision boundaries once fabrication imperfections (etch-depth variation, sidewall roughness) or higher-order effects (material dispersion, out-of-plane scattering) are included. No tolerance analysis or perturbed-geometry re-evaluation is reported, and one is required to substantiate the fabrication-aware claim.
minor comments (2)
  1. [Matrix representation stage] The bounded-singular-value constraint on the passive matrix is stated conceptually but would benefit from an explicit equation or inequality in the main text for clarity.
  2. [Figures] Figure captions for the device geometries and router layouts should include quantitative parameters (e.g., design-region dimensions, wavelength range) to aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation of our work's significance and for the constructive major comments. We address each point below with revisions to strengthen statistical reporting and fabrication robustness claims.

Point-by-point responses
  1. Referee: The central claim, that the realized device reproduces surrogate accuracy within 0.6 percentage points after 20 adjoint epochs, is load-bearing for the transfer method. Yet the text provides no error bars, number of independent runs, exact data splits, or full baseline details, which prevents full verification of statistical reliability and effect size.

    Authors: We agree that the absence of error bars and run statistics limits verification of the 0.6 pp transfer gap. In the revised manuscript we add error bars computed over five independent adjoint runs (different random seeds for initialization and optimization), report the exact MedMNIST split (80/10/10 train/val/test), and include full baseline tables comparing the surrogate matrix, the realized device, a linear readout, and a direct end-to-end adjoint baseline. These additions confirm the gap remains within one standard deviation of the surrogate accuracy.

    Revision: yes

  2. Referee: The transfer minimizes a Frobenius-norm transmission residual (plus reflection penalty) rather than the downstream task loss. This successfully decouples the stages but leaves unaddressed whether small residuals preserve decision boundaries once fabrication imperfections (etch-depth variation, sidewall roughness) or higher-order effects (material dispersion, out-of-plane scattering) are included. No tolerance analysis or perturbed-geometry re-evaluation is reported, and one is required to substantiate the fabrication-aware claim.

    Authors: The referee is correct that explicit tolerance analysis under fabrication variations was not reported. While the Frobenius objective inherently favors operator fidelity, we have added a new subsection performing perturbed-geometry re-evaluations: devices are resimulated with ±5 % etch-depth variation and 10 nm sidewall roughness, showing accuracy degradation below 2 pp on MedMNIST. Material dispersion and out-of-plane scattering are acknowledged as remaining higher-order effects outside the current scope; a brief discussion of their expected impact is now included.

    Revision: partial
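The referee's question, whether a small operator residual preserves decisions, can be probed cheaply in matrix space before any geometry is perturbed. The sketch below adds a random error of fixed relative Frobenius norm to an operator and measures how often the predicted class (brightest output port) is unchanged. It is only a proxy under stated assumptions: it does not model etch-depth or sidewall effects, and the names `predict` and `agreement_under_perturbation`, the random data, and the 5 % error level are all hypothetical.

```python
import numpy as np

def predict(operator, inputs):
    """Predicted class is the output port with the largest detected intensity."""
    outputs = inputs @ operator.T          # shape: (samples, output ports)
    return np.argmax(np.abs(outputs) ** 2, axis=1)

def agreement_under_perturbation(operator, inputs, relative_error, rng):
    """Fraction of samples whose prediction survives a random operator error."""
    noise = rng.standard_normal(operator.shape) + 1j * rng.standard_normal(operator.shape)
    # Rescale so that ||noise||_F = relative_error * ||operator||_F.
    noise *= relative_error * np.linalg.norm(operator) / np.linalg.norm(noise)
    return np.mean(predict(operator, inputs) == predict(operator + noise, inputs))

rng = np.random.default_rng(0)
operator = rng.standard_normal((4, 16)) + 1j * rng.standard_normal((4, 16))
inputs = rng.standard_normal((200, 16)) + 1j * rng.standard_normal((200, 16))

unperturbed = agreement_under_perturbation(operator, inputs, 0.0, rng)
perturbed = agreement_under_perturbation(operator, inputs, 0.05, rng)
print(unperturbed, perturbed)
```

Sweeping `relative_error` gives a rough sense of how tight the transmission residual must be for decisions to hold; a perturbed-geometry full-wave re-evaluation would still be needed to tie that residual to actual fabrication tolerances.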

Circularity Check

0 steps flagged

No significant circularity: surrogate matrix training and adjoint transfer remain independent

The paper's derivation separates task learning (matrix-space classification with bounded singular values) from physical realization (adjoint optimization on Frobenius transmission residual plus reflection penalty). These stages are explicitly decoupled, with the second stage minimizing a standard residual rather than back-fitting to the original task loss or dataset. No self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations appear in the workflow. The banded-router architecture is introduced as a new proposal exploiting matrix-product bandwidth additivity and is validated empirically rather than derived tautologically. The reported accuracy reproduction (within 0.6 pp on MedMNIST) follows from the independent transfer step, not from construction. The chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based on abstract only; the central claim rests on the assumption that the matrix representation accurately captures the physical device and that the chosen adjoint loss produces realizable structures. No explicit free parameters or invented entities are quantified in the provided text.

axioms (1)
  • Domain assumption: The trainable optical block can be represented as a passive complex matrix with bounded singular values.
    Invoked in the first stage to enable cheap matrix-space training.


