pith. machine review for the scientific record.

arxiv: 2604.10861 · v1 · submitted 2026-04-12 · 🪐 quant-ph · cond-mat.dis-nn · cs.ET · cs.LG

Recognition: unknown

Training single-electron and single-photon stochastic physical neural networks

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 14:58 UTC · model grok-4.3

classification 🪐 quant-ph · cond-mat.dis-nn · cs.ET · cs.LG
keywords physical neural networks · stochastic neurons · single-electron tunneling · single-photon beam splitter · MNIST classification · quantum dot · noise tolerance

The pith

Stochastic physical neural networks using single-electron tunneling and single-photon beam splitters achieve over 97% test accuracy on MNIST with few trials per layer.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces electronic and photonic realizations of stochastic neurons for physical neural networks, where the electronic version relies on single-electron tunneling through a quantum dot and the photonic version uses a single-photon source with a controllable beam-splitter interaction. Training proceeds on models of these neurons for single-hidden-layer networks performing MNIST handwritten digit classification, with experiments varying the number of trials to control stochasticity in the forward pass and comparing true probabilities against empirical outputs in the backward pass. A sympathetic reader would care because the results indicate that physical stochasticity can be embraced rather than suppressed, allowing high accuracy to persist even when noise and model uncertainty are large.

Core claim

Single-hidden-layer stochastic PNNs built from single-electron tunneling neurons or single-photon detector neurons reach more than 97% test accuracy on MNIST when empirical outputs are used in the backward pass, even with few trials per layer and a high degree of noise and model uncertainty.

What carries the argument

The stochastic neuron whose output is the charge state of a quantum dot (electronic) or the occupation of the undriven mode after a controllable beam-splitter interaction (photonic), with training performed on forward models of these physical processes.
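The two activation mechanisms can be caricatured in a few lines. The functional forms below are illustrative guesses, not the paper's models: a steady-state tunneling occupation stands in for the SET neuron, and a sin²θ law for the beam-splitter neuron.

```python
import numpy as np

rng = np.random.default_rng(0)

def set_neuron(z, gamma_in=1.0, gamma_out=1.0):
    """Hypothetical SET neuron: the pre-activation z biases the
    in-tunneling rate, and the steady-state dot occupation gives a
    sigmoid-like firing probability. One call = one measured trial."""
    p = gamma_in * np.exp(z) / (gamma_in * np.exp(z) + gamma_out)
    return rng.random() < p  # charge state of the dot: 0 or 1

def bs_neuron(theta):
    """Hypothetical beam-splitter neuron: a single photon enters the
    driven mode; with probability sin^2(theta) it is detected in the
    undriven mode, whose occupation is the neuron's binary output."""
    p = np.sin(theta) ** 2
    return rng.random() < p
```

In both caricatures a physical knob (gate bias or mixing angle) sets a firing probability and each trial yields one stochastic bit; an empirical output is the mean over several such trials.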

If this is right

  • Using empirical outputs rather than true probabilities in the backward pass enables high accuracy with limited trials per layer.
  • Performance remains robust across a wide range of noise strengths and model uncertainties.
  • A single-hidden-layer architecture is sufficient to reach over 97% MNIST accuracy under these stochastic conditions.
  • Both electronic quantum-dot and photonic beam-splitter realizations can be trained successfully with the same protocol.
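The first bullet is the crux, and a toy version makes it concrete. The sketch below is a minimal stand-in, not the paper's estimator: a single stochastic neuron whose forward pass averages K Bernoulli trials, trained with a gradient rule that substitutes the empirical K-trial mean for the activation wherever it appears in the chain rule.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(w, x, K):
    """Stochastic forward pass: the neuron fires K times and we keep
    both the true probability and the K-trial empirical mean."""
    p = sigmoid(x @ w)
    y_emp = rng.binomial(K, p) / K
    return p, y_emp

def grad_w(w, x, target, K, use_empirical=True):
    """Gradient of the squared-error loss L = (y - target)^2 / 2 for one
    stochastic neuron, with y taken either as the true probability or as
    the empirical K-trial mean (a straight-through-style substitution)."""
    p, y_emp = forward(w, x, K)
    y = y_emp if use_empirical else p
    return (y - target) * p * (1 - p) * x

# Tiny demo: train a single neuron to fire on x = [1, 1]
# using only empirical outputs in the backward pass.
w = np.zeros(2)
x = np.array([1.0, 1.0])
for _ in range(500):
    w -= 1.0 * grad_w(w, x, target=1.0, K=8)
p_final, _ = forward(w, x, K=8)
```

Sweeping `use_empirical` and `K` reproduces the comparison in miniature: since the K-trial mean is an unbiased estimate of p and the gradient is linear in y, the expected update is unchanged, so small K mainly adds gradient noise rather than bias.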

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach may reduce energy costs in inference by performing computation directly in noisy physical hardware instead of simulating it digitally.
  • Deeper networks could be explored by stacking these stochastic layers while retaining the empirical backward-pass strategy.
  • Hardware calibration might be simplified if the training already tolerates large mismatches between model and device.

Load-bearing premise

Computer models of the single-electron tunneling and single-photon beam-splitter dynamics accurately capture the real physical noise and switching behavior so that simulation results transfer to hardware.
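The risk in this premise can be illustrated with a toy mismatch experiment (a hypothetical setup, not from the paper): train against an idealized sigmoid response, then run stochastic inference through a deliberately miscalibrated "device" response and see whether accuracy survives.

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy task: the label is the sign of the first feature.
X = rng.normal(size=(400, 3))
labels = (X[:, 0] > 0).astype(float)

def device_p(z, gain=0.8, offset=0.2):
    """Hypothetical 'real device' response, deliberately mismatched
    from the idealized sigmoid model used during training."""
    return sigmoid(gain * z + offset)

# Train on the idealized model sigmoid(z)...
w = np.zeros(3)
for _ in range(300):
    p = sigmoid(X @ w)
    w -= 0.1 * X.T @ (p - labels) / len(X)  # logistic-regression gradient

# ...then infer with the mismatched stochastic device,
# taking a 15-trial majority vote per input.
p_dev = device_p(X @ w)
pred = (rng.binomial(15, p_dev) / 15 > 0.5).astype(float)
acc = np.mean(pred == labels)
```

If accuracy holds up under gain and offset errors of this size, the premise is doing less load-bearing work; if it collapses, simulation-to-hardware transfer is the whole game.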

What would settle it

Fabricated single-electron or single-photon hardware implementations of the same network architecture that fail to reach test accuracies comparable to the simulated results when trained under the same empirical-output protocol.

Figures

Figures reproduced from arXiv: 2604.10861 by David Petty, Ethan Sigler, Gerard Milburn, Jo Plested, Josh Burns, Matt Woolley, Parth Girdhar, Shiro Kumara, Tong Dou.

Figure 1. Schematic of an SPD stochastic neuron adapted… [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2. Schematic of a SET stochastic neuron realized in… [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3. Schematic of a TSP stochastic neuron realized in an… [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4. Benchmarking stochastic physics-aware training on… [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5. Performance of the EG estimator under finite sampling. [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6. Performance of ST estimators under finite sampling. [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
Figure 7. Output-layer performance under infinite and few trial… [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗
Figure 8. Comparison of test accuracy using softmax and un… [PITH_FULL_IMAGE:figures/full_fig_p010_8.png] view at source ↗
Original abstract

The computational demands of deep learning motivate the investigation of alternative approaches to computation. One alternative is physical neural networks (PNNs), in which learning and inference are performed directly via physical processes. Stochastic PNNs arise when the underlying neurons are realized by the dynamics of a stochastic activation switch. Here we propose novel electronic and photonic stochastic neurons. The electronic realization is implemented by single-electron tunneling through a quantum dot. The photonic realization is implemented via a single-photon source driving one of two modes coupled via a controllable beam-splitter-like interaction. In the electronic case, the charge state of the quantum dot forms the basis for the stochastic neuron, whereas in the photonic case the occupation of the undriven mode serves as the basis for the stochastic neuron. Training of stochastic PNNs is performed with models of stochastic neurons, as well as with coherently-driven, single-photon detector stochastic neurons previously introduced. Several training strategies for MNIST handwritten digit classification have been investigated using single-hidden-layer stochastic PNNs, including varying the number of trials in each layer to control forward pass stochasticity and employing either true probability or empirical outputs in the backward pass to evaluate their influence on gradient estimation. We show that when empirical outputs are used in the backward pass, the network achieves more than 97% test accuracy with few trials per layer. Despite the simplicity of the model architecture, high test accuracy is maintained in the presence of a high degree of noise and model uncertainty. The results demonstrate the potential of embracing stochastic PNNs for deep learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes two novel stochastic neuron models for physical neural networks: one based on single-electron tunneling through a quantum dot and another using a single-photon source with a controllable beam-splitter interaction. It examines training strategies for single-hidden-layer stochastic PNNs on MNIST classification, varying the number of trials per layer and comparing the use of true probabilities versus empirical outputs in the backward pass. The central result is that empirical outputs in the backward pass yield over 97% test accuracy with few trials per layer, even under high simulated noise and model uncertainty.

Significance. If the simulation results are robust, the work demonstrates that stochastic physical implementations can achieve competitive classification performance without requiring low-noise hardware, potentially enabling more energy-efficient neural network realizations. The explicit comparison of backward-pass strategies offers practical guidance for training noisy physical systems. The forward-looking discussion of hardware transfer is appropriately caveated as prospective rather than demonstrated.

major comments (2)
  1. [Abstract and results] The claim of >97% test accuracy with few trials per layer is presented without any reported simulation parameters (e.g., learning rate, network width, number of epochs), number of independent trials, error bars, baseline comparisons to standard neural networks or prior PNNs, or statistical significance tests. This absence makes it impossible to assess whether the performance is robust or sensitive to implementation details.
  2. [Methods and results] No physical hardware validation, or even parameter extraction from real devices, is provided to support transferability of the simulated neuron models (single-electron tunneling rates or beam-splitter stochasticity) to actual hardware; the high-accuracy claim therefore rests entirely on idealized simulations whose fidelity to experiment is untested.
minor comments (2)
  1. [Neuron models] The description of the photonic neuron model would benefit from an explicit equation relating the beam-splitter reflectivity to the occupation probability of the undriven mode.
  2. [Figures] Figure captions should include the exact number of trials per layer and the noise/model-uncertainty levels used in each panel to allow direct comparison with the text claims.
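On the first minor point, the requested relation would plausibly take the standard beam-splitter form. Assuming a mixing angle θ (a hypothetical parameterization — the paper may fold interaction time and coupling into θ differently, e.g. as θ/2), a single photon injected into the driven mode is found in the undriven mode with probability

```latex
% occupation probability of the undriven mode after the beam-splitter
% interaction (assumed convention: reflectivity r(\theta) = \sin\theta)
p_{\mathrm{undriven}}(\theta) = |r(\theta)|^{2} = \sin^{2}\theta
```

so the neuron's firing probability is set directly by the controllable mixing angle.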

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We respond to each major comment below and indicate where revisions will be made to improve clarity and completeness.

Point-by-point responses
  1. Referee: [Abstract and results] The claim of >97% test accuracy with few trials per layer is presented without any reported simulation parameters (e.g., learning rate, network width, number of epochs), number of independent trials, error bars, baseline comparisons to standard neural networks or prior PNNs, or statistical significance tests. This absence makes it impossible to assess whether the performance is robust or sensitive to implementation details.

    Authors: We agree that additional details are needed for full assessment of robustness. The Methods section contains the network architecture (single hidden layer), hyperparameters, trial counts, and averaging procedure, but these were not summarized in the Abstract or highlighted with error bars and baselines in Results. In the revised manuscript we will expand the Abstract to list key parameters (learning rate, hidden-layer width, epochs, trials per layer), add explicit baseline comparisons to standard neural networks and prior PNNs in the Results section, report standard deviations across independent runs, and include statistical significance where appropriate. revision: yes

  2. Referee: [Methods and results] No physical hardware validation, or even parameter extraction from real devices, is provided to support transferability of the simulated neuron models (single-electron tunneling rates or beam-splitter stochasticity) to actual hardware; the high-accuracy claim therefore rests entirely on idealized simulations whose fidelity to experiment is untested.

    Authors: The work is a simulation study; we do not claim hardware validation and already describe hardware transfer as prospective. To strengthen the link to experiment we will revise the Methods to cite specific literature values for quantum-dot tunneling rates and photonic beam-splitter parameters, and expand the discussion of how model uncertainty and noise are incorporated to emulate experimental variability. This clarifies the simulation fidelity without overstating current results. revision: partial

Circularity Check

0 steps flagged

No significant circularity; results are explicit simulation outcomes

Full rationale

The paper defines explicit physical models for single-electron tunneling and single-photon beam-splitter neurons, simulates their stochastic dynamics, and reports MNIST classification accuracies obtained by training single-hidden-layer networks with those models (comparing true-probability vs. empirical backward passes, varying trial counts, and injecting noise). No equation or claim reduces to its own inputs by construction, no parameter is fitted and then relabeled as a prediction, and no load-bearing premise rests solely on a self-citation whose validity is presupposed. The central evidence consists of independent simulation runs whose performance metrics are not tautological with the model definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The abstract provides insufficient detail to enumerate specific free parameters or invented entities; the work implicitly relies on standard quantum-mechanical models of tunneling and photon statistics without introducing new postulates beyond the neuron designs themselves.

axioms (1)
  • domain assumption: Models of single-electron tunneling and single-photon beam-splitter dynamics accurately represent physical stochastic behavior for training purposes.
    Invoked to justify using simulations for network training and performance claims.

pith-pipeline@v0.9.0 · 5607 in / 1364 out tokens · 67041 ms · 2026-05-10T14:58:33.499370+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

40 extracted references · 4 canonical work pages · 2 internal anchors

  1. [1] Necessary and sufficient condition for autonomous representation: Let I ⊆ ℝ be an interval and let p : I → ℝ be a continuously differentiable function. There exists a function g : p(I) → ℝ such that the derivative of p can be expressed as a function of p itself, i.e., p′(z) = g(p(z)) for all z ∈ I…
  2. [2] Strict monotonicity as a sufficient condition: For PSNs whose activation probability p(z) is a strictly monotonic function of the pre-activation z, condition (B3) is automatically satisfied. This monotonicity guarantees that p(z) is a one-to-one mapping: any given probability y = p(z) corresponds to a unique pre-activation z…
  3. [3] EG estimator for softmax activation: The EG estimator can be extended to the softmax activation (i.e., its Jacobian can be expressed solely in terms of the softmax outputs). Let z = (z₁, z₂, …, zₙ)⊤ ∈ ℝⁿ be the input vector to the softmax function, and let p = (p₁, p₂, …, pₙ)⊤ be the output…
  4. [4] Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature 521, 436 (2015)
  5. [5] J. Schmidhuber, Deep learning in neural networks: An overview, Neural Networks 61, 85 (2015)
  6. [6] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, 2016)
  7. [7] D. Patterson, J. Gonzalez, Q. Le, C. Liang, L.-M. Munguia, D. Rothchild, D. So, M. Texier, and J. Dean, Carbon emissions and large neural network training, arXiv:2104.10350 (2021)
  8. [8] G. Wetzstein, A. Ozcan, S. Gigan, S. Fan, D. Englund, M. Soljačić, C. Denz, D. A. Miller, and D. Psaltis, Inference in artificial intelligence with deep optics and photonics, Nature 588, 39 (2020)
  9. [9] L. G. Wright, T. Onodera, M. M. Stein, T. Wang, D. T. Schachter, Z. Hu, and P. L. McMahon, Deep physical neural networks trained with backpropagation, Nature 601, 549 (2022)
  10. [10] A. Momeni, B. Rahmani, M. Malléjac, P. del Hougne, and R. Fleury, Backpropagation-free training of deep physical neural networks, Science 382, 1297 (2023)
  11. [11] K. P. Kalinin, J. Gladrow, J. Chu, J. H. Clegg, D. Cletheroe, D. J. Kelly, B. Rahmani, G. Brennan, B. Canakci, F. Falck, et al., Analog optical computer for AI inference and combinatorial optimization, Nature 645, 354 (2025)
  12. [12] A. Momeni, B. Rahmani, B. Scellier, L. G. Wright, P. L. McMahon, C. C. Wanjura, Y. Li, A. Skalli, N. G. Berloff, T. Onodera, et al., Training of physical neural networks, Nature 645, 53 (2025)
  13. [13] B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. Howard, H. Adam, and D. Kalenichenko, Quantization and training of neural networks for efficient integer-arithmetic-only inference, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018) pp. 2704–2713
  14. [14] T. Wang, S.-Y. Ma, L. G. Wright, T. Onodera, B. C. Richard, and P. L. McMahon, An optical neural network using less than 1 photon per multiplication, Nature Communications 13, 123 (2022)
  15. [15] Z. Zheng, Z. Duan, H. Chen, R. Yang, S. Gao, H. Zhang, H. Xiong, and X. Lin, Dual adaptive training of photonic neural networks, Nature Machine Intelligence 5, 1119 (2023)
  16. [16] Y. Wang, M. Chen, C. Yao, J. Ma, T. Yan, R. Penty, and Q. Cheng, Asymmetrical estimator for training encapsulated deep photonic neural networks, Nature Communications 16, 2143 (2025)
  17. [17] T. Xu, Z. Luo, S. Liu, L. Fan, Q. Xiao, B. Wang, D. Wang, and C. Huang, Physical neural networks using sharpness-aware training, Nature Communications 17, 1766 (2026)
  18. [18] Y. Bengio, N. Léonard, and A. Courville, Estimating or propagating gradients through stochastic neurons for conditional computation, arXiv:1308.3432 (2013)
  19. [19] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, Binarized neural networks, Advances in Neural Information Processing Systems 29 (2016)
  20. [20] G. J. Milburn and S. Basiri-Esfahani, The physics of learning machines, Contemporary Physics 63, 34 (2022)
  21. [21] S.-Y. Ma, T. Wang, J. Laydevant, L. G. Wright, and P. L. McMahon, Quantum-limited stochastic optical neural networks operating at a few quanta per activation, Nature Communications 16, 359 (2025)
  22. [22] D. F. Walls and G. J. Milburn, Quantum Optics, 3rd ed. (Springer, Berlin, 2025)
  23. [23] D. K. Ferry, S. M. Goodnick, and J. Bird, Transport in Nanostructures (Cambridge University Press, 2009)
  24. [24] S. Pauka and G. Gardener, Autonomous tuning and charge-state detection of gate-defined quantum dots, Physical Review Applied 13, 054005 (2020)
  25. [25] J. Loredo, L. Stefan, B. Krogh, R. Jensen, I. Suleiman, S. Krüger, M. Bergamin, H. Thyrrestrup, S. Budtz, J. Roulund, et al., Deterministic quantum dot single-photon sources: Operational principles and state-of-the-art specifications, Applied Physics Reviews 13 (2026)
  26. [26] E. N. Knall, C. M. Knaut, R. Bekenstein, D. R. Assumpcao, P. L. Stroganov, W. Gong, Y. Q. Huan, P.-J. Stas, B. Machielse, M. Chalupnik, D. Levonian, A. Suleymanzade, R. Riedinger, H. Park, M. Lončar, M. K. Bhaskar, and M. D. Lukin, Efficient source of shaped single photons based on an integrated diamond nanophotonic system, Phys. Rev. Lett. 129, …
  27. [27] D. D. Bühler, M. Weiß, A. Crespo-Poveda, E. D. Nysten, J. J. Finley, K. Müller, P. V. Santos, M. M. de Lima Jr, and H. J. Krenner, On-chip generation and dynamic piezo-optomechanical rotation of single photons, Nature Communications 13, 6998 (2022)
  28. [28] K. Alexander, A. Benyamini, D. Black, D. Bonneau, S. Burgos, B. Burridge, H. Cable, G. Campbell, G. Catalano, A. Ceballos, et al., A manufacturable platform for photonic quantum computing, Nature 641, 876 (2025)
  29. [29] Y. Lu, A. Maiti, J. W. Garmon, S. Ganjam, Y. Zhang, J. Claes, L. Frunzio, S. M. Girvin, and R. J. Schoelkopf, High-fidelity parametric beamsplitting with a parity-protected converter, Nature Communications 14, 5767 (2023)
  30. [30] S. Sonar, U. Hatipoglu, S. Meesala, D. P. Lake, H. Ren, and O. Painter, High-efficiency low-noise optomechanical crystal photon-phonon transducers, Optica 12, 99 (2025)
  31. [31] A. Zivari, N. Fiaschi, L. Scarpelli, M. Jansen, R. Burgwal, E. Verhagen, and S. Gröblacher, A single-phonon directional coupler, Optica Quantum 3, 445 (2025)
  32. [32] R. Loudon, The Quantum Theory of Light, 3rd ed. (Oxford University Press, Oxford, 2000)
  33. [33] J. E. Gough, M. R. James, H. I. Nurdin, and J. Combes, Quantum filtering for systems driven by fields in single-photon states or superposition of coherent states, Physical Review A 86, 043819 (2012)
  34. [34] J. Combes, J. Kerckhoff, and M. Sarovar, The SLH framework for modeling quantum input-output networks, Advances in Physics: X 2, 784–888 (2017)
  35. [35] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86, 2278 (1998)
  36. [36] P. Yin, J. Lyu, S. Zhang, S. Osher, Y. Qi, and J. Xin, Understanding straight-through estimator in training activation quantized neural nets, arXiv:1903.05662 (2019)
  37. [37] X.-M. Wu, D. Zheng, Z. Liu, and W.-S. Zheng, Estimator meets equilibrium perspective: A rectified straight through estimator for binary neural networks training, in Proceedings of the IEEE/CVF International Conference on Computer Vision (2023) pp. 17055–17064
  38. [38] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, XNOR-Net: ImageNet classification using binary convolutional neural networks, in European Conference on Computer Vision (Springer, 2016) pp. 525–542
  39. [39] S. Zhou, Y. Wu, Z. Ni, X. Zhou, H. Wen, and Y. Zou, DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients, arXiv:1606.06160 (2016)
  40. [40] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, Rethinking the inception architecture for computer vision, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016) pp. 2818–2826