Deep reinforcement learning for near-deterministic preparation of cubic- and quartic-phase gates in photonic quantum computing
Pith reviewed 2026-05-19 10:37 UTC · model grok-4.3
The pith
In numerical experiments, reinforcement learning controls a photonic circuit to prepare cubic-phase states with an average success rate of 96 percent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Cubic-phase states are a sufficient resource for universal quantum computing over continuous variables. Numerical experiments demonstrate that deep neural networks trained via reinforcement learning control a quantum optical circuit to generate these states with an average success rate of 96 percent. The only non-Gaussian resource required is photon-number-resolving measurements. The exact same resources also enable the direct generation of a quartic-phase gate with no need for a cubic gate decomposition.
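For orientation, the gates named above are usually written in the following textbook form. This is a reference sketch and is not taken from the paper: the normalization of the gate strength \gamma and the choice of \hbar vary between references, and the symbols \hat V_3, \hat V_4 are labels chosen here for illustration.

```latex
\begin{align}
  \hat V_3(\gamma) &= \exp\!\left(i \gamma\, \hat q^{3}\right), &
  |\gamma\rangle_{3} &= \hat V_3(\gamma)\,|p = 0\rangle
      \;\propto\; \int dq\; e^{i \gamma q^{3}}\, |q\rangle, \\
  \hat V_4(\gamma) &= \exp\!\left(i \gamma\, \hat q^{4}\right).
\end{align}
```

The momentum eigenstate |p = 0\rangle is not normalizable, so any physical preparation targets an approximation in which it is replaced by a finitely squeezed state.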
What carries the argument
A deep neural network trained by reinforcement learning that selects control parameters for a quantum optical circuit conditioned on photon-number-resolving measurement outcomes.
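The idea of a policy conditioned on photon-number outcomes can be illustrated with a deliberately tiny stand-in. The sketch below is not the paper's network, circuit, or training algorithm: it uses a tabular softmax policy trained with plain REINFORCE, and the observation model, the discrete settings, and the reward table `TOY_TARGETS` are all hypothetical. It only shows the loop structure of observe a count, choose a setting, receive a reward, update.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def train_toy_policy(targets, n_actions, episodes=3000, lr=0.5, seed=0):
    """REINFORCE on a toy task: observe a photon count, pick a discrete setting.

    targets[n] is the HYPOTHETICAL setting that earns reward 1 after count n
    is observed; every other setting earns reward 0.
    """
    rng = np.random.default_rng(seed)
    n_outcomes = len(targets)
    logits = np.zeros((n_outcomes, n_actions))  # one row of logits per outcome
    for _ in range(episodes):
        n = rng.integers(n_outcomes)            # stand-in for a PNR outcome
        probs = softmax(logits[n])
        a = rng.choice(n_actions, p=probs)      # sample a control setting
        reward = 1.0 if a == targets[n] else 0.0
        grad = -probs                           # gradient of log pi(a | n) ...
        grad[a] += 1.0                          # ... with respect to logits[n]
        logits[n] += lr * reward * grad
    return logits

TOY_TARGETS = [2, 0, 1]                         # hypothetical reward table
toy_logits = train_toy_policy(TOY_TARGETS, n_actions=3)
greedy = toy_logits.argmax(axis=1)              # greedy setting per outcome
```

After training, `greedy` holds the setting the toy policy prefers for each observed count. A realistic controller would replace the table with a neural network, the discrete settings with continuous circuit parameters, and the reward with a state-quality measure from a circuit simulator.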
If this is right
- Photonic continuous-variable processors can reach near-deterministic preparation of magic states using only photon-number-resolving detectors.
- Quartic-phase gates become available without extra decomposition overhead, lowering the total number of non-Gaussian operations required.
- The same reinforcement-learning controller can be reused across multiple gate types, reducing the need for separate calibration routines.
- High success rates in state preparation bring fault-tolerant thresholds for continuous-variable error correction within reach of current optical hardware.
Where Pith is reading between the lines
- The learned policies could be transferred to multi-mode circuits to generate entangled cubic-phase states for larger-scale computations.
- Similar reinforcement-learning controllers might optimize preparation of other higher-order non-Gaussian states beyond quartic phase.
- Combining the approach with existing photonic error-correction schemes could produce logical magic states at still higher fidelity.
- The method offers a route to test whether reinforcement learning discovers control strategies that human-designed sequences miss.
Load-bearing premise
The numerical model of the quantum optical circuit and its noise sources accurately captures the behavior that would be observed in a real laboratory implementation.
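One ingredient such a noise model typically contains is detector inefficiency. A minimal sketch, assuming the standard model in which each photon is detected independently with probability `eta`, is given below. The function name and the example numbers are illustrative and not taken from the paper; dark counts, mode mismatch, and timing jitter are not modeled.

```python
from math import comb

def lossy_pnr_distribution(p_true, eta):
    """Map a true photon-number distribution to the one a lossy detector sees.

    p_true[n] is the probability that n photons arrive; eta is the detection
    efficiency. Each photon is registered independently with probability eta,
    so P(m | n) = C(n, m) * eta**m * (1 - eta)**(n - m).
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    p_obs = [0.0] * len(p_true)
    for n, pn in enumerate(p_true):
        for m in range(n + 1):
            p_obs[m] += pn * comb(n, m) * eta**m * (1.0 - eta) ** (n - m)
    return p_obs

# Illustrative input: exactly two photons arrive at a 90%-efficient detector.
two_photon = [0.0, 0.0, 1.0]
observed = lossy_pnr_distribution(two_photon, eta=0.9)
```

In this example the detector reports the correct count of two with probability 0.9 squared, about 0.81, and reports one photon with probability about 0.18. Sweeping `eta` in a simulation is one way to probe how strongly a learned policy depends on the assumed detector quality.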
What would settle it
Running the learned control policy on a physical photonic circuit and measuring a success rate for cubic-phase state preparation substantially below 80 percent would show that the simulation does not transfer to the laboratory.
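Deciding whether a measured rate is "substantially below 80 percent" needs a statistical criterion. One hedged way to make this concrete is a Wilson score interval on the observed success fraction, sketched below. The choice of interval, the 95% confidence level, and the trial counts in the usage example are assumptions made here for illustration, not part of the paper or the review.

```python
from math import sqrt

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial success probability.

    z = 1.96 corresponds to roughly 95% two-sided confidence.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half

def clearly_below(successes, trials, threshold=0.80):
    """True if the whole interval lies below the threshold."""
    _, upper = wilson_interval(successes, trials)
    return upper < threshold

# Hypothetical laboratory outcomes, for illustration only.
lab_fails = clearly_below(70, 100)     # 70 successes in 100 attempts
lab_passes = clearly_below(96, 100)    # 96 successes in 100 attempts
```

With these hypothetical counts, 70 successes out of 100 would fall clearly below the 80 percent mark, while 96 out of 100 would not.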
Original abstract
Cubic-phase states are a sufficient resource for universal quantum computing over continuous variables. We present results from numerical experiments in which deep neural networks are trained via reinforcement learning to control a quantum optical circuit for generating cubic-phase states, with an average success rate of 96%. The only non-Gaussian resource required is photon-number-resolving measurements. We also show that the exact same resources enable the direct generation of a quartic-phase gate, with no need for a cubic gate decomposition.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports numerical experiments in which deep neural networks trained via reinforcement learning control a quantum optical circuit to generate cubic-phase states, achieving an average success rate of 96% using only photon-number-resolving measurements as the non-Gaussian resource. It further shows that the same resources enable direct generation of a quartic-phase gate without requiring decomposition from a cubic gate.
Significance. If the simulation model holds, the work offers a concrete demonstration of reinforcement learning for near-deterministic control of photonic circuits, with the direct quartic-phase gate result providing a useful simplification over decomposition-based approaches. The numerical nature of the study supplies reproducible training protocols that could be tested against analytic limits in future work.
major comments (2)
- [Results] Results section: The central claim of a 96% average success rate is presented without accompanying details on circuit parameters, training hyperparameters, error bars, or validation against analytic limits, leaving the evidential support for the numerical performance thin.
- [Simulation and noise model description] Simulation and noise model description: Performance is reported for one chosen noise model, but no sensitivity analysis or ablation studies over plausible deviations in loss, mode mismatch, or timing jitter are included; this is load-bearing for the claim that the learned policy supports near-deterministic preparation in a laboratory setting.
minor comments (1)
- [Abstract] Abstract: A short statement on the circuit depth or number of modes employed would help readers assess the resource requirements at a glance.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which have helped us improve the clarity and robustness of our numerical results. We have revised the manuscript to address the concerns about evidential support and simulation details. Below we respond point by point to the major comments.
- Referee: [Results] Results section: The central claim of a 96% average success rate is presented without accompanying details on circuit parameters, training hyperparameters, error bars, or validation against analytic limits, leaving the evidential support for the numerical performance thin.
  Authors: We agree that the original presentation lacked sufficient supporting details. In the revised manuscript we have added an expanded Results subsection that specifies the circuit parameters (including beam-splitter transmissivities and phase-shifter values), the reinforcement-learning hyperparameters (network architecture, optimizer settings, batch size, and number of training episodes), statistical error bars obtained from ten independent training runs, and direct comparisons of the learned success rates against analytic limits for the ideal, noiseless case. These additions substantially strengthen the evidential basis for the reported performance.
  Revision: yes
- Referee: [Simulation and noise model description] Simulation and noise model description: Performance is reported for one chosen noise model, but no sensitivity analysis or ablation studies over plausible deviations in loss, mode mismatch, or timing jitter are included; this is load-bearing for the claim that the learned policy supports near-deterministic preparation in a laboratory setting.
  Authors: We acknowledge that demonstrating robustness to realistic experimental variations is important. The revised manuscript now includes a sensitivity analysis in which loss and mode-mismatch parameters are varied over ranges consistent with current photonic hardware; the success rate remains above 90% within these ranges. A full ablation study that also sweeps timing jitter would require substantially more computational resources than were available for this work; we have therefore added a concise discussion of this limitation and identified it as a natural direction for follow-up studies.
  Revision: partial
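The error bars mentioned in the first response, obtained from ten independent training runs, are commonly reported as a mean with a standard error. A minimal sketch of that arithmetic follows. The function name is illustrative, and the ten values are placeholders chosen for easy checking; they are NOT results from the paper.

```python
from math import sqrt
from statistics import mean, stdev

def summarize_runs(success_rates):
    """Return (mean, standard error of the mean) over independent runs."""
    if len(success_rates) < 2:
        raise ValueError("need at least two runs to estimate a spread")
    m = mean(success_rates)
    se = stdev(success_rates) / sqrt(len(success_rates))  # sample std / sqrt(N)
    return m, se

# Placeholder success rates for ten runs; NOT data from the paper.
placeholder_rates = [0.88, 0.91, 0.90, 0.92, 0.89, 0.90, 0.91, 0.89, 0.90, 0.90]
run_mean, run_se = summarize_runs(placeholder_rates)
```

Reporting the pair as mean plus or minus standard error lets a reader judge whether differences between noise settings in a sensitivity sweep exceed the run-to-run spread.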
Circularity Check
No circularity: results from independent RL simulations
The paper reports empirical outcomes from numerical experiments in which deep neural networks are trained via reinforcement learning to control a photonic circuit, achieving reported success rates in simulation. No derivation chain, equations, or first-principles claims are presented that reduce by construction to fitted inputs, self-citations, or ansatzes; the central results are direct products of the training process against an external noise model and are therefore self-contained.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Passage: "deep neural networks are trained via reinforcement learning to control a quantum optical circuit for generating cubic-phase states, with an average success rate of 96%. The only non-Gaussian resource required is photon-number-resolving measurements."
- IndisputableMonolith/Foundation/DimensionForcing.lean · reality_from_one_distinction
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Passage: "We also show that the exact same resources enable the direct generation of a quartic-phase gate, with no need for a cubic gate decomposition."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.