Machine learning for efficient generation of universal hybrid quantum computing resources
Pith reviewed 2026-05-24 06:33 UTC · model grok-4.3
The pith
Deep reinforcement learning on a time-multiplexed optical circuit generates squeezed cat states at 98% average success rate.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Numerical simulations of deep reinforcement learning on a measurement-based quantum processor—a time-multiplexed optical circuit sampled by photon-number-resolving detection—generate squeezed cat states with an average success rate of 98%, far outperforming all other similar proposals.
What carries the argument
Deep reinforcement learning algorithm that optimizes measurement sequences on a time-multiplexed optical circuit with photon-number-resolving detection to produce squeezed cat states.
If this is right
- Squeezed cat states can be produced with substantially higher efficiency than by prior methods.
- Measurement-based optical processors become viable for high-yield hybrid quantum resource generation when guided by learned policies.
- Universal quantum computing architectures that rely on these states gain a practical preparation route.
- Similar machine-learning control can be explored for other non-Gaussian optical states.
Where Pith is reading between the lines
- Transferring the learned policies to laboratory hardware would directly test whether the simulated performance survives real noise.
- The same reinforcement learning approach could be applied to optimize state preparation in other time-multiplexed or continuous-variable quantum platforms.
- High success rates may reduce the overhead of error correction or distillation steps needed downstream in hybrid quantum processors.
Load-bearing premise
The numerical model of the time-multiplexed optical circuit and photon-number-resolving detection accurately represents the real physical system without significant unmodeled noise, loss, or detection imperfections that would reduce the actual success rate.
What would settle it
Implementing the reinforcement learning policies on a physical time-multiplexed optical system and measuring whether the success rate for generating squeezed cat states remains near 98 percent or drops due to real imperfections.
Figures
read the original abstract
We present numerical simulations of deep reinforcement learning on a measurement-based quantum processor--a time-multiplexed optical circuit sampled by photon-number-resolving detection--and find it generates squeezed cat states with an average success rate of 98%, far outperforming all other similar proposals.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents numerical simulations in which deep reinforcement learning is used to control a measurement-based quantum processor realized as a time-multiplexed optical circuit sampled by photon-number-resolving detectors. The central claim is that this approach generates squeezed cat states with an average success rate of 98 %, substantially outperforming other proposals.
Significance. If the reported success rate remains stable under a more complete physical model, the result would be significant for hybrid quantum resource generation and for the application of reinforcement learning to optical quantum processors. The work supplies a concrete, numerically demonstrated protocol rather than an analytic construction, which is a strength when accompanied by reproducible simulation details.
major comments (2)
- [Abstract and numerical-results section] The 98 % success rate is stated in the abstract and is the headline numerical result, yet no simulation parameters, number of episodes, convergence diagnostics, statistical uncertainties, or baseline comparisons are supplied. Without these, it is impossible to judge whether the figure is robust or sensitive to modeling choices.
- [Methods / environment definition] The numerical model of the time-multiplexed circuit and photon-number-resolving detection is the environment in which the RL agent is trained. No exhaustive error budget or sensitivity analysis is presented for propagation loss, finite squeezing, mode mismatch, timing jitter, or detector dark counts. If any of these channels are under-represented, the learned policy’s success rate inside the simulator will be inflated relative to experiment.
minor comments (2)
- [Abstract] Clarify the precise definition of “squeezed cat state” (amplitude, squeezing level, target fidelity) used for the success metric.
- [Introduction or results] Add a short table or paragraph comparing the 98 % figure to the success rates reported in the “other similar proposals” that are claimed to be outperformed.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive comments on our manuscript. We address each major comment below and have revised the manuscript to improve the transparency and completeness of the numerical results and error modeling.
read point-by-point responses
-
Referee: [Abstract and numerical-results section] The 98 % success rate is stated in the abstract and is the headline numerical result, yet no simulation parameters, number of episodes, convergence diagnostics, statistical uncertainties, or baseline comparisons are supplied. Without these, it is impossible to judge whether the figure is robust or sensitive to modeling choices.
Authors: We agree that these supporting details are necessary for readers to assess the robustness of the reported success rate. In the revised manuscript we have expanded both the abstract and the numerical-results section to supply the simulation parameters, the number of training episodes, convergence diagnostics, statistical uncertainties on the 98 % figure, and explicit comparisons against baseline policies and prior proposals. These additions are now included in the main text and confirm that the headline result is stable under the conditions examined. revision: yes
-
Referee: [Methods / environment definition] The numerical model of the time-multiplexed circuit and photon-number-resolving detection is the environment in which the RL agent is trained. No exhaustive error budget or sensitivity analysis is presented for propagation loss, finite squeezing, mode mismatch, timing jitter, or detector dark counts. If any of these channels are under-represented, the learned policy’s success rate inside the simulator will be inflated relative to experiment.
Authors: We acknowledge the value of a comprehensive sensitivity analysis. The original model already incorporated the dominant noise mechanisms of the time-multiplexed architecture and PNR detectors. In the revision we have added an explicit error-budget subsection that quantifies the impact of propagation loss, finite squeezing, mode mismatch, timing jitter, and detector dark counts. The analysis demonstrates that the success rate remains above 90 % for realistic levels of these imperfections, indicating that the reported performance is not an artifact of an overly idealized simulator. A fully exhaustive experimental calibration would require physical-device data outside the scope of this numerical study, but the added analysis directly addresses the concern about potential inflation. revision: yes
Circularity Check
No circularity: result is output of numerical simulation, not algebraic reduction
full rationale
The paper reports a success rate obtained by running deep reinforcement learning inside a numerical model of a time-multiplexed optical circuit with photon-number-resolving detection. No derivation chain, fitted parameters renamed as predictions, or self-citation load-bearing steps are present. The 98 % figure is a direct simulation output rather than a quantity forced by construction from the inputs. The model assumptions are external to any algebraic identity and can be falsified by physical experiment.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
M. Chen, N. C. Menicucci, and O. Pfister, “Experimental realization of multipartite entanglement of 60 modes of a quantum optical frequency comb,” Phys. Rev. Lett.112, 120505 (2014)
work page 2014
-
[2]
J.-i. Yoshikawa, S. Yokoyama, T. Kaji,et al., “Invited article: Generation of one-million-mode continuous-variable cluster state by unlimited time-domain multiplexing,” APL Photonics1, 060801 (2016)
work page 2016
-
[3]
Generation of time-domain-multiplexed two-dimensional cluster state,
W. Asavanant, Y. Shiozawa, S. Yokoyama,et al., “Generation of time-domain-multiplexed two-dimensional cluster state,” Science366, 373–376 (2019)
work page 2019
-
[4]
Deterministic generation of a two-dimensional cluster state,
M. V. Larsen, X. Guo, C. R. Breum,et al., “Deterministic generation of a two-dimensional cluster state,” Science 366, 369–372 (2019)
work page 2019
-
[5]
R. Raussendorf and H. J. Briegel, “A one-way quantum computer,” Phys. Rev. Lett.86, 5188 (2001)
work page 2001
-
[6]
Encoding a qubit in an oscillator,
D. Gottesman, A. Kitaev, and J. Preskill, “Encoding a qubit in an oscillator,” Phys. Rev. A64, 012310 (2001)
work page 2001
-
[7]
Fault-tolerant measurement-based quantum computing with continuous-variable cluster states,
N. C. Menicucci, “Fault-tolerant measurement-based quantum computing with continuous-variable cluster states,” Phys. Rev. Lett.112, 120504 (2014)
work page 2014
-
[8]
Efficient classical simulation of continuous variable quantum information processes,
S. D. Bartlett, B. C. Sanders, S. L. Braunstein, and K. Nemoto, “Efficient classical simulation of continuous variable quantum information processes,” Phys. Rev. Lett.88, 097904 (2002)
work page 2002
-
[9]
Continuous-variable quantum computing in the quantum optical frequency comb,
O. Pfister, “Continuous-variable quantum computing in the quantum optical frequency comb,” J. Phys. B: At. Mol. Opt. Phys.53, 012001 (2020)
work page 2020
-
[10]
All-Gaussian universality and fault tolerance with the Gottesman-Kitaev-Preskill code,
B. Q. Baragiola, G. Pantaleoni, R. N. Alexander,et al., “All-Gaussian universality and fault tolerance with the Gottesman-Kitaev-Preskill code,” Phys. Rev. Lett.123, 200502 (2019)
work page 2019
-
[11]
Encoding a qubit in a trapped-ion mechanical oscillator,
C. Flühmann, T. L. Nguyen, M. Marinelli,et al., “Encoding a qubit in a trapped-ion mechanical oscillator,” Nature 566, 513–517 (2019)
work page 2019
-
[12]
Quantum error correction of a qubit encoded in grid states of an oscillator,
P. Campagne-Ibarcq, A. Eickbusch, S. Touzard,et al., “Quantum error correction of a qubit encoded in grid states of an oscillator,” Nature584, 368–372 (2020)
work page 2020
-
[13]
Logicalstatesforfault-tolerantquantumcomputationwithpropagating light,
S.Konno,W.Asavanant,F.Hanamura, etal.,“Logicalstatesforfault-tolerantquantumcomputationwithpropagating light,” Science383, 289–293 (2024)
work page 2024
-
[14]
All-optical generation of states for “Encoding a qubit in an oscillator
H. M. Vasconcelos, L. Sanz, and S. Glancy, “All-optical generation of states for “Encoding a qubit in an oscillator”,” Opt. Lett.35, 3261–3263 (2010)
work page 2010
-
[15]
Generating grid states from schrödinger-cat states without postselection,
D. J. Weigand and B. M. Terhal, “Generating grid states from schrödinger-cat states without postselection,” Phys. Rev. A97, 022341 (2018)
work page 2018
-
[16]
Human-level control through deep reinforcement learning,
V. Mnih, K. Kavukcuoglu, D. Silver,et al., “Human-level control through deep reinforcement learning,” nature518, 529–533 (2015)
work page 2015
-
[17]
S. Borah, B. Sarma, M. Kewming,et al., “Measurement-based feedback quantum control with deep reinforcement learning for a double-well nonlinear potential,” Phys. review letters127, 190403 (2021)
work page 2021
-
[18]
Machine learning method for state preparation and gate synthesis on photonic quantum computers,
J. M. Arrazola, T. R. Bromley, J. Izaac,et al., “Machine learning method for state preparation and gate synthesis on photonic quantum computers,” Quantum Sci. Technol.4, 024004 (2019)
work page 2019
-
[19]
Robust preparation of wigner-negative states with optimized snap-displacement sequences,
M. Kudra, M. Kervinen, I. Strandberg,et al., “Robust preparation of wigner-negative states with optimized snap-displacement sequences,” PRX Quantum3, 030301 (2022)
work page 2022
-
[20]
Progress towards practical qubit computation using approximate gottesman-kitaev-preskill codes,
I. Tzitrin, J. E. Bourassa, N. C. Menicucci, and K. K. Sabapathy, “Progress towards practical qubit computation using approximate gottesman-kitaev-preskill codes,” Phys. Rev. A101, 032315 (2020)
work page 2020
-
[21]
R. S. Sutton and A. G. Barto,Reinforcement learning: An introduction(MIT press, 2018)
work page 2018
-
[22]
Fidelity for mixed quantum states,
R. Jozsa, “Fidelity for mixed quantum states,” J. Mod. Opt.41, 2315 (1994)
work page 1994
-
[23]
Deep reinforcement learning for quantum state preparation with weak nonlinear measurements,
R. Porotti, A. Essig, B. Huard, and F. Marquardt, “Deep reinforcement learning for quantum state preparation with weak nonlinear measurements,” Quantum6, 747 (2022)
work page 2022
-
[24]
Stable-baselines3: Reliable reinforcement learning implementations,
A. Raffin, A. Hill, A. Gleave,et al., “Stable-baselines3: Reliable reinforcement learning implementations,” The J. Mach. Learn. Res.22, 12348–12355 (2021)
work page 2021
-
[25]
Strawberry fields: A software platform for photonic quantum computing,
N. Killoran, J. Izaac, N. Quesada,et al., “Strawberry fields: A software platform for photonic quantum computing,” Quantum 3, 129 (2019)
work page 2019
-
[26]
Conditional production of superpositions of coherent states with inefficient photon detection,
A. P. Lund, H. Jeong, T. C. Ralph, and M. S. Kim, “Conditional production of superpositions of coherent states with inefficient photon detection,” Phys. Rev. A70 (2004)
work page 2004
-
[27]
Experimental generation of squeezed cat states with an operation allowing iterative growth,
J. Etesse, M. Bouillard, B. Kanseri, and R. Tualle-Brouri, “Experimental generation of squeezed cat states with an operation allowing iterative growth,” Phys. Rev. Lett.114, 193602 (2015)
work page 2015
-
[28]
Enlargement of optical Schrödinger’s cat states,
D. V. Sychev, A. E. Ulanov, A. A. Pushkina,et al., “Enlargement of optical Schrödinger’s cat states,” Nat. Photon.11, 379–382 (2017)
work page 2017
-
[29]
M. Eaton, C. González-Arciniegas, R. N. Alexander,et al., “Measurement-based generation and preservation of cat and grid states within a continuous-variable cluster state,” Quantum6, 769 (2022)
work page 2022
-
[30]
Gottesman-kitaev-preskill qubit synthesizer for propagating light,
K. Takase, K. Fukui, A. Kawasaki,et al., “Gottesman-kitaev-preskill qubit synthesizer for propagating light,” npj Quantum Inf.9, 98 (2023)
work page 2023
-
[31]
On the design of photonic quantum circuits,
Y. Yao, F. Miatto, and N. Quesada, “On the design of photonic quantum circuits,” arXiv:2209.06069 (2022)
-
[32]
Deterministic preparation of optical squeezed cat and Gottesman-Kitaev-Preskill states,
M. S. Winnel, J. J. Guanzon, D. Singh, and T. C. Ralph, “Deterministic preparation of optical squeezed cat and Gottesman-Kitaev-Preskill states,” arXiv:2311.10510 (2023)
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.