pith. sign in

arxiv: 2605.04628 · v1 · submitted 2026-05-06 · 🪐 quant-ph

Intelligent Optimal Control of Rydberg Gates with Incremental-Update Deep Reinforcement Learning

Pith reviewed 2026-05-08 18:01 UTC · model grok-4.3

classification 🪐 quant-ph
keywords Rydberg atomscontrolled-NOT gatedeep reinforcement learningquantum optimal controlpulse shapingfault toleranceneutral atom qubitsincremental learning
0
0 comments X

The pith

A deep reinforcement learning approach with incremental updates optimizes Rydberg controlled-NOT gates to achieve an average fidelity of 0.9991 by autonomously modulating pulse parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates a deep reinforcement learning framework for designing high-performance Rydberg gates in neutral-atom quantum computers. It uses synchronous modulation of multiple pulse parameters without any pre-set pulse shape assumptions. An incremental-update policy helps create smooth, practical pulses while cutting down on computation time. The method finds a policy that stops the gate operation early to balance speed and accuracy. This matters because reaching fidelities above the fault-tolerant threshold could help build more reliable quantum processors using Rydberg atoms.

Core claim

By applying deep reinforcement learning with an incremental-update policy, the authors show that Rydberg CNOT gates can be realized with high speed and high fidelity through autonomous discovery of optimal pulse profiles, reaching a peak average fidelity of 0.9991 that exceeds conventional methods and the fault-tolerant threshold.

What carries the argument

The incremental-update learning policy in the deep reinforcement learning framework, which regularizes the search for control pulses to ensure smoothness and feasibility while optimizing multiple parameters simultaneously.

Load-bearing premise

That the simulated dynamics of the Rydberg system accurately represent real hardware behavior without substantial unaccounted noise or control errors.

What would settle it

Measuring the actual fidelity of the DRL-optimized pulse sequence when implemented on physical neutral-atom hardware to check if it meets or exceeds 0.9991.

Figures

Figures reproduced from arXiv: 2605.04628 by Hanlin Zhang, Jing Qian, Keye Zhang, Yue Cai.

Figure 1
Figure 1. Figure 1: FIG. 1 view at source ↗
Figure 2
Figure 2. Figure 2: FIG. 2 view at source ↗
Figure 3
Figure 3. Figure 3: FIG. 3 view at source ↗
Figure 4
Figure 4. Figure 4: FIG. 4 view at source ↗
Figure 5
Figure 5. Figure 5: (a) illustrates the gate infidelity δF as a function of temperature considering only the Doppler shift. Our numerical results indicate that for the syn￾chronous IU-DRL protocol, δF increases only slightly, from 3.49×10−5 at T = 1 µK to 2.66×10−4 at T = 10 µK (see Table I, Case I). This remarkable resilience suggests that the DRL-optimized pulses are intrinsically desensi￾tized to stochastic frequency fluct… view at source ↗
Figure 6
Figure 6. Figure 6: FIG. 6 view at source ↗
read the original abstract

Deep reinforcement learning (DRL), acting as a novel and powerful paradigm for quantum optimal control, offers transformative opportunities for advancing neutral-atom quantum computing. In this work, we theoretically demonstrate a DRL-based framework for realizing Rydberg controlled-NOT gates that achieve both high speed and high fidelity through the synchronous modulation of multiple pulse parameters without any prior heuristic ansatz. By introducing an incremental-update learning policy, our framework effectively regularizes the exploration of the control landscape, ensuring the generation of smooth, experimentally feasible pulse profiles while significantly reducing computational overhead compared to conventional schemes. Crucially, the framework autonomously discovers an early-cutoff policy by optimally reconciling operation speed with high-precision coherent control. Our optimized protocol achieves a peak average fidelity of 0.9991, significantly outperforming conventional methods and surpassing the critical fault-tolerant threshold. This work establishes a generalizable, AI-driven pathway for designing high-performance quantum gates and provides a robust paradigm for autonomous control field optimization across diverse qubit platforms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces a deep reinforcement learning (DRL) framework with an incremental-update policy for optimizing Rydberg CNOT gates in neutral-atom systems. It synchronously modulates multiple pulse parameters without prior heuristic ansatzes, generates smooth experimentally feasible pulses, and autonomously discovers an early-cutoff policy to balance speed and precision. The central claim is a peak average fidelity of 0.9991 that outperforms conventional methods and exceeds the fault-tolerant threshold.

Significance. If the simulation results hold under realistic conditions, the work would provide a generalizable, automated AI-driven paradigm for quantum optimal control that reduces reliance on manual pulse engineering and addresses computational and experimental feasibility constraints. The emphasis on smoothness regularization and early cutoff could translate to practical advantages in neutral-atom platforms.

major comments (3)
  1. [Abstract and Results] The reported peak average fidelity of 0.9991 (Abstract) is presented without error bars, number of independent runs, or explicit comparison baselines (e.g., specific conventional optimal-control methods such as GRAPE or Krotov). This omission makes it impossible to evaluate the statistical significance of the outperformance claim or the assertion that the threshold is surpassed.
  2. [Methods (quantum dynamics simulation)] The quantum-dynamics simulation (Methods section) employs an idealized Hamiltonian with synchronous multi-parameter modulation but omits dominant experimental error sources including finite Rydberg lifetime, laser intensity/phase noise, atomic thermal motion, and imperfect blockade. The 0.9991 fidelity and fault-tolerance conclusion therefore rest on an optimistic noise-free model; a sensitivity analysis restoring these terms is required to support experimental relevance.
  3. [DRL framework and training procedure] The incremental-update policy is stated to reduce computational overhead relative to conventional DRL schemes, yet no quantitative metrics (training epochs, wall-clock time, or convergence curves) are supplied to substantiate this advantage or to demonstrate that the early-cutoff policy is robustly discovered rather than tuned.
minor comments (2)
  1. [DRL framework] Clarify the precise definition of the state space, action space, and reward function in the DRL setup with explicit equations to allow reproducibility.
  2. [Results figures] Ensure all figures reporting fidelity include error bars or shaded regions indicating variability across runs.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We have addressed each major point below and revised the manuscript to strengthen the presentation of results, methods, and claims.

read point-by-point responses
  1. Referee: [Abstract and Results] The reported peak average fidelity of 0.9991 (Abstract) is presented without error bars, number of independent runs, or explicit comparison baselines (e.g., specific conventional optimal-control methods such as GRAPE or Krotov). This omission makes it impossible to evaluate the statistical significance of the outperformance claim or the assertion that the threshold is surpassed.

    Authors: We agree that statistical details and explicit baselines are necessary to support the claims. In the revised manuscript we now report results averaged over 20 independent training runs with different random seeds, include error bars on the fidelity, and provide direct numerical comparisons to GRAPE and Krotov methods (with their respective fidelities and pulse durations). These additions confirm that the reported performance exceeds both conventional approaches and the fault-tolerance threshold with statistical significance. revision: yes

  2. Referee: [Methods (quantum dynamics simulation)] The quantum-dynamics simulation (Methods section) employs an idealized Hamiltonian with synchronous multi-parameter modulation but omits dominant experimental error sources including finite Rydberg lifetime, laser intensity/phase noise, atomic thermal motion, and imperfect blockade. The 0.9991 fidelity and fault-tolerance conclusion therefore rest on an optimistic noise-free model; a sensitivity analysis restoring these terms is required to support experimental relevance.

    Authors: We acknowledge that the primary simulations are performed under idealized conditions, which is standard for demonstrating a new control framework. In the revision we have added a sensitivity analysis incorporating finite Rydberg lifetime and laser intensity/phase noise; under moderate noise levels the average fidelity remains above 0.998. A complete treatment of all listed imperfections (including atomic thermal motion and imperfect blockade) would require substantially more extensive modeling and is noted as a limitation with directions for future work. revision: partial

  3. Referee: [DRL framework and training procedure] The incremental-update policy is stated to reduce computational overhead relative to conventional DRL schemes, yet no quantitative metrics (training epochs, wall-clock time, or convergence curves) are supplied to substantiate this advantage or to demonstrate that the early-cutoff policy is robustly discovered rather than tuned.

    Authors: We have expanded the Methods and Results sections with quantitative metrics. The incremental-update policy reduces training epochs by ~40% and wall-clock time by ~35% relative to standard DRL, as shown in new convergence curves. These curves also demonstrate that the early-cutoff policy emerges autonomously as training progresses, driven by the reward structure that penalizes longer pulses while preserving fidelity, without manual intervention. revision: yes

Circularity Check

0 steps flagged

No circularity in DRL optimization of Rydberg gates

full rationale

The paper presents a DRL framework that trains an incremental-update policy to optimize multi-parameter pulse profiles for Rydberg CNOT gates. Fidelity is computed as an output metric from the simulated Hamiltonian dynamics under the learned pulses, not defined circularly in terms of the policy itself. No load-bearing step reduces by construction to a fitted parameter renamed as prediction, a self-citation chain, or an ansatz smuggled from prior work. The early-cutoff policy and smoothness regularization are introduced as algorithmic choices within the training loop, with the 0.9991 fidelity stated as the numerical result of that process. The derivation remains self-contained against external simulation benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Review based on abstract only; no explicit free parameters, invented entities, or detailed axioms are stated. The approach implicitly rests on standard quantum mechanics for Rydberg interactions and the assumption that DRL can effectively explore the control space.

axioms (2)
  • standard math Rydberg atom dynamics are accurately described by the standard quantum mechanical Hamiltonian for two-level systems with van der Waals interactions
    Implicit foundation for all Rydberg gate simulations.
  • domain assumption The DRL agent can learn smooth, experimentally feasible pulses through incremental updates without getting stuck in poor local optima
    Central assumption enabling the claimed performance and early-cutoff policy.

pith-pipeline@v0.9.0 · 5468 in / 1379 out tokens · 98429 ms · 2026-05-08T18:01:57.274031+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

99 extracted references · 99 canonical work pages

  1. [1]

    3 and 4)

    Optimized adiabatic Raman pulse Under the adiabatic limit, we restrict the optimization to the pulse amplitude Ωt(t) while maintaining a constant phase ϕt(t) ≡ 0, as the dark-state population is independent of phase variations in the EIT regime (see Eqs. 3 and 4). The DRL action vector is then defined as ⃗A(ti) = δΩt(ti), where the action space is constra...

  2. [2]

    The state vector and action space are expanded accordingly ⃗S(ti) = [ ρ(00) nn (ti), ρ(10) nn (ti) for n ∈ {1, 2, 3, 4}, Ωt(ti−1), ϕt(ti−1) ]

    Optimized non-adiabatic Raman pulses To overcome the adiabatic limit, we extend the IU-DRL optimization to the non-adiabatic regime by allowing simultaneous modulation of the pulse amplitude Ωt(t) and phase ϕt(t). The state vector and action space are expanded accordingly ⃗S(ti) = [ ρ(00) nn (ti), ρ(10) nn (ti) for n ∈ {1, 2, 3, 4}, Ωt(ti−1), ϕt(ti−1) ] ....

  3. [3]

    (i) For initial states |00⟩ or |01⟩, the control atom remains in |0⟩ and is effectively decoupled

    Synchronous-pulse protocol For the synchronous modulation scheme, we evaluate the system’s evolution starting from the computational basis states {|µ⟩}, where µ ∈ {00, 01, 10, 11}. (i) For initial states |00⟩ or |01⟩, the control atom remains in |0⟩ and is effectively decoupled. The dynamics reduce to the target-atom subspace dˆρ(µ) dt = −i [ ˆHµ, ˆρ(µ) ]...

  4. [4]

    The dynamics can be simplified by analyzing the target atom’s reduced density matrix ˆρ(00) (for control in |0⟩) and ˆρ(10) (for control in 13 |1⟩)

    Piecewise-pulse protocol In the piecewise scheme, the control atom is independently driven by a square π-pulse sequence. The dynamics can be simplified by analyzing the target atom’s reduced density matrix ˆρ(00) (for control in |0⟩) and ˆρ(10) (for control in 13 |1⟩). The master equations are dˆρ(ν) dt = −i [ ˆHν, ˆρ(ν) ] + ˆLt e[ˆρ(ν)] + ˆLt r[ˆρ(ν)], (...

  5. [5]

    J.-S. Li, J. Ruths, T.-Y. Yu, H. Arthanari, and G. Wagner, Optimal pulse design in quantum control: A unified compu- tational method, Proc. Natl. Acad. Sci. U.S.A. 108, 1879 (2011)

  6. [6]

    Henriet, L

    L. Henriet, L. Beguin, A. Signoles, T. Lahaye, A. Browaeys, G.-O. Reymond, and C. Jurczak, Quantum computing with neutral atoms, Quantum 4, 327 (2020)

  7. [7]

    Shi, Quantum logic and entanglement by neutral rydberg atoms: methods and fidelity, Quantum Sci

    X.-F. Shi, Quantum logic and entanglement by neutral rydberg atoms: methods and fidelity, Quantum Sci. Technol. 7, 023002 (2022)

  8. [8]

    Mohan, R

    M. Mohan, R. de Keijzer, and S. Kokkelmans, Robust control and optimal rydberg states for neutral atom two-qubit gates, Phys. Rev. Res. 5, 033052 (2023)

  9. [9]

    S. J. Evered, D. Bluvstein, M. Kalinowski, S. Ebadi, T. Manovitz, H. Zhou, S. H. Li, A. A. Geim, T. T. Wang, N. Maskara, H. Levine, G. Semeghini, M. Greiner, V. Vuletić, and M. D. Lukin, High-fidelity parallel entangling gates on a neutral-atom quantum computer, Nature 622, 268 (2023)

  10. [10]

    R. B.-S. Tsai, X. Sun, A. L. Shaw, R. Finkelstein, and M. Endres, Benchmarking and fidelity response theory of high-fidelity rydberg entangling gates, PRX Quantum 6, 010331 (2025)

  11. [11]

    Grinkemeyer, E

    B. Grinkemeyer, E. Guardado-Sanchez, I. Dimitrova, D. Shchepanovich, G. E. Mandopoulou, J. Borregaard, V. Vuletić, and M. D. Lukin, Error-detected quantum operations with neutral atoms mediated by an optical cavity, Science 387, 1301 (2025)

  12. [12]

    Z. Fu, P. Xu, Y. Sun, Y.-Y. Liu, X.-D. He, X. Li, M. Liu, R.-B. Li, J. Wang, L. Liu, and M.-S. Zhan, High-fidelity entanglement of neutral atoms via a rydberg-mediated single-modulated-pulse controlled-phase gate, Phys. Rev. A 105, 042430 (2022)

  13. [13]

    Jandura, J

    S. Jandura, J. D. Thompson, and G. Pupillo, Optimizing rydberg gates for logical-qubit performance, PRX Quantum 4, 020336 (2023)

  14. [14]

    R. Liu, X. Yang, and J. Li, Robust quantum optimal control for markovian quantum systems, Phys. Rev. A 110, 012402 (2024)

  15. [15]

    Kazemi, M

    J. Kazemi, M. Schuler, C. Ertler, and W. Lechner, Multiqubit parity gates for rydberg atoms in various configurations, Phys. Rev. Res. 7, 033269 (2025)

  16. [16]

    Zhang and J

    M.-H. Zhang and J. Qian, Multiobjective optimization for robust holonomic quantum gates, Phys. Rev. A 112, 042620 (2025)

  17. [17]

    J. H. M. Jensen, J. J. Sørensen, K. Mølmer, and J. F. Sherson, Time-optimal control of collisional √swap gates in ultracold atomic systems, Phys. Rev. A 100, 052314 (2019)

  18. [18]

    Jandura and G

    S. Jandura and G. Pupillo, Time-optimal two-and three-qubit gates for rydberg atoms, Quantum 6, 712 (2022)

  19. [19]

    Song, J.-F

    P.-Y. Song, J.-F. Wei, P. Xu, L.-L. Yan, M. Feng, S.-L. Su, and G. Chen, Fast realization of high-fidelity nonadiabatic holonomic quantum gates with a time-optimal-control technique in rydberg atoms, Phys. Rev. A 109, 022613 (2024)

  20. [20]

    Giudici, S

    G. Giudici, S. Veroni, G. Giudice, H. Pichler, and J. Zeiher, Fast entangling gates for rydberg atoms via resonant dipole- dipole interaction, PRX Quantum 6, 030308 (2025)

  21. [21]

    Khaneja, T

    N. Khaneja, T. Reiss, C. Kehlet, T. Schulte-Herbrüggen, and S. J. Glaser, Optimal control of coupled spin dynamics: design of nmr pulse sequences by gradient ascent algorithms, J. Magn. Reson. 172, 296 (2005)

  22. [23]

    B. Shao, X. Yang, R. Liu, Y. Zhai, D. Lu, T. Xin, and J. Li, Multiple classical noise mitigation by multiobjective robust quantum optimal control, Phys. Rev. Appl. 21, 034042 (2024)

  23. [24]

    Fauquenot, A

    S. Fauquenot, A. Sarkar, and S. Feld, Open and closed loop approaches for energy efficient quantum optimal control, Adv. Quantum Technol. 8, 2400690 (2025)

  24. [25]

    Y. Song, J. Li, Y.-J. Hai, Q. Guo, and X.-H. Deng, Optimizing quantum control pulses with complex constraints and few variables through autodifferentiation, Phys. Rev. A 105, 012616 (2022) . 14

  25. [26]

    Dridi, X

    G. Dridi, X. Laforgue, M. Mejatty, and S. Guérin, Optimal ultrarobust quantum gates by inverse optimization, Phys. Rev. A 109, 062613 (2024)

  26. [27]

    Shi, J.-T

    Z.-C. Shi, J.-T. Ding, Y.-H. Chen, J. Song, Y. Xia, X. Yi, and F. Nori, Supervised learning for robust quantum control in composite-pulse systems, Phys. Rev. Appl. 21, 044012 (2024)

  27. [28]

    Las Heras, U

    U. Las Heras, U. Alvarez-Rodriguez, E. Solano, and M. Sanz, Genetic algorithms for digital quantum simulations, Phys. Rev. Lett. 116, 230504 (2016)

  28. [29]

    Acampora, A

    G. Acampora, A. Chiatto, and A. Vitiello, Genetic algorithms as classical optimizer for the quantum approximate opti- mization algorithm, Appl. Soft Comput. 142, 110296 (2023)

  29. [30]

    W. Deng, S. Shang, X. Cai, H. Zhao, Y. Zhou, H. Chen, and W. Deng, Quantum differential evolution with cooperative coevolution framework and hybrid mutation strategy for large scale optimization, Knowledge-based Syst. 224, 107080 (2021)

  30. [31]

    Chernikov, S

    A. Chernikov, S. S. Sysoev, E. A. Vashukevich, and T. Y. Golubeva, Heralded gate search with genetic algorithms for quantum computation, Phys. Rev. A 108, 012609 (2023)

  31. [32]

    Llenas and L

    A. Llenas and L. Lamata, Digital-analog quantum genetic algorithm using rydberg-atom arrays, Phys. Rev. A 110, 042603 (2024)

  32. [33]

    Zhang, T.-W

    Z. Zhang, T.-W. Hsu, T. Y. Tan, D. H. Slichter, A. M. Kaufman, M. Marinelli, and C. A. Regal, High optical access cryogenic system for rydberg atom arrays with a 3000-second trap lifetime, PRX Quantum 6, 020337 (2025)

  33. [34]

    Anand, C

    S. Anand, C. E. Bradley, R. White, V. Ramesh, K. Singh, and H. Bernien, A dual-species rydberg array, Nat. Phys. 20, 1744 (2024)

  34. [35]

    Zeytinoğlu and S

    S. Zeytinoğlu and S. Sugiura, Error-robust quantum signal processing using rydberg atoms, Phys. Rev. Res. 6, 013003 (2024)

  35. [36]

    X. Wang, S. Wang, X. Liang, D. Zhao, J. Huang, X. Xu, B. Dai, and Q. Miao, Deep reinforcement learning: A survey, IEEE Trans. Neural Netw. Learning Syst. 35, 5064 (2024)

  36. [37]

    V. N. Ivanova-Rohling, N. Rohling, and G. Burkard, Reinforcement learning approach for finding exchange-only gate sequences for cnot with optimized gate time, EPJ Quantum Technol. 12, 53 (2025)

  37. [38]

    Bukov, A

    M. Bukov, A. G. R. Day, D. Sels, P. Weinberg, A. Polkovnikov, and P. Mehta, Reinforcement learning in different phases of quantum control, Phys. Rev. X 8, 031086 (2018)

  38. [39]

    L. Moro, M. G. A. Paris, M. Restelli, and E. Prati, Quantum compiling by deep reinforcement learning, Commun. Phys. 4, 178 (2021)

  39. [40]

    Giordano and M

    S. Giordano and M. A. Martin-Delgado, Reinforcement-learning generation of four-qubit entangled states, Phys. Rev. Res. 4, 043056 (2022)

  40. [41]

    An, H.-J

    Z. An, H.-J. Song, Q.-K. He, and D. L. Zhou, Quantum optimal control of multilevel dissipative quantum systems with reinforcement learning, Phys. Rev. A 103, 012404 (2021)

  41. [42]

    V. V. Sivak, A. Eickbusch, H. Liu, B. Royer, I. Tsioutsios, and M. H. Devoret, Model-free quantum control with reinforce- ment learning, Phys. Rev. X 12, 011059 (2022)

  42. [43]

    R. Zen, J. Olle, L. Colmenarez, M. Puviani, M. Müller, and F. Marquardt, Quantum circuit discovery for fault-tolerant logical state preparation with reinforcement learning, Phys. Rev. X 15, 041012 (2025)

  43. [44]

    Zhang, Z

    X.-M. Zhang, Z. Wei, R. Asad, X.-C. Yang, and X. Wang, When does reinforcement learning stand out in quantum control? a comparative study on state preparation, npj Quantum Inf. 5, 85 (2019)

  44. [45]

    S. Li, Y. Fan, X. Li, X. Ruan, Q. Zhao, Z. Peng, R.-B. Wu, J. Zhang, and P. Song, Robust quantum control using reinforcement learning from demonstration, npj Quantum Inf. 11, 124 (2025)

  45. [46]

    D. F. Wise, J. J. Morton, and S. Dhomkar, Using deep learning to understand and mitigate the qubit noise environment, PRX Quantum 2, 010316 (2021)

  46. [47]

    Müller, I

    M. Müller, I. Lesanovsky, H. Weimer, H. P. Büchler, and P. Zoller, Mesoscopic rydberg gate based on electromagnetically induced transparency, Phys. Rev. Lett. 102, 170502 (2009)

  47. [48]

    Isenhower, E

    L. Isenhower, E. Urban, X. L. Zhang, A. T. Gill, T. Henage, T. A. Johnson, T. G. Walker, and M. Saffman, Demonstration of a neutral atom controlled-not quantum gate, Phys. Rev. Lett. 104, 010503 (2010)

  48. [49]

    A. M. Farouk, I. I. Beterov, P. Xu, S. Bergamini, and I. I. Ryabtsev, Parallel implementation of CNOT N and C 2NOT2 gates via homonuclear and heteronuclear förster interactions of rydberg atoms, Photonics 10, 1280 (2023)

  49. [50]

    Y. Ding, Y. Ban, J. D. Martín-Guerrero, E. Solano, J. Casanova, and X. Chen, Breaking adiabatic quantum control with deep learning, Phys. Rev. A 103, L040401 (2021)

  50. [51]

    Porotti, A

    R. Porotti, A. Essig, B. Huard, and F. Marquardt, Deep reinforcement learning for quantum state preparation with weak nonlinear measurements, Quantum 6, 747 (2022)

  51. [52]

    J. P. Bonilla Ataides, D. K. Tuckett, S. D. Bartlett, S. T. Flammia, and B. J. Brown, The xzzx surface code, Nat. Commun. 12, 2172 (2021)

  52. [53]

    Y. Sun, P. Xu, P.-X. Chen, and L. Liu, Controlled phase gate protocol for neutral atoms via off-resonant modulated driving, Phys. Rev. Appl. 13, 024059 (2020)

  53. [54]

    Q. Wu, J. Xing, and H. Yin, Soft-controlled quantum gate with enhanced robustness and undegraded dynamics in rydberg atoms, EPJ Quantum Technol. 11, 1 (2024)

  54. [55]

    Zhao, W.-G

    K. Zhao, W.-G. Ma, Z. Wang, H. Li, K. Huang, Y.-H. Shi, K. Xu, and H. Fan, Microwave-activated high-fidelity three-qubit gate scheme for fixed-frequency superconducting qubits, Phys. Rev. Appl. 24, 034064 (2025)

  55. [56]

    Shindi, Q

    O. Shindi, Q. Yu, P. Girdhar, and D. Dong, Model-free quantum gate design and calibration using deep reinforcement learning, IEEE Trans. Artif. Intell. 5, 346 (2024)

  56. [57]

    Y. Baum, M. Amico, S. Howell, M. Hush, M. Liuzzi, P. Mundada, T. Merkh, A. R. Carvalho, and M. J. Biercuk, 15 Experimental deep reinforcement learning for error-robust gate-set design on a superconducting quantum computer, PRX Quantum 2, 040324 (2021)

  57. [58]

    Coote, R

    P. Coote, R. Dimov, S. Maity, G. S. Hartnett, M. J. Biercuk, and Y. Baum, Resource-efficient context-aware dynamical decoupling embedding for arbitrary large-scale quantum algorithms, PRX Quantum 6, 010332 (2025)

  58. [59]

    Y. Yin, T. Xiao, X. Deng, M. He, J. Fan, and G. Zeng, Discovering autonomous quantum error correction via deep reinforcement learning, Phys. Rev. A 112, 062618 (2025)

  59. [60]

    Lin, H.-S

    R. Lin, H.-S. Zhong, Y. Li, Z.-R. Zhao, L.-T. Zheng, T.-R. Hu, H.-M. Wu, Z. Wu, W.-J. Ma, Y. Gao, Y.-K. Zhu, Z.-F. Su, W.-L. Ouyang, Y.-C. Zhang, J. Rui, M.-C. Chen, C.-Y. Lu, and J.-W. Pan, Ai-enabled parallel assembly of thousands of defect-free neutral atom arrays, Phys. Rev. Lett. 135, 060602 (2025)

  60. [61]

    R.-B. Wu, B. Chu, D. H. Owens, and H. Rabitz, Data-driven gradient algorithm for high-precision quantum control, Phys. Rev. A 97, 042122 (2018)

  61. [62]

    Dong and I

    D. Dong and I. R. Petersen, Quantum estimation, control and learning: Opportunities and challenges, Annu. Rev. Control. 54, 243 (2022)

  62. [63]

    R. S. Sutton, A. G. Barto, et al. , Reinforcement learning: An introduction , 1st ed. (MIT press Cambridge, Cambridge, MA, USA, 1998)

  63. [64]

    Sawaya, G

    Y. Sawaya, G. Issa, and S. E. Marzen, Framework for solving time-delayed markov decision processes, Phys. Rev. Res. 5, 033034 (2023)

  64. [65]

    M. S. Alam, N. F. Berthusen, and P. P. Orth, Quantum logic gate synthesis as a markov decision process, npj Quantum Inf. 9, 108 (2023)

  65. [66]

    Schmidhuber, Deep learning in neural networks: An overview, Neural Netw

    J. Schmidhuber, Deep learning in neural networks: An overview, Neural Netw. 61, 85 (2015)

  66. [67]

    McDonnell, L

    K. McDonnell, L. F. Keary, and J. D. Pritchard, Demonstration of a quantum gate using electromagnetically induced transparency, Phys. Rev. Lett. 129, 200501 (2022)

  67. [68]

    Y. Wu, S. Kolkowitz, S. Puri, and J. D. Thompson, Erasure conversion for fault-tolerant quantum computing in alkaline earth rydberg atom arrays, Nat. Commun. 13, 4657 (2022)

  68. [69]

    P. M. Poggi, F. C. Lombardo, and D. A. Wisniacki, Time-optimal control fields for quantum systems with multiple avoided crossings, Phys. Rev. A 92, 053411 (2015)

  69. [70]

    L. S. Theis, F. Motzoi, F. K. Wilhelm, and M. Saffman, High-fidelity rydberg-blockade entangling gate using shaped, analytic pulses, Phys. Rev. A 94, 032306 (2016)

  70. [71]

    Li, B.-B

    A. Li, B.-B. Liu, L.-L. Yan, S.-L. Su, G. Chen, and M. Feng, High-efficiency realization of the super-robust rydberg deutsch gate, Phys. Rev. Res. 6, 023231 (2024)

  71. [72]

    Grech, M

    L. Grech, M. G. Krauss, M. Consiglio, T. J. Apollaro, C. P. Koch, S. Hirlaender, and G. Valentino, Achieving fast and robust perfect entangling gates via reinforcement learning, Quantum Sci. Technol. 11, 015030 (2026)

  72. [73]

    Porotti, D

    R. Porotti, D. Tamascelli, M. Restelli, and E. Prati, Coherent transport of quantum states by deep reinforcement learning, Commun. Phys. 2, 61 (2019)

  73. [74]

    Boscain, M

    U. Boscain, M. Sigalotti, and D. Sugny, Introduction to the pontryagin maximum principle for quantum optimal control, PRX Quantum 2, 030203 (2021)

  74. [75]

    L. T. Brady, C. L. Baldwin, A. Bapat, Y. Kharkov, and A. V. Gorshkov, Optimal protocols in quantum annealing and quantum approximate optimization algorithm problems, Phys. Rev. Lett. 126, 070505 (2021)

  75. [76]

    Y. Oda, D. Lucarelli, K. Schultz, B. D. Clader, and G. Quiroz, Optimally band-limited noise filtering for single-qubit gates, Phys. Rev. Appl. 19, 014062 (2023)

  76. [77]

    Patel, S

    N. Patel, S. Lee, S. Sarao Mannelli, S. Goldt, and A. Saxe, Rl perceptron: Generalization dynamics of policy learning in high dimensions, Phys. Rev. X 15, 021051 (2025)

  77. [78]

    Sarma and M

    B. Sarma and M. J. Hartmann, Designing fast quantum gates using optimal control with a reinforcement-learning ansatz, Phys. Rev. Appl. 23, 014015 (2025)

  78. [79]

    Schulman, S

    J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, Trust region policy optimization, in Proc. Mach. Learn. Res., Vol. 37 (PMLR, Lille, France, 2015) pp. 1889–1897

  79. [80]

    B. J. Pearson, J. L. White, T. C. Weinacht, and P. H. Bucksbaum, Coherent control using adaptive learning algorithms, Phys. Rev. A 63, 063412 (2001)

  80. [81]

    T.-N. Xu, Y. Ding, J. D. Martín-Guerrero, and X. Chen, Robust two-qubit gate with reinforcement learning and dropout, Phys. Rev. A 110, 032614 (2024)

Showing first 80 references.