Uncovering Latent Structures in Robust Pulse Sequences: A Model-Based Reinforcement Learning Approach for Adaptable Quantum Control
Pith reviewed 2026-06-25 23:56 UTC · model grok-4.3
The pith
Embedding the Hamiltonian in a reinforcement learning network allows one model to generate robust pulses across a continuous range of quantum gate parameters at GRAPE-comparable fidelity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By embedding the Hamiltonian into the model-based reinforcement learning pipeline, a single neural network can be trained to generate robust optimal pulses for an entire family of gate configurations specified by rotation angle, duration, detuning, and inhomogeneity, achieving fidelities comparable to multi-seed GRAPE while producing more consistent structured phase profiles that enable smooth interpolation.
What carries the argument
A neural network trained end-to-end in a model-based RL framework with the quantum Hamiltonian as part of the model, taking control parameters as input to produce pulse sequences.
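To make "the Hamiltonian as part of the model" concrete, here is a minimal forward-pass sketch in plain Python. It assumes a common single-spin control Hamiltonian, H = (delta/2) sz + (1+eps)(omega/2)(cos(phi) sx + sin(phi) sy), with piecewise-constant phases. That form, and every function and variable name below, is an illustrative assumption and is not taken from the paper. In a model-based setup the phases would come from the network and the fidelity would be differentiated through this chain; the sketch only evaluates it.

```python
import math

def step_propagator(omega, phi, delta, eps, dt):
    """Exact 2x2 propagator exp(-i H dt) for one constant-control step.

    Assumed model (hypothetical, not from the paper):
    H = (delta/2) sz + (1+eps)(omega/2)(cos(phi) sx + sin(phi) sy).
    """
    wx = (1.0 + eps) * omega * math.cos(phi)
    wy = (1.0 + eps) * omega * math.sin(phi)
    wz = delta
    w = math.sqrt(wx * wx + wy * wy + wz * wz)
    if w == 0.0:
        return [[1 + 0j, 0j], [0j, 1 + 0j]]
    c = math.cos(0.5 * w * dt)
    s = math.sin(0.5 * w * dt)
    nx, ny, nz = wx / w, wy / w, wz / w
    # U = cos(w dt / 2) I - i sin(w dt / 2) (n . sigma)
    return [[complex(c, -s * nz), complex(-s * ny, -s * nx)],
            [complex(s * ny, -s * nx), complex(c, s * nz)]]

def matmul(a, b):
    """Product of two 2x2 complex matrices."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)]
            for i in range(2)]

def pulse_propagator(phases, omega, delta, eps, dt):
    """Time-ordered product of step propagators (later steps act on the left)."""
    u = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for phi in phases:
        u = matmul(step_propagator(omega, phi, delta, eps, dt), u)
    return u

def gate_fidelity(u, target):
    """Phase-insensitive gate fidelity |Tr(target^dagger u)|^2 / 4."""
    tr = sum(target[i][j].conjugate() * u[i][j]
             for i in range(2) for j in range(2))
    return abs(tr) ** 2 / 4.0

# A constant-phase pulse whose total area is pi, compared with an ideal x rotation.
n_steps, dt = 10, 0.1
omega = math.pi / (n_steps * dt)
phases = [0.0] * n_steps
target = step_propagator(math.pi, 0.0, 0.0, 0.0, 1.0)  # ideal R_x(pi)
f_resonant = gate_fidelity(pulse_propagator(phases, omega, 0.0, 0.0, dt), target)
f_detuned = gate_fidelity(pulse_propagator(phases, omega, 0.5 * omega, 0.0, dt), target)
```

The unshaped pulse is exact on resonance and loses fidelity once a detuning is added. A robust pulse generator is meant to close that gap by shaping the phases.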
If this is right
- Pulses for new parameter combinations are generated in milliseconds rather than through repeated per-instance optimization.
- The network reveals the same structured phase profiles seen in GRAPE solutions but does so more consistently across runs.
- Smooth interpolation becomes possible across the entire trained parameter space due to the network's continuity.
- Any parameter in the Hamiltonian can be supplied as a network input, allowing the same framework to apply to different systems.
- No separate pre-computed training data or reinitialization is needed when operating conditions change.
Where Pith is reading between the lines
- The method could support closed-loop experimental control where parameters such as detuning drift over time.
- The consistent phase structures might point to underlying symmetries in the control landscape that could be derived analytically.
- Training the same approach on multi-qubit Hamiltonians could expose analogous latent patterns for larger systems.
- The continuous mapping learned by the network suggests potential for transfer to related control problems on different hardware.
Load-bearing premise
Embedding the system's Hamiltonian into the reinforcement learning training produces a network that generalizes across a continuous family of gate configurations without requiring pre-computed training data or suffering from poor performance when parameters vary.
What would settle it
Testing the trained network on parameter values for rotation angle, detuning, and inhomogeneity within the trained ranges but not used in training, then comparing the resulting gate fidelities to those obtained from independent multi-seed GRAPE runs on the same instances.
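The test described above could be organized roughly as follows. This is a sketch under stated assumptions: the parameter names, the interval values, and the two toy fidelity functions are hypothetical placeholders that stand in for a trained network and a per-instance optimizer such as GRAPE. None of the numbers come from the paper or carry any physical meaning.

```python
import random

def sample_uniform(ranges, n, rng):
    """Draw n parameter points uniformly from a dict of (low, high) intervals."""
    return [{name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
            for _ in range(n)]

def best_of_seeds(optimize, params, seeds):
    """Run a per-instance optimizer once per seed and keep the best fidelity."""
    return max(optimize(params, seed) for seed in seeds)

def fidelity_gaps(network_fidelity, optimize, points, seeds):
    """Per-point difference: network fidelity minus best-of-seeds baseline."""
    return [network_fidelity(p) - best_of_seeds(optimize, p, seeds)
            for p in points]

# Hypothetical intervals; the paper's actual training ranges are not given here.
ranges = {"angle": (0.0, 3.14), "detuning": (-1.0, 1.0), "inhomogeneity": (-0.1, 0.1)}

def toy_network(params):
    """Placeholder for 'generate a pulse with the network, then simulate it'."""
    return 1.0 - 1e-3 * params["detuning"] ** 2

def toy_optimizer(params, seed):
    """Placeholder for one seeded per-instance optimization run."""
    jitter = random.Random(seed).random()
    return 1.0 - 1e-3 * params["detuning"] ** 2 - 1e-4 * jitter

rng = random.Random(1)
held_out = sample_uniform(ranges, 5, rng)  # points not used during training
seeds = [0, 1, 2, 3, 4]
gaps = fidelity_gaps(toy_network, toy_optimizer, held_out, seeds)
```

Reporting the distribution of `gaps` over many held-out points, instead of a single average, would show whether the network falls behind the baseline in particular regions of the parameter space.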
Original abstract
Real-time adaptive control of quantum systems requires rapid generation of robust, high-fidelity pulses across a continuous range of operating conditions. Standard optimization algorithms such as gradient-ascent pulse engineering (GRAPE) solve each instance independently, discarding information between runs and requiring costly reinitialization when parameters change. We present an approach to robust optimal quantum control based on model-based reinforcement learning, in which a single neural network -- embedding the Hamiltonian directly into the training pipeline -- generates robust gates across an entire family of gate configurations, without pre-computed training data. Demonstrated on a single-spin (two-level) system, the trained networks produce pulses for arbitrary rotation angles over a range of pulse durations, detunings, and field inhomogeneities in milliseconds, at fidelities comparable to multi-seed GRAPE. The framework is inherently adaptable: any parameter entering the Hamiltonian can serve as a network input, extending the approach to different systems and control settings. Beyond speed, the network reveals structure in the control landscape: it discovers the same structured phase profiles that appear in GRAPE solutions -- made identifiable through fidelity-invariant symmetry transformations -- but more consistently than independent optimization. This consistency enables smooth interpolation across the entire trained parameter space.
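The abstract does not say which fidelity-invariant symmetry transformations are used. As one illustration of the idea, under the assumed Hamiltonian H = (delta/2) sz + (omega/2)(cos(phi) sx + sin(phi) sy), conjugating with sx maps (phi, delta) to (-phi, -delta) and commutes with any target rotation about x. Negating every phase therefore leaves the fidelity unchanged once the detuning is mirrored, so the fidelity averaged over a symmetric detuning range is the same for both pulses. The sketch below checks this numerically; all names are hypothetical, and this need not be one of the paper's transformations.

```python
import math
import random

def step(omega, phi, delta, dt):
    """Exact propagator for one constant step of the assumed Hamiltonian."""
    wx, wy, wz = omega * math.cos(phi), omega * math.sin(phi), delta
    w = math.sqrt(wx * wx + wy * wy + wz * wz)
    if w == 0.0:
        return [[1 + 0j, 0j], [0j, 1 + 0j]]
    c, s = math.cos(0.5 * w * dt), math.sin(0.5 * w * dt)
    nx, ny, nz = wx / w, wy / w, wz / w
    return [[complex(c, -s * nz), complex(-s * ny, -s * nx)],
            [complex(s * ny, -s * nx), complex(c, s * nz)]]

def matmul(a, b):
    """Product of two 2x2 complex matrices."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)]
            for i in range(2)]

def fidelity_rx(phases, omega, delta, dt, theta):
    """Fidelity of a phase-modulated pulse with an ideal x rotation by theta."""
    u = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for phi in phases:
        u = matmul(step(omega, phi, delta, dt), u)
    target = step(theta, 0.0, 0.0, 1.0)  # ideal R_x(theta)
    tr = sum(target[i][j].conjugate() * u[i][j]
             for i in range(2) for j in range(2))
    return abs(tr) ** 2 / 4.0

rng = random.Random(0)
phases = [rng.uniform(-math.pi, math.pi) for _ in range(8)]  # arbitrary pulse
flipped = [-p for p in phases]                                # phi -> -phi
omega, dt, theta, delta = 2.0, 0.3, math.pi / 2, 0.7

# Point-wise statement: same fidelity at mirrored detuning.
f_original = fidelity_rx(phases, omega, delta, dt, theta)
f_mirrored = fidelity_rx(flipped, omega, -delta, dt, theta)

# Ensemble statement: same average over a symmetric detuning grid.
detunings = [-1.0, -0.5, 0.0, 0.5, 1.0]
avg_original = sum(fidelity_rx(phases, omega, d, dt, theta) for d in detunings) / 5
avg_flipped = sum(fidelity_rx(flipped, omega, d, dt, theta) for d in detunings) / 5
```

Two optimizer runs that return `phases` and `flipped` look different as raw data but are equally good solutions. Mapping them to a common representative is what makes shared structure visible.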
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a model-based reinforcement learning method for quantum control in which a neural network with the Hamiltonian embedded in the training pipeline generates robust pulses for a single-spin system. It claims that a single trained network produces high-fidelity controls for arbitrary rotation angles across ranges of pulse durations, detunings, and field inhomogeneities in milliseconds, matching multi-seed GRAPE performance while revealing consistent phase profiles via fidelity-invariant symmetry transformations and enabling smooth interpolation over the parameter space without pre-computed training data.
Significance. If the generalization and consistency claims hold, the work provides a practical route to real-time adaptive quantum control that reuses information across instances rather than restarting optimization for each parameter set. The model-based embedding and discovery of structured phase profiles are strengths that could inform control landscape analysis; the absence of pre-computed data distinguishes it from supervised alternatives.
major comments (2)
- [Abstract] Abstract: the claim that networks 'produce pulses ... at fidelities comparable to multi-seed GRAPE' supplies no quantitative error bars, exact comparison protocol, or details on how post-hoc symmetry transformations were applied, leaving the central performance assertion without the numerical grounding needed to evaluate it.
- [Abstract] Abstract and results on generalization: the headline assertion that a single network generalizes across arbitrary rotation angles, detunings, and inhomogeneities without fidelity collapse rests on untested out-of-distribution robustness; no sampling measure for the continuous parameter space, width of training intervals, or quantitative OOD fidelity curves are reported, which is load-bearing for the adaptability claim.
minor comments (1)
- [Abstract] Abstract: specify the numerical ranges and sampling distribution used for the continuous parameters during training.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify the presentation of our results. We respond to each major comment below and indicate revisions made to the manuscript.
Point-by-point responses
- Referee: [Abstract] Abstract: the claim that networks 'produce pulses ... at fidelities comparable to multi-seed GRAPE' supplies no quantitative error bars, exact comparison protocol, or details on how post-hoc symmetry transformations were applied, leaving the central performance assertion without the numerical grounding needed to evaluate it.
  Authors: We agree the abstract would benefit from greater quantitative specificity. In the revised manuscript we have added average fidelity values with standard deviations (obtained over 10 independent training runs), clarified the multi-seed GRAPE protocol (five random initial seeds per instance, best result retained), and inserted a concise reference to the fidelity-invariant symmetry transformations whose application is fully detailed in Section 3.2. These additions supply the requested numerical grounding while preserving abstract length.
  Revision: yes
- Referee: [Abstract] Abstract and results on generalization: the headline assertion that a single network generalizes across arbitrary rotation angles, detunings, and inhomogeneities without fidelity collapse rests on untested out-of-distribution robustness; no sampling measure for the continuous parameter space, width of training intervals, or quantitative OOD fidelity curves are reported, which is load-bearing for the adaptability claim.
  Authors: Section 4.1 specifies uniform sampling over the continuous parameter intervals used for training, and Figure 4 already plots fidelity versus each parameter (including points at and beyond the training boundaries). To meet the referee's request we have now stated the exact interval widths explicitly in the text and added a supplementary figure with quantitative OOD fidelity curves. The network architecture accepts the Hamiltonian parameters as direct inputs, which underpins the observed generalization within the reported ranges; the added material makes this evidence more transparent.
  Revision: partial
Circularity Check
No circularity: empirical RL training on Hamiltonian dynamics is self-contained
The paper describes a standard model-based RL pipeline that embeds the system Hamiltonian into the training loop to produce a policy network for pulse generation. No equations or claims reduce a derived quantity to a fitted parameter by construction, invoke self-citations for uniqueness theorems, or rename known results. The reported fidelities and interpolation properties are outputs of the trained network evaluated on simulated dynamics, not tautological re-statements of inputs. This is the normal non-circular outcome for a computational ML control paper.
Axiom & Free-Parameter Ledger
free parameters (1)
- Neural-network architecture and training hyperparameters
axioms (1)
- domain assumption: The quantum dynamics during training are faithfully captured by the embedded Hamiltonian.