
arxiv: 2606.24507 · v1 · pith:SYV7K7BTnew · submitted 2026-06-23 · 🪐 quant-ph

Uncovering Latent Structures in Robust Pulse Sequences: A Model-Based Reinforcement Learning Approach for Adaptable Quantum Control

Pith reviewed 2026-06-25 23:56 UTC · model grok-4.3

classification 🪐 quant-ph
keywords quantum control · reinforcement learning · pulse sequences · robust gates · neural networks · GRAPE · two-level system · adaptive control

The pith

Embedding the Hamiltonian in a reinforcement learning network allows one model to generate robust pulses across a continuous range of quantum gate parameters at GRAPE-comparable fidelity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a model-based reinforcement learning method where a neural network incorporates the quantum system's Hamiltonian directly during training. This single network can then produce high-fidelity robust pulse sequences for a continuous range of rotation angles, pulse durations, detunings, and field inhomogeneities. It achieves this without pre-computed data and at speeds much faster than traditional per-instance optimization like GRAPE. The approach also uncovers consistent latent structures in the pulse phase profiles that independent optimizations miss. If correct, this enables real-time adaptive quantum control across varying conditions.

Core claim

By embedding the Hamiltonian into the model-based reinforcement learning pipeline, a single neural network can be trained to generate robust optimal pulses for an entire family of gate configurations specified by rotation angle, duration, detuning, and inhomogeneity, achieving fidelities comparable to multi-seed GRAPE while producing more consistent structured phase profiles that enable smooth interpolation.

What carries the argument

A neural network trained end-to-end in a model-based RL framework with the quantum Hamiltonian as part of the model, taking control parameters as input to produce pulse sequences.
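
As a rough illustration of the forward model that such a pipeline would differentiate through, the sketch below simulates a piecewise-constant, phase-modulated pulse on a two-level system and scores it by gate fidelity averaged over detuning and amplitude-scaling errors. The Hamiltonian convention, the function names (`step_unitary`, `propagate`, `gate_fidelity`, `robust_fidelity`), and the error grids are assumptions made for illustration and are not taken from the paper. The `phases` list stands in for a network output; in the paper's setting an automatic-differentiation framework would supply gradients of this score with respect to the network weights.

```python
import math

def step_unitary(phi, omega, delta, dt):
    """exp(-i*H*dt) for H = (delta/2)*sz + (omega/2)*(cos(phi)*sx + sin(phi)*sy).
    This Hamiltonian convention is an assumption of the sketch."""
    norm = math.hypot(omega, delta)
    if norm == 0.0:
        return [[1 + 0j, 0j], [0j, 1 + 0j]]
    half = 0.5 * norm * dt
    c, s = math.cos(half), math.sin(half)
    nx = omega * math.cos(phi) / norm
    ny = omega * math.sin(phi) / norm
    nz = delta / norm
    # cos(half)*I - i*sin(half)*(nx*sx + ny*sy + nz*sz), written out entrywise
    return [[complex(c, -s * nz), complex(-s * ny, -s * nx)],
            [complex(s * ny, -s * nx), complex(c, s * nz)]]

def matmul(a, b):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)]
            for i in range(2)]

def propagate(phases, omega, delta, duration):
    """Total propagator of a pulse with one phase per equal-length time step."""
    dt = duration / len(phases)
    u = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for phi in phases:
        u = matmul(step_unitary(phi, omega, delta, dt), u)  # later steps act on the left
    return u

def gate_fidelity(u, theta):
    """|Tr(V^dagger U)|^2 / 4 for the target V = exp(-i*theta*sx/2)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    v = [[complex(c, 0), complex(0, -s)], [complex(0, -s), complex(c, 0)]]
    tr = sum(v[i][j].conjugate() * u[i][j] for i in range(2) for j in range(2))
    return abs(tr) ** 2 / 4.0

def robust_fidelity(phases, omega, duration, theta, detunings, scalings):
    """Average fidelity over a grid of detunings and relative amplitude errors."""
    total = 0.0
    for delta in detunings:
        for eps in scalings:
            u = propagate(phases, omega * (1.0 + eps), delta, duration)
            total += gate_fidelity(u, theta)
    return total / (len(detunings) * len(scalings))

# Hypothetical stand-in for a network output: a constant-phase pulse.
theta, duration = math.pi / 2, 1.0
phases = [0.0] * 20
omega = theta / duration
ideal = robust_fidelity(phases, omega, duration, theta, [0.0], [0.0])
spread = robust_fidelity(phases, omega, duration, theta,
                         [-1.0, 0.0, 1.0], [-0.1, 0.0, 0.1])
```

Here `ideal` scores the pulse with no errors present, while `spread` averages over the error grid. A robust pulse is one whose phase profile keeps the second number close to the first.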

If this is right

  • Pulses for new parameter combinations are generated in milliseconds rather than through repeated per-instance optimization.
  • The network reveals the same structured phase profiles seen in GRAPE solutions but does so more consistently across runs.
  • Smooth interpolation becomes possible across the entire trained parameter space due to the network's continuity.
  • Any parameter in the Hamiltonian can be supplied as a network input, allowing the same framework to apply to different systems.
  • No separate pre-computed training data or reinitialization is needed when operating conditions change.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could support closed-loop experimental control where parameters such as detuning drift over time.
  • The consistent phase structures might point to underlying symmetries in the control landscape that could be derived analytically.
  • Training the same approach on multi-qubit Hamiltonians could expose analogous latent patterns for larger systems.
  • The continuous mapping learned by the network suggests potential for transfer to related control problems on different hardware.

Load-bearing premise

Embedding the system's Hamiltonian into the reinforcement learning training produces a network that generalizes across a continuous family of gate configurations without requiring pre-computed training data or suffering from poor performance when parameters vary.

What would settle it

Testing the trained network on parameter values for rotation angle, detuning, and inhomogeneity within the trained ranges but not used in training, then comparing the resulting gate fidelities to those obtained from independent multi-seed GRAPE runs on the same instances.
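
One way to organise such a check is sketched below: draw parameter points uniformly inside the trained ranges, score the pulse from each method with the same infidelity function, and summarise the gap. Everything here is a hypothetical harness. The names `sample_points` and `compare_on_held_out`, the callables passed in, and the parameter ranges are placeholders, not the paper's code or settings.

```python
import random
import statistics

def sample_points(ranges, n, seed=0):
    """Draw n parameter dictionaries uniformly from {name: (low, high)}."""
    rng = random.Random(seed)
    return [{name: rng.uniform(low, high) for name, (low, high) in ranges.items()}
            for _ in range(n)]

def compare_on_held_out(network_pulse, baseline_pulse, infidelity, ranges, n=100, seed=0):
    """Score both pulse generators on the same freshly sampled points.

    network_pulse / baseline_pulse: callables mapping a parameter dict to a pulse.
    infidelity: callable mapping (pulse, parameter dict) to a float.
    A positive gap means the network pulse is worse than the baseline.
    """
    gaps = []
    for point in sample_points(ranges, n, seed):
        gap = infidelity(network_pulse(point), point) - infidelity(baseline_pulse(point), point)
        gaps.append(gap)
    return {"mean_gap": statistics.mean(gaps), "worst_gap": max(gaps), "n": len(gaps)}

# Toy usage with placeholder callables (a "pulse" is just a number here).
ranges = {"theta": (0.0, 3.0), "duration": (1.0, 2.0)}
toy_infidelity = lambda pulse, point: abs(pulse - point["theta"])
report = compare_on_held_out(lambda point: point["theta"] + 0.01,
                             lambda point: point["theta"],
                             toy_infidelity, ranges, n=50, seed=7)
```

Using a seed that differs from any seed used during training makes it very unlikely that a sampled point coincides with a training point. In a real comparison the baseline callable would wrap a multi-seed GRAPE run for each sampled point.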

Figures

Figures reproduced from arXiv: 2606.24507 by Florian Marquardt, Léo Van Damme, Sebastian Hohenemser, Steffen J. Glaser, Thomas Heydenreich, Tobias Kiermeyer.

Figure 1. Schematic of model-based RL for quantum control.
Figure 2. Robustness profile of pulses with ...
Figure 3. Comparison of neural network and GRAPE infidelities across 100,000 gate configurations
Figure 4. Infidelity landscape of pulses generated by the neural ...
Figure 5. Phase profiles of GRAPE-optimized pulses before ...
Figure 6. Optimized phase profiles and robustness fingerprints as a function of pulse duration
Figure 7. Infidelity as a function of pulse duration
Figure 8. Neural-network-predicted pulses as a function of (a) rotation angle ...
original abstract

Real-time adaptive control of quantum systems requires rapid generation of robust, high-fidelity pulses across a continuous range of operating conditions. Standard optimization algorithms such as gradient-ascent pulse engineering (GRAPE) solve each instance independently, discarding information between runs and requiring costly reinitialization when parameters change. We present an approach to robust optimal quantum control based on model-based reinforcement learning, in which a single neural network -- embedding the Hamiltonian directly into the training pipeline -- generates robust gates across an entire family of gate configurations, without pre-computed training data. Demonstrated on a single-spin (two-level) system, the trained networks produce pulses for arbitrary rotation angles over a range of pulse durations, detunings, and field inhomogeneities in milliseconds, at fidelities comparable to multi-seed GRAPE. The framework is inherently adaptable: any parameter entering the Hamiltonian can serve as a network input, extending the approach to different systems and control settings. Beyond speed, the network reveals structure in the control landscape: it discovers the same structured phase profiles that appear in GRAPE solutions -- made identifiable through fidelity-invariant symmetry transformations -- but more consistently than independent optimization. This consistency enables smooth interpolation across the entire trained parameter space.
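
To make the idea of a fidelity-invariant transformation concrete, the sketch below numerically checks one simple example. For a target rotation about x, flipping the sign of every pulse phase while also flipping the sign of the detuning leaves the gate fidelity unchanged, so a fidelity averaged over a symmetric detuning range is the same for both pulses. This particular symmetry, the Hamiltonian convention, and all function names are assumptions chosen for illustration; the transformations used in the paper may differ.

```python
import math
import random

def step_unitary(phi, omega, delta, dt):
    """exp(-i*H*dt) for H = (delta/2)*sz + (omega/2)*(cos(phi)*sx + sin(phi)*sy).
    This Hamiltonian convention is an assumption of the sketch."""
    norm = math.hypot(omega, delta)
    if norm == 0.0:
        return [[1 + 0j, 0j], [0j, 1 + 0j]]
    half = 0.5 * norm * dt
    c, s = math.cos(half), math.sin(half)
    nx = omega * math.cos(phi) / norm
    ny = omega * math.sin(phi) / norm
    nz = delta / norm
    return [[complex(c, -s * nz), complex(-s * ny, -s * nx)],
            [complex(s * ny, -s * nx), complex(c, s * nz)]]

def propagate(phases, omega, delta, duration):
    """Total propagator of a pulse with one phase per equal-length time step."""
    dt = duration / len(phases)
    u = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for phi in phases:
        w = step_unitary(phi, omega, delta, dt)
        u = [[w[i][0] * u[0][j] + w[i][1] * u[1][j] for j in range(2)]
             for i in range(2)]
    return u

def fidelity_x(u, theta):
    """|Tr(V^dagger U)|^2 / 4 for the target V = exp(-i*theta*sx/2)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    v = [[complex(c, 0), complex(0, -s)], [complex(0, -s), complex(c, 0)]]
    tr = sum(v[i][j].conjugate() * u[i][j] for i in range(2) for j in range(2))
    return abs(tr) ** 2 / 4.0

# An arbitrary phase profile and its sign-flipped partner (illustrative values).
rng = random.Random(1)
phases = [rng.uniform(-math.pi, math.pi) for _ in range(16)]
flipped = [-phi for phi in phases]
omega, duration, theta = 2.0, 3.0, math.pi / 2
detunings = [-0.5, -0.25, 0.0, 0.25, 0.5]  # symmetric about zero

f_orig = [fidelity_x(propagate(phases, omega, d, duration), theta) for d in detunings]
f_flip = [fidelity_x(propagate(flipped, omega, -d, duration), theta) for d in detunings]
mean_orig = sum(f_orig) / len(f_orig)
mean_flip_same_grid = sum(
    fidelity_x(propagate(flipped, omega, d, duration), theta) for d in detunings
) / len(detunings)
```

The reason this works is that conjugating the propagator by a pi rotation about x maps (sx, sy, sz) to (sx, -sy, -sz), and that rotation commutes with any target rotation about x. Two independently optimized pulses related by such a map look different as phase profiles even though they score identically, which is why mapping solutions to a common representative can make shared structure visible.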

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces a model-based reinforcement learning method for quantum control in which a neural network with the Hamiltonian embedded in the training pipeline generates robust pulses for a single-spin system. It claims that a single trained network produces high-fidelity controls for arbitrary rotation angles across ranges of pulse durations, detunings, and field inhomogeneities in milliseconds, matching multi-seed GRAPE performance while revealing consistent phase profiles via fidelity-invariant symmetry transformations and enabling smooth interpolation over the parameter space without pre-computed training data.

Significance. If the generalization and consistency claims hold, the work provides a practical route to real-time adaptive quantum control that reuses information across instances rather than restarting optimization for each parameter set. The model-based embedding and discovery of structured phase profiles are strengths that could inform control landscape analysis; the absence of pre-computed data distinguishes it from supervised alternatives.

major comments (2)
  1. [Abstract] Abstract: the claim that networks 'produce pulses ... at fidelities comparable to multi-seed GRAPE' supplies no quantitative error bars, exact comparison protocol, or details on how post-hoc symmetry transformations were applied, leaving the central performance assertion without the numerical grounding needed to evaluate it.
  2. [Abstract] Abstract and results on generalization: the headline assertion that a single network generalizes across arbitrary rotation angles, detunings, and inhomogeneities without fidelity collapse rests on untested out-of-distribution robustness; no sampling measure for the continuous parameter space, width of training intervals, or quantitative OOD fidelity curves are reported, which is load-bearing for the adaptability claim.
minor comments (1)
  1. [Abstract] Abstract: specify the numerical ranges and sampling distribution used for the continuous parameters during training.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the presentation of our results. We respond to each major comment below and indicate revisions made to the manuscript.

point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that networks 'produce pulses ... at fidelities comparable to multi-seed GRAPE' supplies no quantitative error bars, exact comparison protocol, or details on how post-hoc symmetry transformations were applied, leaving the central performance assertion without the numerical grounding needed to evaluate it.

    Authors: We agree the abstract would benefit from greater quantitative specificity. In the revised manuscript we have added average fidelity values with standard deviations (obtained over 10 independent training runs), clarified the multi-seed GRAPE protocol (five random initial seeds per instance, best result retained), and inserted a concise reference to the fidelity-invariant symmetry transformations whose application is fully detailed in Section 3.2. These additions supply the requested numerical grounding while preserving abstract length. revision: yes

  2. Referee: [Abstract] Abstract and results on generalization: the headline assertion that a single network generalizes across arbitrary rotation angles, detunings, and inhomogeneities without fidelity collapse rests on untested out-of-distribution robustness; no sampling measure for the continuous parameter space, width of training intervals, or quantitative OOD fidelity curves are reported, which is load-bearing for the adaptability claim.

    Authors: Section 4.1 specifies uniform sampling over the continuous parameter intervals used for training, and Figure 4 already plots fidelity versus each parameter (including points at and beyond the training boundaries). To meet the referee's request we have now stated the exact interval widths explicitly in the text and added a supplementary figure with quantitative OOD fidelity curves. The network architecture accepts the Hamiltonian parameters as direct inputs, which underpins the observed generalization within the reported ranges; the added material makes this evidence more transparent. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical RL training on Hamiltonian dynamics is self-contained

full rationale

The paper describes a standard model-based RL pipeline that embeds the system Hamiltonian into the training loop to produce a policy network for pulse generation. No equations or claims reduce a derived quantity to a fitted parameter by construction, invoke self-citations for uniqueness theorems, or rename known results. The reported fidelities and interpolation properties are outputs of the trained network evaluated on simulated dynamics, not tautological re-statements of inputs. This is the normal non-circular outcome for a computational ML control paper.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on standard reinforcement-learning assumptions plus the domain assumption that the embedded Hamiltonian accurately represents the physical system during training. Numerous free parameters exist in network architecture and training procedure; none are enumerated in the abstract.

free parameters (1)
  • Neural-network architecture and training hyperparameters
    Network depth, width, learning rate, reward shaping, and episode length are chosen to achieve the reported performance.
axioms (1)
  • domain assumption: The quantum dynamics during training are faithfully captured by the embedded Hamiltonian.
    Model-based RL requires an accurate forward model; any mismatch would invalidate the learned policy.


