Adaptive Reinforcement Learning for Robust Open Quantum System Control: A Multi-Task Framework with Temporal Optimization

Haftu W. Fentaw; Simon Caton; Steve Campbell

arxiv: 2605.26925 · v1 · pith:73JGNLFMnew · submitted 2026-05-26 · 🪐 quant-ph · cs.LG

Pith reviewed 2026-06-29 16:58 UTC · model grok-4.3

classification 🪐 quant-ph cs.LG

keywords reinforcement learning · quantum control · open quantum systems · multi-task learning · pulse optimization · noise robustness · state transfer

The pith

A multi-task reinforcement learning model generates control pulses that achieve high-fidelity state transfer under noise across many different Hamiltonians.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a multi-task Soft Actor-Critic framework that learns optimal pulse sequences for open quantum systems while also determining suitable evolution times and numbers of pulse segments. Experiments across 51 Hamiltonian variations show that the trained model drives systems from initial to target states with high fidelities despite environmental noise. The same model generalizes to Hamiltonians not seen during training when they come from the same space. Robustness Infidelity Measure analysis indicates these policies withstand pulse amplitude perturbations and decoherence rate changes better than GRAPE-optimized controls.

Core claim

A single multi-task SAC model trained on a collection of Hamiltonians can produce pulse sequences that transfer quantum states under environment noise with high fidelities, simultaneously identifying problem-specific evolution time T and segment count N, and these policies succeed on unseen Hamiltonians drawn from the same space while demonstrating greater robustness to amplitude and decoherence variations than GRAPE controls.

What carries the argument

The multi-task Soft Actor-Critic reinforcement learning agent that jointly optimizes control pulses, evolution time T, and number of segments N across multiple Hamiltonian tasks.
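
As a minimal sketch of what jointly choosing pulses, T, and N could look like, the snippet below decodes one bounded action vector into an evolution time, a segment count, and per-segment amplitudes. The function name `decode_action`, the numeric bounds, and the convention that the first two action entries encode T and N are assumptions made for illustration; this summary does not state the paper's actual action parameterization.

```python
from typing import List, Tuple


def decode_action(
    action: List[float],
    t_bounds: Tuple[float, float] = (0.5, 5.0),  # hypothetical range for T
    n_bounds: Tuple[int, int] = (2, 20),         # hypothetical range for N
    max_amp: float = 1.0,                        # hypothetical amplitude bound
) -> Tuple[float, int, List[float]]:
    """Map an action in [-1, 1]^(2 + n_max) to (T, N, amplitudes)."""
    n_min, n_max = n_bounds
    if len(action) != 2 + n_max:
        raise ValueError("action must have length 2 + n_max")
    # Clip every entry to [-1, 1], as a squashed SAC policy output would be.
    clipped = [max(-1.0, min(1.0, a)) for a in action]
    # Rescale the first two entries from [-1, 1] to [0, 1].
    unit_t = (clipped[0] + 1.0) / 2.0
    unit_n = (clipped[1] + 1.0) / 2.0
    total_time = t_bounds[0] + unit_t * (t_bounds[1] - t_bounds[0])
    n_segments = n_min + round(unit_n * (n_max - n_min))
    # Only the first N amplitude slots are used; the rest are ignored.
    amplitudes = [max_amp * a for a in clipped[2:2 + n_segments]]
    return total_time, n_segments, amplitudes


total_time, n_segments, amplitudes = decode_action([0.0, 0.0] + [0.5] * 20)
segment_duration = total_time / n_segments  # piecewise-constant step length
```

A fixed-length action with unused trailing slots keeps the policy's output dimension constant across tasks, which is one simple way to let N vary; other encodings are equally plausible.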

If this is right

One model trained on multiple Hamiltonians succeeds on state-transfer tasks for Hamiltonians not encountered in training.
The framework automatically selects suitable evolution times and pulse segment counts for each individual problem.
SAC-derived policies exhibit superior robustness to pulse amplitude perturbations and decoherence rate variations compared with GRAPE controls.
The results establish a route toward control methods that apply across realistic noisy quantum devices without per-instance redesign.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If generalization to unseen Hamiltonians holds, control strategies could adapt to new device parameters without full retraining.
The joint optimization of time and segments might extend to other quantum tasks such as gate synthesis.
Adding more varied noise models during training could increase the chance that simulation results survive on physical hardware.

Load-bearing premise

That training on a finite set of 51 Hamiltonian variations is sufficient to enable successful state transfer for unseen Hamiltonians from the same space, and that robustness observed in simulation translates to real devices.

What would settle it

Testing the trained model on a real quantum device using a Hamiltonian variation outside the training set and checking whether the measured fidelity remains high or drops sharply.

Original abstract

We present a Multi-task Soft Actor-Critic (SAC) Reinforcement Learning framework designed for open-system quantum control across diverse Hamiltonians, which learns optimal pulse sequences while simultaneously discovering problem-specific evolution time T and number of control pulse segments N. Experimental results across 51 Hamiltonian variations demonstrate that the multi-task SAC model is able to generate control pulses that can drive a system, under environment noise, from its initial state to its target state with high fidelities, establishing essential foundations for universal quantum control applicable to realistic noisy quantum devices. Through progressive expansion of the training Hamiltonian set, we investigate if a single multi-task model trained using a given number of sample Hamiltonians can successfully accomplish state-transfer tasks for Hamiltonians drawn from the same Hamiltonian space but not encountered during training. In addition, our Robustness Infidelity Measure (RIM) analysis reveals that SAC trained policies exhibit superior robustness to pulse amplitude perturbations and decoherence rate variations compared to GRAPE-optimized controls.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

This is a straightforward empirical application of multi-task SAC to open quantum control that adds joint optimization of T and N plus a robustness metric, but the generalization results from 51 training cases to unseen Hamiltonians are not quantified enough to support the universality claim.

The paper trains a single multi-task SAC agent on open quantum systems across 51 Hamiltonian variations, letting the policy choose both the control pulses and the evolution time T plus segment count N at the same time. It then checks whether that policy still works on Hamiltonians from the same family that were left out of training, and it introduces a Robustness Infidelity Measure to compare against GRAPE under amplitude and decoherence perturbations.

What stands out is the joint optimization of T and N inside the RL loop and the explicit test of progressive training-set expansion. The RIM comparison is also a clean, reproducible way to quantify robustness in simulation. Those pieces are concrete and could be useful to groups already running RL on quantum devices.

The soft spot is the generalization step. The abstract says they investigate whether the model succeeds on unseen Hamiltonians, yet no test-set fidelities, success rates, or sampling details for the 51 variations appear in the provided summary. Without those numbers it is difficult to judge whether the finite training set actually buys the claimed robustness to new Hamiltonians. The internal RIM comparison does not address that extrapolation.

This is the kind of paper that belongs in a specialized quantum-control or RL-for-physics venue. Readers already working on pulse design for noisy qubits will find the experimental setup worth looking at, even if they end up wanting tighter numbers on out-of-distribution performance. It is solid enough to send out for peer review rather than desk-reject; the experiments are there and the questions are well-posed, even if the current write-up leaves some of the key metrics implicit.

Referee Report

3 major / 2 minor

Summary. The manuscript presents a multi-task Soft Actor-Critic (SAC) reinforcement learning framework for open quantum system control that simultaneously optimizes control pulse sequences, evolution time T, and number of segments N across diverse Hamiltonians. Experiments on 51 Hamiltonian variations under environment noise report high-fidelity state transfer, and a Robustness Infidelity Measure (RIM) analysis is used to argue that the SAC policies are more robust than GRAPE-optimized controls. The work also examines whether progressive expansion of the training set enables generalization to unseen Hamiltonians from the same space.

Significance. If the generalization results hold with supporting metrics, the multi-task SAC approach could provide a practical route to adaptive, robust control for noisy quantum devices, moving beyond single-Hamiltonian optimization methods. The joint learning of T and N alongside pulses, together with the RIM comparison, represents a concrete empirical contribution to RL-based quantum control.

major comments (3)

[Results / generalization experiments] The section describing progressive expansion of the training Hamiltonian set states that the model is tested on Hamiltonians not encountered during training, yet reports no quantitative metrics such as mean test fidelity, standard deviation, success rate, or failure rate on the held-out set. This information is load-bearing for the central claim of generalization to unseen Hamiltonians drawn from the same space.
[Abstract and Experimental Results] The abstract and results claim 'high fidelities' across the 51 variations, but the manuscript provides neither the numerical fidelity values (with error bars or ranges) nor the precise definition of the fidelity measure used under noise. Without these, the support for the robustness and performance claims cannot be evaluated.
[RIM analysis] The RIM analysis asserts superior robustness of SAC policies to pulse amplitude perturbations and decoherence variations relative to GRAPE, but the manuscript does not supply the explicit formula for RIM, the sampling procedure over the 51 variations, or tabulated comparison values. This detail is required to assess the internal comparison.

minor comments (2)

[Methods] Notation for the multi-task SAC objective and the joint optimization over T and N should be introduced with explicit equations in the methods section to improve clarity.
[Experimental setup] The manuscript should specify the parameter ranges and sampling distribution used to generate the 51 Hamiltonian variations.
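
To make the requested specification concrete, here is a minimal sketch of how a Hamiltonian family and a progressive train/held-out protocol might be written down. The two-parameter family (a detuning and a coupling), the uniform ranges, the held-out size of 11, and the training sizes are all hypothetical; the summary gives only the total of 51 variations.

```python
import random
from typing import Dict, List, Tuple

Params = Tuple[float, float]  # hypothetical (detuning, coupling) pair


def sample_family(n: int, seed: int = 0) -> List[Params]:
    """Draw n parameter pairs uniformly from hypothetical ranges."""
    rng = random.Random(seed)
    return [(rng.uniform(-1.0, 1.0), rng.uniform(0.5, 1.5)) for _ in range(n)]


def progressive_splits(
    family: List[Params], n_test: int, train_sizes: List[int]
) -> Dict[int, Tuple[List[Params], List[Params]]]:
    """Fix one held-out set, then build nested training sets of growing size."""
    test = family[:n_test]
    pool = family[n_test:]
    if max(train_sizes) > len(pool):
        raise ValueError("largest training size exceeds the available pool")
    return {k: (pool[:k], test) for k in sorted(train_sizes)}


family = sample_family(51)
splits = progressive_splits(family, n_test=11, train_sizes=[5, 10, 20, 40])
```

Keeping the held-out set fixed while the training set grows means any change in test fidelity can be attributed to the extra training Hamiltonians rather than to a different test sample.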

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major comment below and will revise the manuscript to supply the requested quantitative details, definitions, and comparisons.

Referee: [Results / generalization experiments] The section describing progressive expansion of the training Hamiltonian set states that the model is tested on Hamiltonians not encountered during training, yet reports no quantitative metrics such as mean test fidelity, standard deviation, success rate, or failure rate on the held-out set. This information is load-bearing for the central claim of generalization to unseen Hamiltonians drawn from the same space.

Authors: We agree that quantitative metrics are essential to support the generalization claim. In the revised manuscript we will report the mean test fidelity, standard deviation, success rate (defined as fidelity exceeding 0.99), and failure rate on the held-out Hamiltonians, presented in a dedicated table or figure within the results section.
Revision: yes

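
The promised held-out metrics are simple to compute once per-Hamiltonian fidelities are available. The sketch below uses the 0.99 success threshold stated in the response; the function name `summarize_fidelities` and the example values are made up for illustration and are not results from the paper.

```python
from statistics import mean, stdev
from typing import Dict, List


def summarize_fidelities(
    fidelities: List[float], threshold: float = 0.99
) -> Dict[str, float]:
    """Mean, sample standard deviation, and success/failure rates."""
    if len(fidelities) < 2:
        raise ValueError("need at least two fidelities for a standard deviation")
    # "Success" follows the rebuttal's wording: fidelity exceeding the threshold.
    successes = sum(1 for f in fidelities if f > threshold)
    success_rate = successes / len(fidelities)
    return {
        "mean": mean(fidelities),
        "std": stdev(fidelities),
        "success_rate": success_rate,
        "failure_rate": 1.0 - success_rate,
    }


# Illustrative placeholder values only.
summary = summarize_fidelities([0.995, 0.999, 0.980, 0.992])
```
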
Referee: [Abstract and Experimental Results] The abstract and results claim 'high fidelities' across the 51 variations, but the manuscript provides neither the numerical fidelity values (with error bars or ranges) nor the precise definition of the fidelity measure used under noise. Without these, the support for the robustness and performance claims cannot be evaluated.

Authors: We will add the explicit definition of the fidelity (standard state fidelity F = |⟨ψ_target|ψ_final⟩|^2 evaluated under the noisy Lindblad dynamics) together with numerical values including mean fidelity and standard deviation (or ranges) across the 51 variations. These will appear in the experimental results section, and the abstract will be updated to reference the specific performance figures.
Revision: yes

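
One detail worth spelling out: Lindblad dynamics generally produce a mixed state ρ_final rather than a state vector, so for a pure target the overlap is naturally written F = ⟨ψ_target|ρ_final|ψ_target⟩, which reduces to |⟨ψ_target|ψ_final⟩|^2 when ρ_final is pure. The sketch below evaluates that quantity for a piecewise-constant pulse on a single qubit. The qubit model, the amplitude-damping collapse operator, the rate `gamma`, and the fixed-step RK4 integrator are all assumptions chosen for a self-contained example, not the paper's setup.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
LOWER = np.array([[0, 1], [0, 0]], dtype=complex)  # |0><1|, decay from |1> to |0>


def lindblad_rhs(rho, ham, collapse_ops):
    """d(rho)/dt = -i[H, rho] + sum_k (L rho L^dag - 1/2 {L^dag L, rho})."""
    out = -1j * (ham @ rho - rho @ ham)
    for c in collapse_ops:
        cd = c.conj().T
        out = out + c @ rho @ cd - 0.5 * (cd @ c @ rho + rho @ cd @ c)
    return out


def evolve(rho0, amplitudes, total_time, detuning=0.0, gamma=0.0, substeps=200):
    """Propagate rho through N = len(amplitudes) constant segments with RK4."""
    dt = total_time / len(amplitudes) / substeps
    collapse_ops = [np.sqrt(gamma) * LOWER] if gamma > 0 else []
    rho = rho0.astype(complex)
    for amp in amplitudes:
        ham = 0.5 * detuning * SZ + 0.5 * amp * SX  # constant within a segment
        for _ in range(substeps):
            k1 = lindblad_rhs(rho, ham, collapse_ops)
            k2 = lindblad_rhs(rho + 0.5 * dt * k1, ham, collapse_ops)
            k3 = lindblad_rhs(rho + 0.5 * dt * k2, ham, collapse_ops)
            k4 = lindblad_rhs(rho + dt * k3, ham, collapse_ops)
            rho = rho + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return rho


def fidelity(psi_target, rho):
    """Overlap <psi|rho|psi> of a density matrix with a pure target state."""
    return float(np.real(np.vdot(psi_target, rho @ psi_target)))


rho0 = np.array([[1, 0], [0, 0]], dtype=complex)  # start in |0>
target = np.array([0, 1], dtype=complex)          # aim for |1>
pulse = [np.pi] * 4                               # N = 4 segments, T = 1
rho_closed = evolve(rho0, pulse, total_time=1.0)
rho_open = evolve(rho0, pulse, total_time=1.0, gamma=0.2)
f_closed = fidelity(target, rho_closed)
f_open = fidelity(target, rho_open)
print(f_closed, f_open)
```

With `gamma=0` the resonant pulse is an ideal π rotation, so the two printed values show how much fidelity the assumed decay channel costs for the same controls.
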
Referee: [RIM analysis] The RIM analysis asserts superior robustness of SAC policies to pulse amplitude perturbations and decoherence variations relative to GRAPE, but the manuscript does not supply the explicit formula for RIM, the sampling procedure over the 51 variations, or tabulated comparison values. This detail is required to assess the internal comparison.

Authors: We acknowledge the need for these details. The revised manuscript will include the explicit mathematical definition of the Robustness Infidelity Measure (RIM), a description of the sampling procedure (number of noise realizations per Hamiltonian), and a table of comparative RIM values for the SAC policies versus GRAPE across the 51 variations.
Revision: yes
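
Until the manuscript states its formula, a reasonable working assumption is the definition from Khalid et al. (2023), which the paper cites: RIM_p is the p-th root of the p-th raw moment of the infidelity distribution under perturbation, so RIM_1 is the mean infidelity. The sketch below computes it for a toy case, an ideal π pulse whose amplitude is scaled by a Gaussian relative error. The toy fidelity model, the noise widths, and the sample count are illustrative assumptions, and the paper may define or sample RIM differently.

```python
import math
import random
from typing import Callable, List


def rim(infidelities: List[float], p: int = 1) -> float:
    """p-th root of the p-th raw moment of the sampled infidelities."""
    if not infidelities:
        raise ValueError("need at least one infidelity sample")
    moment = sum(x ** p for x in infidelities) / len(infidelities)
    return moment ** (1.0 / p)


def pi_pulse_fidelity(amp_error: float) -> float:
    """Toy model: |0> -> |1> transfer by a pi pulse with relative amplitude error."""
    return math.sin((1.0 + amp_error) * math.pi / 2.0) ** 2


def sample_infidelities(
    fidelity_fn: Callable[[float], float],
    sigma: float,
    n_samples: int = 1000,
    seed: int = 0,
) -> List[float]:
    """Draw Gaussian amplitude errors and record 1 - F for each realization."""
    rng = random.Random(seed)
    return [1.0 - fidelity_fn(rng.gauss(0.0, sigma)) for _ in range(n_samples)]


weak = sample_infidelities(pi_pulse_fidelity, sigma=0.01)
strong = sample_infidelities(pi_pulse_fidelity, sigma=0.10)
rim_weak, rim_strong = rim(weak), rim(strong)
```

A lower RIM at the same perturbation strength means a more robust controller, so a SAC versus GRAPE table would report this number per method and per noise level.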

Circularity Check

0 steps flagged

No circularity: empirical RL application with no derivations or self-referential reductions

The paper presents an empirical study applying the standard multi-task Soft Actor-Critic algorithm to quantum control tasks. No mathematical derivations, equations, or first-principles claims are advanced that could reduce to fitted inputs or self-citations by construction. The investigation of generalization across Hamiltonian variations is described as an experimental procedure (progressive expansion of the training set), not a derived prediction. Self-citations, if present, are not load-bearing for any central result. This matches the default expectation of no significant circularity for purely empirical work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper relies on standard reinforcement learning techniques and quantum mechanics concepts without introducing new free parameters, axioms, or entities in the abstract.

pith-pipeline@v0.9.1-grok · 5703 in / 1186 out tokens · 35508 ms · 2026-06-29T16:58:14.176050+00:00 · methodology

[1] [1]

Quantum , author =

Preskill, J. Quantum computing in the nisq era and beyond.Quantum2, 79 (2018). URL https://doi.org/10.22331/q-2018-08-06-79

work page internal anchor Pith review doi:10.22331/q-2018-08-06-79 2018

[2] [2]

A.et al.Robust quantum control in closed and open systems: Theory and practice (2024)

Weidner, C. A.et al.Robust quantum control in closed and open systems: Theory and practice (2024). URL https://arxiv.org/abs/2401.00294. arXiv:2401.00294

work page arXiv 2024

[3] [3]

Journal of Magnetic Resonance172(2), 296–305 (2005) https: //doi.org/10.1016/j.jmr.2004.11.004

Khaneja, N., Reiss, T., Kehlet, C., Schulte-Herbrüggen, T. & Glaser, S. J. Optimal control of coupled spin dynamics: design of nmr pulse sequences by gradient ascent algorithms.Journal of Magnetic Resonance172, 296–305 (2005). URL https://doi.org/10.1016/j.jmr.2004.11.004

work page doi:10.1016/j.jmr.2004.11.004 2005

[4] [4]

& Chen, Y

Zhang, S., Miao, Z., Pan, Y., Tao, S. & Chen, Y. Meta-learning assisted robust control of universal quantum gates with uncertainties.npj Quantum Information11, 81 (2025). URL https://doi.org/10.1038/s41534-025-01034-9

work page doi:10.1038/s41534-025-01034-9 2025

[5] [5]

URL https://dx.doi.org/10.1103/PhysRevX.8.031086

Bukov, M.et al.Reinforcement learning in different phases of quantum control.Physical Review X8(2018). URL https://dx.doi.org/10.1103/PhysRevX.8.031086. 18

work page doi:10.1103/physrevx.8.031086 2018

[6] [6]

Y., Boixo, S., Smelyanskiy, V

Niu, M. Y., Boixo, S., Smelyanskiy, V. & Neven, H. Universal quantum control through deep reinforcement learning.npj Quantum Information5, 33 (2019). URL https://doi.org/10.1038/ s41534-019-0141-3

2019

[7] [7]

& Wang, X

Zhang, X.-M., Wei, Z., Asad, R., Yang, X.-C. & Wang, X. When does reinforcement learn- ing stand out in quantum control? A comparative study on state preparation.npj Quantum Information5, 85 (2019). URL https://doi.org/10.1038/s41534-019-0201-8

work page doi:10.1038/s41534-019-0201-8 2019

[8] [8]

& Marquardt, F

Bukov, M. & Marquardt, F. Reinforcement learning for quantum technology.arXiv preprint arXiv:2601.18953(2026). URL https://arxiv.org/abs/2601.18953

work page arXiv 2026

[9] [9]

& Hartmann, M

Sarma, B. & Hartmann, M. J. Designing fast quantum gates using optimal control with a reinforcement-learning ansatz.Physical Review Applied23, 014015 (2025). URL https://doi. org/10.1103/PhysRevApplied.23.014015

work page doi:10.1103/physrevapplied.23.014015 2025

[10] [10]

W., Campbell, S

Fentaw, H. W., Campbell, S. & Caton, S. Exploring quantum control landscape and solution space complexity through optimization algorithms and dimensionality reduction.Scientific Reports15, 14605 (2025). URL https://doi.org/10.1038/s41598-025-95161-0

work page doi:10.1038/s41598-025-95161-0 2025

[11] [11]

M., Calarco, T., Montangero, S

Pagano, A., Müller, M. M., Calarco, T., Montangero, S. & Rembold, P. Role of bases in quantum optimal control.Phys. Rev. A110, 062608 (2024). URL https://link.aps.org/doi/10.1103/ PhysRevA.110.062608

2024

[12] [12]

Nielsen, M. A. & Chuang, I. L.Quantum Computation and Quantum Information: 10th Anniversary Edition(Cambridge University Press, 2010). URL https://doi.org/10.1017/ CBO9780511976667

2010

[13] [13]

A Short Introduction to the Lindblad Master Equation.AIP Advances2020,10, 025106

Manzano, D. A short introduction to the Lindblad master equation.AIP Advances10, 025106 (2020). URL https://doi.org/10.1063/1.5115323

work page doi:10.1063/1.5115323 2020

[14] [14]

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

Finn, C., Abbeel, P. & Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks (2017). URL https://arxiv.org/abs/1703.03400. arXiv:1703.03400

work page internal anchor Pith review Pith/arXiv arXiv 2017

[15] [15]

When Does Adaptation Win? Scaling Laws for Meta-Learning in Quantum Control

Leclerc, N., Miller, C. & Brawand, N. When does adaptation win? scaling laws for meta-learning in quantum control.arXiv preprint arXiv:2601.18973(2026). URL https://arxiv.org/pdf/2601. 18973

work page internal anchor Pith review Pith/arXiv arXiv 2026

[16] [16]

& Levine, S

Haarnoja, T., Zhou, A., Abbeel, P. & Levine, S. Dy, J. & Krause, A. (eds)Soft actor- critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. (eds Dy, J. & Krause, A.)Proceedings of the 35th International Conference on Machine Learn- ing, Vol. 80 ofProceedings of Machine Learning Research, 1861–1870 (PMLR, 2018). URL https:/...

2018

[17] [17]

Soft Actor-Critic Algorithms and Applications

Haarnoja, T.et al.Soft actor-critic algorithms and applications.ArXivabs/1812.05905 (2018). URL https://api.semanticscholar.org/CorpusID:55703664

work page internal anchor Pith review Pith/arXiv arXiv 2018

[18] [18]

A., Jonckheere, E

Khalid, I., Weidner, C. A., Jonckheere, E. A., Shermer, S. G. & Langbein, F. C. Statistically characterizing robustness and fidelity of quantum controls and quantum control algorithms. Phys. Rev. A107, 032606 (2023). URL https://link.aps.org/doi/10.1103/PhysRevA.107. 032606

work page doi:10.1103/physreva.107 2023

[19] [19]

& Rabitz, H

Walmsley, I. & Rabitz, H. Quantum physics under control.Physics Today56, 43–49 (2003). URL https://doi.org/10.1063/1.1611352

work page doi:10.1063/1.1611352 2003

[20] [20]

On the Generators of Quantum Dynamical Semigroups.Communications in Mathematical Physics 1976,48, 119–130

Lindblad, G. On the generators of quantum dynamical semigroups.Communications in Mathematical Physics48, 119–130 (1976). URL https://doi.org/10.1007/BF01608499

work page doi:10.1007/bf01608499 1976

[21] [21]

(eds.): Dynamic Thinking: A Primer on Dynamic Field Theory

Breuer, H.-P. & Petruccione, F.The Theory of Open Quantum Systems(Oxford University Press, Oxford, UK, 2002). URL https://doi.org/10.1093/acprof:oso/9780199213900.001.0001. 19

work page doi:10.1093/acprof:oso/9780199213900.001.0001 2002

[22] Ball, H. et al. Software tools for quantum control: improving quantum computer performance through noise and error suppression. Quantum Science and Technology 6, 044011 (2021). URL https://doi.org/10.1088/2058-9565/abdca6

[23] Lambert, N. et al. QuTiP 5: The quantum toolbox in Python. Physics Reports 1153, 1–62 (2026). URL https://www.sciencedirect.com/science/article/pii/S0370157325002704

[24] Duncan, C. W., Poggi, P. M., Bukov, M., Zinner, N. T. & Campbell, S. Taming quantum systems: A tutorial for using shortcuts-to-adiabaticity, quantum optimal control, and reinforcement learning. PRX Quantum 6, 040201 (2025). URL https://link.aps.org/doi/10.1103/j8c7-v2hd

[25] Barnes, E. & Das Sarma, S. Analytically solvable driven time-dependent two-level quantum systems. Physical Review Letters 109, 060401 (2012). URL https://link.aps.org/doi/10.1103/PhysRevLett.109.060401

[26] del Campo, A., Rams, M. M. & Zurek, W. H. Assisted finite-rate adiabatic passage across a quantum critical point: Exact solution for the quantum Ising model. Physical Review Letters 109 (2012). URL http://dx.doi.org/10.1103/PhysRevLett.109.115703

[27] Kosut, R. L., Grace, M. D. & Brif, C. Robust control of quantum gates via sequential convex programming. Phys. Rev. A 88, 052326 (2013). URL https://link.aps.org/doi/10.1103/PhysRevA.88.052326

[28] Lin, C., Sels, D. & Wang, Y. Time-optimal control of a dissipative qubit. Physical Review A 101, 022320 (2020). URL https://link.aps.org/doi/10.1103/PhysRevA.101.022320

[29] Li, X. Optimal control of quantum state preparation and entanglement creation in two-qubit quantum system with bounded amplitude. Scientific Reports 13, 14734 (2023). URL https://doi.org/10.1038/s41598-023-41688-z

[30] Chen, Z. et al. Efficient implementation of arbitrary two-qubit gates using unified control. Nature Physics 21, 1489–1496 (2025). URL https://doi.org/10.1038/s41567-025-02990-x

[31] Kim, Y. et al. High-fidelity three-qubit iToffoli gate for fixed-frequency superconducting qubits. Nature Physics 18, 783–788 (2022). URL https://doi.org/10.1038/s41567-022-01590-3

[32] Stojanović, V. M. & Nauth, J. K. Dicke-state preparation through global transverse control of Ising-coupled qubits. Physical Review A 108, 012608 (2023). URL http://dx.doi.org/10.1103/PhysRevA.108.012608

[33] Xu, P., Yang, X.-C., Mei, F. & Xue, Z.-Y. Controllable high-fidelity quantum state transfer and entanglement generation in circuit QED. Scientific Reports 6, 18695 (2016). URL https://doi.org/10.1038/srep18695

[34] Goerz, M. H., Whaley, K. B. & Koch, C. P. Hybrid optimization schemes for quantum control. EPJ Quantum Technology 2 (2015). URL http://dx.doi.org/10.1140/epjqt/s40507-015-0034-0

[35] Koswara, A. & Chakrabarti, R. Robustness of controlled quantum dynamics. Physical Review A 90 (2014). URL http://dx.doi.org/10.1103/PhysRevA.90.043414

[36] Grace, M. D., Dominy, J. M., Witzel, W. M. & Carroll, M. S. Optimized pulses for the control of uncertain qubits. Phys. Rev. A 85, 052313 (2012). URL https://link.aps.org/doi/10.1103/PhysRevA.85.052313

[37] Poggi, P. M., De Chiara, G., Campbell, S. & Kiely, A. Universally robust quantum control. Phys. Rev. Lett. 132, 193801 (2024). URL https://link.aps.org/doi/10.1103/PhysRevLett.132.193801

[38] Dong, D. et al. Sampling-based learning control for quantum systems with uncertainties. IEEE Transactions on Control Systems Technology 23, 2155–2166 (2015). URL https://doi.org/10.1109/TCST.2015.2404292

[39] O’Neil, S. P., Weidner, C. A., Jonckheere, E. A., Langbein, F. C. & Schirmer, S. G. Robustness of dynamic quantum control: Differential sensitivity bound (2024). URL https://arxiv.org/abs/2401.00301. arXiv:2401.00301

[40] Brockman, G. et al. OpenAI Gym. arXiv preprint arXiv:1606.01540 (2016). URL https://doi.org/10.48550/arXiv.1606.01540

[41] Raffin, A. et al. Stable-baselines3: Reliable reinforcement learning implementations. Journal of Machine Learning Research 22, 1–8 (2021). URL http://jmlr.org/papers/v22/20-1364.html

[42] Nair, V. & Hinton, G. E. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML) (2010). Haifa, Israel

[43] Dasgupta, S., Danageozian, A. & Humble, T. S. Adaptive mitigation of time-varying quantum noise (2023). URL https://arxiv.org/abs/2308.14756. arXiv:2308.14756

[44] Klimov, P. V. et al. Fluctuations of energy-relaxation times in superconducting qubits. Phys. Rev. Lett. 121, 090502 (2018). URL https://link.aps.org/doi/10.1103/PhysRevLett.121.090502

[45] Xiao, T., Fan, J. & Zeng, G. Parameter estimation in quantum sensing based on deep reinforcement learning. npj Quantum Information 8, 2 (2022). URL https://doi.org/10.1038/s41534-021-00513-z