Adaptive Reinforcement Learning for Robust Open Quantum System Control: A Multi-Task Framework with Temporal Optimization
Pith reviewed 2026-06-29 16:58 UTC · model grok-4.3
The pith
A multi-task reinforcement learning model generates control pulses that achieve high-fidelity state transfer under noise across many different Hamiltonians.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A single multi-task Soft Actor-Critic (SAC) model trained on a collection of Hamiltonians can produce pulse sequences that transfer quantum states with high fidelity under environment noise, while simultaneously identifying a problem-specific evolution time T and segment count N. The learned policies also succeed on unseen Hamiltonians drawn from the same space and are more robust to amplitude and decoherence variations than GRAPE controls.
What carries the argument
The multi-task Soft Actor-Critic reinforcement learning agent that jointly optimizes control pulses, evolution time T, and number of segments N across multiple Hamiltonian tasks.
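To make the three optimized quantities concrete, the sketch below treats a control as N equal-length, constant-amplitude segments applied over a total time T, and propagates a single qubit through them. It is a minimal illustration under stated assumptions, not the paper's setup: the Hamiltonian H = delta*sigma_z + u(t)*sigma_x, the function names, and the closed-system (noise-free) evolution are all hypothetical choices made here, whereas the paper works with open-system dynamics.

```python
import math

def segment_unitary(delta, u, dt):
    """Exact 2x2 propagator exp(-i*H*dt) for H = delta*sigma_z + u*sigma_x (hbar = 1)."""
    omega = math.hypot(delta, u)
    if omega == 0.0:
        return [[1 + 0j, 0j], [0j, 1 + 0j]]
    c, s = math.cos(omega * dt), math.sin(omega * dt)
    nz, nx = delta / omega, u / omega
    # exp(-i*omega*dt*(n . sigma)) = cos(omega*dt)*I - i*sin(omega*dt)*(n . sigma)
    return [[c - 1j * s * nz, -1j * s * nx],
            [-1j * s * nx, c + 1j * s * nz]]

def evolve(psi, delta, pulses, total_time):
    """Apply N = len(pulses) equal-length constant-amplitude segments over total_time."""
    dt = total_time / len(pulses)
    for u in pulses:
        U = segment_unitary(delta, u, dt)
        psi = [U[0][0] * psi[0] + U[0][1] * psi[1],
               U[1][0] * psi[0] + U[1][1] * psi[1]]
    return psi

def state_fidelity(target, psi):
    """Pure-state fidelity |<target|psi>|^2."""
    overlap = sum(complex(t).conjugate() * p for t, p in zip(target, psi))
    return abs(overlap) ** 2

# Hypothetical task: drive |0> to |1> with zero detuning and unit amplitude.
T, N = math.pi / 2, 4
final = evolve([1 + 0j, 0j], delta=0.0, pulses=[1.0] * N, total_time=T)
fid = state_fidelity([0, 1], final)  # close to 1 for this choice of T
```

In this toy setting, halving T with the same amplitudes leaves the transfer incomplete, which is one way to see why the evolution time has to be chosen per problem rather than fixed in advance.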
If this is right
- One model trained on multiple Hamiltonians succeeds on state-transfer tasks for Hamiltonians not encountered in training.
- The framework automatically selects suitable evolution times and pulse segment counts for each individual problem.
- SAC-derived policies exhibit superior robustness to pulse amplitude perturbations and decoherence rate variations compared with GRAPE controls.
- The results establish a route toward control methods that apply across realistic noisy quantum devices without per-instance redesign.
Where Pith is reading between the lines
- If generalization to unseen Hamiltonians holds, control strategies could adapt to new device parameters without full retraining.
- The joint optimization of time and segments might extend to other quantum tasks such as gate synthesis.
- Adding more varied noise models during training could increase the chance that simulation results survive on physical hardware.
Load-bearing premise
Training on a finite set of 51 Hamiltonian variations is sufficient to enable successful state transfer for unseen Hamiltonians from the same space, and robustness observed in simulation translates to real devices.
What would settle it
Testing the trained model on a real quantum device using a Hamiltonian variation outside the training set and checking whether the measured fidelity remains high or drops sharply.
Original abstract
We present a Multi-task Soft Actor-Critic (SAC) Reinforcement Learning framework designed for open-system quantum control across diverse Hamiltonians, which learns optimal pulse sequences while simultaneously discovering problem-specific evolution time T and number of control pulse segments N. Experimental results across 51 Hamiltonian variations demonstrate that the multi-task SAC model is able to generate control pulses that can drive a system, under environment noise, from its initial state to its target state with high fidelities, establishing essential foundations for universal quantum control applicable to realistic noisy quantum devices. Through progressive expansion of the training Hamiltonian set, we investigate if a single multi-task model trained using a given number of sample Hamiltonians can successfully accomplish state-transfer tasks for Hamiltonians drawn from the same Hamiltonian space but not encountered during training. In addition, our Robustness Infidelity Measure (RIM) analysis reveals that SAC trained policies exhibit superior robustness to pulse amplitude perturbations and decoherence rate variations compared to GRAPE-optimized controls.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a multi-task Soft Actor-Critic (SAC) reinforcement learning framework for open quantum system control that simultaneously optimizes control pulse sequences, evolution time T, and number of segments N across diverse Hamiltonians. Experiments on 51 Hamiltonian variations under environment noise report high-fidelity state transfers and introduce a Robustness Infidelity Measure (RIM) showing superior performance to GRAPE; the work also examines whether progressive expansion of the training set enables generalization to unseen Hamiltonians from the same space.
Significance. If the generalization results hold with supporting metrics, the multi-task SAC approach could provide a practical route to adaptive, robust control for noisy quantum devices, moving beyond single-Hamiltonian optimization methods. The joint learning of T and N alongside pulses, together with the RIM comparison, represents a concrete empirical contribution to RL-based quantum control.
Major comments (3)
- [Results / generalization experiments] The section describing progressive expansion of the training Hamiltonian set states that the model is tested on Hamiltonians not encountered during training, yet reports no quantitative metrics such as mean test fidelity, standard deviation, success rate, or failure rate on the held-out set. This information is load-bearing for the central claim of generalization to unseen Hamiltonians drawn from the same space.
- [Abstract and Experimental Results] The abstract and results claim 'high fidelities' across the 51 variations, but the manuscript provides neither the numerical fidelity values (with error bars or ranges) nor the precise definition of the fidelity measure used under noise. Without these, the support for the robustness and performance claims cannot be evaluated.
- [RIM analysis] The RIM analysis asserts superior robustness of SAC policies to pulse amplitude perturbations and decoherence variations relative to GRAPE, but the manuscript does not supply the explicit formula for RIM, the sampling procedure over the 51 variations, or tabulated comparison values. This detail is required to assess the internal comparison.
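The quantities requested in the three comments above can all be computed from a list of per-Hamiltonian (or per-noise-realization) fidelities. The sketch below is illustrative only. It assumes RIM follows the definition of Khalid et al. (Phys. Rev. A 107, 032606, 2023), where the order-p measure is the p-th root of the p-th raw moment of the infidelity, so that order 1 is simply the mean infidelity; the manuscript's own formula is not given on this page and may differ. The function name, the 0.99 success threshold (taken from the rebuttal further down), and the sample values are assumptions, and the sample values are placeholders, not results from the paper.

```python
import statistics

def summarize_fidelities(fidelities, threshold=0.99, p=1):
    """Summary statistics for a set of fidelity samples.

    threshold: fidelity above which a transfer counts as a success (assumed).
    p: order of the robustness-infidelity measure (assumed definition:
       p-th root of the p-th raw moment of the infidelity 1 - F).
    """
    infidelities = [1.0 - f for f in fidelities]
    success = sum(f > threshold for f in fidelities) / len(fidelities)
    rim = (sum(x ** p for x in infidelities) / len(infidelities)) ** (1.0 / p)
    return {
        "mean": statistics.fmean(fidelities),
        "std": statistics.stdev(fidelities) if len(fidelities) > 1 else 0.0,
        "success_rate": success,
        "failure_rate": 1.0 - success,
        "rim": rim,
    }

# Placeholder values for illustration only; NOT results from the paper.
placeholder = [0.999, 0.995, 0.98, 0.90]
report = summarize_fidelities(placeholder)
```

Under the assumed definition, a lower RIM means the fidelity distribution sits closer to the ideal value of 1, so a SAC-versus-GRAPE comparison would tabulate this number for both controllers over the same perturbation samples.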
Minor comments (2)
- [Methods] Notation for the multi-task SAC objective and the joint optimization over T and N should be introduced with explicit equations in the methods section to improve clarity.
- [Experimental setup] The manuscript should specify the parameter ranges and sampling distribution used to generate the 51 Hamiltonian variations.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments. We address each major comment below and will revise the manuscript to supply the requested quantitative details, definitions, and comparisons.
- Referee: [Results / generalization experiments] The section describing progressive expansion of the training Hamiltonian set states that the model is tested on Hamiltonians not encountered during training, yet reports no quantitative metrics such as mean test fidelity, standard deviation, success rate, or failure rate on the held-out set. This information is load-bearing for the central claim of generalization to unseen Hamiltonians drawn from the same space.
  Authors: We agree that quantitative metrics are essential to support the generalization claim. In the revised manuscript we will report the mean test fidelity, standard deviation, success rate (defined as fidelity exceeding 0.99), and failure rate on the held-out Hamiltonians, presented in a dedicated table or figure within the results section.
  Revision: yes
- Referee: [Abstract and Experimental Results] The abstract and results claim 'high fidelities' across the 51 variations, but the manuscript provides neither the numerical fidelity values (with error bars or ranges) nor the precise definition of the fidelity measure used under noise. Without these, the support for the robustness and performance claims cannot be evaluated.
  Authors: We will add the explicit definition of the fidelity (standard state fidelity F = |⟨ψ_target|ψ_final⟩|^2 evaluated under the noisy Lindblad dynamics) together with numerical values including mean fidelity and standard deviation (or ranges) across the 51 variations. These will appear in the experimental results section, and the abstract will be updated to reference the specific performance figures.
  Revision: yes
- Referee: [RIM analysis] The RIM analysis asserts superior robustness of SAC policies to pulse amplitude perturbations and decoherence variations relative to GRAPE, but the manuscript does not supply the explicit formula for RIM, the sampling procedure over the 51 variations, or tabulated comparison values. This detail is required to assess the internal comparison.
  Authors: We acknowledge the need for these details. The revised manuscript will include the explicit mathematical definition of the Robustness Infidelity Measure (RIM), a description of the sampling procedure (number of noise realizations per Hamiltonian), and a table of comparative RIM values for the SAC policies versus GRAPE across the 51 variations.
  Revision: yes
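One caveat on the fidelity definition quoted in the second response: Lindblad evolution generally produces a density matrix ρ_final rather than a pure state vector. For a pure target, the fidelity is then commonly written F = ⟨ψ_target|ρ_final|ψ_target⟩, which reduces to |⟨ψ_target|ψ_final⟩|^2 when ρ_final is pure. Whether the authors use this form is not stated on this page. The sketch below shows the computation; the function names are hypothetical and the example density matrix is a made-up partially dephased state, not data from the paper.

```python
import math

def pure_target_fidelity(target, rho):
    """F = <target| rho |target> for a normalized pure target and density matrix rho."""
    dim = len(target)
    value = sum(
        complex(target[i]).conjugate() * rho[i][j] * target[j]
        for i in range(dim)
        for j in range(dim)
    )
    return value.real

def projector(psi):
    """Density matrix |psi><psi| of a pure state."""
    return [[a * complex(b).conjugate() for b in psi] for a in psi]

plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]

# Pure final state: agrees with the pure-state overlap formula.
f_pure = pure_target_fidelity(plus, projector(plus))  # approximately 1.0

# Made-up partially dephased |+> state (off-diagonals halved).
dephased = [[0.5, 0.25], [0.25, 0.5]]
f_mixed = pure_target_fidelity(plus, dephased)  # approximately 0.75
```

The second value illustrates how decoherence lowers the fidelity even when the populations are unchanged, which is why the noise model has to be part of the definition.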
Circularity Check
No circularity: empirical RL application with no derivations or self-referential reductions
The paper presents an empirical study applying the standard multi-task Soft Actor-Critic algorithm to quantum control tasks. No mathematical derivations, equations, or first-principles claims are advanced that could reduce to fitted inputs or self-citations by construction. The investigation of generalization across Hamiltonian variations is described as an experimental procedure (progressive expansion of the training set), not a derived prediction. Self-citations, if present, are not load-bearing for any central result. This matches the default expectation of no significant circularity for purely empirical work.