Noise-Resilient Quantum Reinforcement Learning
Pith reviewed 2026-05-18 21:04 UTC · model grok-4.3
The pith
A bound state formed in the combined agent-noise system restores quantum reinforcement learning performance to the ideal noiseless level.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By investigating the non-Markovian decoherence effect on the QRL for solving the eigenstates of the agent-environment interaction Hamiltonian, we find that the formation of a bound state in the energy spectrum of the total agent-noise system restores the QRL performance to that in the noiseless case. This supplies a universal physical mechanism to suppress the decoherence effect on quantum machine learning and offers a guideline for practical NISQ implementation.
What carries the argument
The bound state in the energy spectrum of the total agent-noise system, which directly cancels decoherence effects on the agent's learning dynamics.
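Whether the protective bound state forms can be checked from the spectrum alone, without simulating the learning dynamics. A bound state is a real solution E < 0 (below the band edge) of Y(E) ≡ ω0 − ∫ J(ω)/(ω − E) dω = E. Because Y(E) − E decreases strictly with E and diverges to +∞ as E → −∞, such a solution exists exactly when Y(0) < 0. The sketch below assumes an Ohmic-family spectral density J(ω) = η ω^s ω_c^(1−s) e^(−ω/ω_c). This form is common in the non-Markovian literature, but it is an assumption here and not a confirmed detail of the paper. For this J(ω), Y(0) = ω0 − η ω_c Γ(s). The function name `has_bound_state` is hypothetical.

```python
import math


def has_bound_state(omega0: float, eta: float, s: float, omega_c: float) -> bool:
    """Return True if a bound state with E < 0 exists for the assumed
    Ohmic-family spectral density J(w) = eta * w**s * omega_c**(1 - s) * exp(-w / omega_c).

    Uses Y(0) = omega0 - integral_0^inf J(w)/w dw = omega0 - eta * omega_c * Gamma(s),
    which converges only for s > 0.
    """
    if s <= 0:
        raise ValueError("s must be positive for the integral of J(w)/w to converge")
    y_at_band_edge = omega0 - eta * omega_c * math.gamma(s)
    return y_at_band_edge < 0


# Ohmic case s = 1: Gamma(1) = 1, so a bound state needs eta * omega_c > omega0.
print(has_bound_state(omega0=1.0, eta=0.1, s=1.0, omega_c=5.0))   # 1 - 0.5 > 0 -> False
print(has_bound_state(omega0=1.0, eta=0.1, s=1.0, omega_c=20.0))  # 1 - 2.0 < 0 -> True
```

In the Ohmic case the criterion reduces to η ω_c > ω0, so under this assumed model a stronger coupling or a wider noise bandwidth favors bound-state formation.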
If this is right
- QRL for eigenstate problems recovers its noiseless performance even in the presence of non-Markovian noise when the bound state forms.
- The bound-state mechanism supplies a physical route to suppress decoherence without requiring active error correction.
- The approach provides a concrete guideline for implementing quantum machine learning algorithms on current noisy hardware.
- The same protection can be sought in other sequential decision tasks by engineering the agent-noise interaction to favor bound-state formation.
Where Pith is reading between the lines
- The bound-state protection might extend to other quantum machine learning tasks such as quantum policy optimization beyond eigensolvers.
- Hardware experiments could test the mechanism by tuning decoherence parameters in superconducting or trapped-ion platforms until the bound state appears.
- Future algorithm design could deliberately choose interaction Hamiltonians that encourage protective bound states rather than avoiding noise altogether.
Load-bearing premise
The chosen non-Markovian decoherence model for the agent-environment-noise system allows a protective bound state to form that directly cancels noise effects on the learning process.
What would settle it
Measure the energy spectrum of the combined agent-noise system while running the QRL eigensolver and check whether performance returns to noiseless levels exactly when a bound state is present and drops when the bound state is absent.
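A quantitative version of this test comes from the residue of the bound-state pole. This sketch assumes the standard single-excitation treatment, where the agent's excited-state amplitude x(t) starts at x(0) = 1. Under that assumption, |x(t)|² tends to Z² with Z = [1 + ∫ J(ω)/(ω − E_b)² dω]^(−1) when a bound state at energy E_b exists, and it decays to zero otherwise. The sketch solves Y(E_b) = E_b numerically with SciPy for the same kind of assumed Ohmic-family J(ω). The function names and parameter values are illustrative and not taken from the paper.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq


def J(w, eta, s, omega_c):
    # Assumed Ohmic-family spectral density (illustrative choice).
    return eta * w**s * omega_c**(1 - s) * np.exp(-w / omega_c)


def Y(E, omega0, eta, s, omega_c):
    # Y(E) = omega0 - integral_0^inf J(w)/(w - E) dw, evaluated only for E < 0.
    integral, _ = quad(lambda w: J(w, eta, s, omega_c) / (w - E), 0, np.inf)
    return omega0 - integral


def long_time_population(omega0, eta, s, omega_c, edge=-1e-9):
    """Return (E_b, Z**2), or (None, 0.0) if no bound state lies below the band edge."""
    g = lambda E: Y(E, omega0, eta, s, omega_c) - E  # strictly decreasing in E
    if g(edge) >= 0:
        return None, 0.0  # no sign change below the band: the amplitude decays fully
    lower = -1.0
    while g(lower) <= 0:  # g -> +inf as E -> -inf, so this loop terminates
        lower *= 2.0
    E_b = brentq(g, lower, edge)
    weight, _ = quad(lambda w: J(w, eta, s, omega_c) / (w - E_b) ** 2, 0, np.inf)
    Z = 1.0 / (1.0 + weight)
    return E_b, Z**2


# Sweep the noise cutoff; for s = 1 the threshold sits at eta * omega_c = omega0.
for omega_c in (5.0, 10.0, 20.0, 40.0):
    E_b, pop = long_time_population(omega0=1.0, eta=0.1, s=1.0, omega_c=omega_c)
    print(f"omega_c={omega_c:5.1f}  E_b={E_b}  long-time |x|^2={pop:.4f}")
```

Below the threshold the predicted long-time population is exactly zero. Above it, the population is finite. This is the sharp contrast the proposed measurement would look for.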
Original abstract
As a branch of quantum machine learning, quantum reinforcement learning (QRL) aims to solve complex sequential decision-making problems more efficiently and effectively than its classical counterpart by exploiting quantum resources. However, in the noisy intermediate-scale quantum (NISQ) era, its realization is challenged by the ubiquitous noise-induced decoherence. Here, we propose a noise-resilient QRL scheme for a quantum eigensolver with a two-level system as an agent. By investigating the non-Markovian decoherence effect on the QRL for solving the eigenstates of the agent-environment interaction Hamiltonian, we find that the formation of a bound state in the energy spectrum of the total agent-noise system restores the QRL performance to that in the noiseless case. Providing a universal physical mechanism to suppress the decoherence effect on quantum machine learning, our result lays the foundation for designing NISQ algorithms and offers a guideline for their practical implementation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a noise-resilient quantum reinforcement learning (QRL) scheme in which a two-level system serves as the agent to solve for eigenstates of the agent-environment interaction Hamiltonian. Through analysis of non-Markovian decoherence, the authors report that formation of a bound state in the energy spectrum of the combined agent-noise system restores QRL performance to the noiseless limit, offering a physical mechanism for decoherence suppression in quantum machine learning.
Significance. If the central claim is rigorously established, the result would be significant for NISQ-era quantum machine learning. It identifies bound-state protection as a concrete, physically motivated route to noise resilience that could inform algorithm design and implementation guidelines beyond the specific model studied.
major comments (2)
- [Main results and numerical demonstration] The central claim requires that the bound state formed in the agent-plus-noise Hamiltonian leaves the effective dynamics of the agent-environment interaction and the subsequent QRL update steps (including reward extraction) identical to the noiseless case. No analytical derivation isolating this effect on the Trotterized interaction unitary or projective measurement back-action is provided; numerical trajectories alone for one spectral density do not establish the required decoupling.
- [Model and dynamics section] Reward signals are obtained from projective measurements on the agent-environment composite. Even when a discrete bound state exists below the band edge in the agent-noise subsystem, the measurement can still couple to the continuum; the manuscript does not derive that this back-action remains decoherence-free under the bound-state condition.
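The referee's request for an analysis of the Trotterized interaction unitary can be made concrete by measuring how far a first-order Trotter product is from the exact propagator. The sketch below does this for a hypothetical two-qubit (agent ⊗ environment) Hamiltonian H = (ω0/2) Z⊗I + g X⊗X. The Hamiltonian, the splitting, and the parameter values are assumptions for illustration, not the manuscript's model, and no noise modes are included.

```python
import numpy as np
from scipy.linalg import expm

# Single-qubit Pauli matrices and identity.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def trotter_error(tau: float, n_steps: int, omega0: float = 1.0, g: float = 0.5) -> float:
    """Spectral-norm distance between exp(-i H tau) and the first-order Trotter
    product [exp(-i H_A dt) exp(-i H_int dt)]**n_steps, with the hypothetical
    H_A = (omega0/2) Z⊗I (agent) and H_int = g X⊗X (agent-environment)."""
    H_A = 0.5 * omega0 * np.kron(Z, I2)
    H_int = g * np.kron(X, X)
    exact = expm(-1j * (H_A + H_int) * tau)
    dt = tau / n_steps
    step = expm(-1j * H_A * dt) @ expm(-1j * H_int * dt)
    trotter = np.linalg.matrix_power(step, n_steps)
    return float(np.linalg.norm(exact - trotter, ord=2))


for n in (1, 10, 100):
    print(n, trotter_error(tau=2.0, n_steps=n))
```

The two terms anticommute, so the error is nonzero and, for first-order splitting, shrinks roughly as τ²/n. The number of Trotter steps therefore belongs next to any claim of noiseless-level performance.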
minor comments (2)
- [Methods] Clarify the precise form of the non-Markovian spectral density and the Trotterization scheme used for the interaction unitary.
- [Figures] Add quantitative error analysis or statistical measures to the performance restoration plots to support the claim that performance matches the noiseless case within numerical precision.
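One way to meet the request for statistical measures is to report a percentile-bootstrap confidence interval on the mean final fidelity over independent QRL runs. The interval would be computed separately for the noiseless case and for the noisy case with a bound state. The sketch below uses only NumPy. The function name and defaults (2000 resamples, 95% level) are illustrative choices, and the manuscript summary does not supply the fidelity data it would be applied to.

```python
import numpy as np


def bootstrap_mean_ci(samples, n_resamples=2000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `samples`
    (for example, final fidelities from independent QRL runs).

    Returns (sample_mean, lower_bound, upper_bound)."""
    x = np.asarray(samples, dtype=float)
    rng = np.random.default_rng(seed)
    # Draw n_resamples resamples with replacement and record each resample's mean.
    idx = rng.integers(0, x.size, size=(n_resamples, x.size))
    means = x[idx].mean(axis=1)
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(means, [alpha, 1.0 - alpha])
    return x.mean(), lower, upper
```

Overlapping intervals for the two cases would be consistent with restoration within statistical precision. They would not, however, prove exact equality.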
Simulated Author's Rebuttal
We thank the referee for the careful reading of our manuscript and the constructive comments, which have helped us identify areas where additional clarification and analysis are needed. We address each major comment below and indicate the revisions we will make.
- Referee: The central claim requires that the bound state formed in the agent-plus-noise Hamiltonian leaves the effective dynamics of the agent-environment interaction and the subsequent QRL update steps (including reward extraction) identical to the noiseless case. No analytical derivation isolating this effect on the Trotterized interaction unitary or projective measurement back-action is provided; numerical trajectories alone for one spectral density do not establish the required decoupling.
Authors: We agree that the central claim would be strengthened by an explicit analytical isolation of the bound-state effect on the effective dynamics. While the numerical trajectories for the chosen spectral density demonstrate restoration of QRL performance, this alone does not fully establish decoupling for arbitrary cases. In the revised manuscript we will add a section deriving the effective agent-environment unitary under the bound-state condition, showing that the continuum contribution vanishes in the relevant subspace for the Trotterized evolution and that the reward statistics remain unchanged. We will also include numerical results for two additional spectral densities to illustrate the generality of the protection mechanism. (Revision: yes.)
- Referee: Reward signals are obtained from projective measurements on the agent-environment composite. Even when a discrete bound state exists below the band edge in the agent-noise subsystem, the measurement can still couple to the continuum; the manuscript does not derive that this back-action remains decoherence-free under the bound-state condition.
Authors: We acknowledge that the manuscript does not explicitly derive the measurement back-action under the bound-state condition. Our simulations show that the extracted rewards match the noiseless case, but this leaves open the question of whether the projective measurement remains protected. In the revision we will provide a derivation demonstrating that, once the bound state is formed, the agent-environment subspace is orthogonal to the noise continuum, so that the measurement operator projects only within the protected subspace and does not induce additional decoherence from the continuum modes. (Revision: yes.)
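The back-action discussed in this exchange enters through the post-measurement state. The sketch below computes an outcome probability and the Lüders-rule post-measurement state for a projective measurement of the environment qubit in a pure two-qubit agent-environment state. It ignores the noise modes entirely. It therefore illustrates only the measurement step and says nothing about whether the bound state protects that step. Treating one outcome (say 0) as the reward is an assumption, and the function name is hypothetical.

```python
import numpy as np


def measure_environment(psi, outcome):
    """Projective measurement of the environment qubit (second tensor factor)
    of a normalized two-qubit pure state psi in the basis |agent, env>.

    Returns the probability of `outcome` (0 or 1) and the normalized
    post-measurement state (Lüders rule), or (0.0, None) if the outcome
    has zero probability."""
    psi = np.asarray(psi, dtype=complex)
    env = np.zeros(2, dtype=complex)
    env[outcome] = 1.0
    # Projector I_agent ⊗ |outcome><outcome|_env
    projector = np.kron(np.eye(2), np.outer(env, env.conj()))
    projected = projector @ psi
    prob = float(np.vdot(psi, projected).real)
    if prob <= 0.0:
        return 0.0, None
    return prob, projected / np.sqrt(prob)


# Bell state (|00> + |11>)/sqrt(2): both outcomes are equally likely, and measuring
# the environment also collapses the agent, which is the back-action on the agent.
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
p0, post0 = measure_environment(bell, 0)
```

For a product state of agent and environment, the same measurement leaves the agent's reduced state unchanged. The entanglement generated by the interaction is what makes reward extraction disturb the agent.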
Circularity Check
No circularity: bound-state restoration derived from independent spectral analysis of agent-noise Hamiltonian.
full rationale
The manuscript derives the restoration of QRL performance from the formation of a bound state in the total agent-noise energy spectrum under a chosen non-Markovian model. This outcome is obtained by direct investigation of the Hamiltonian spectrum and its effect on the learning dynamics, without any parameter fitting to target performance metrics or redefinition of the claimed result in terms of itself. No self-citation chain, ansatz smuggling, or renaming of known results is indicated in the provided text as load-bearing for the central claim; the result is presented as an emergent physical mechanism rather than a tautological prediction.
Axiom & Free-Parameter Ledger
axioms (2)
- Standard math: Quantum mechanics governs the total agent-plus-noise Hamiltonian and its spectrum.
- Domain assumption: Non-Markovian decoherence is modeled by a specific interaction that permits bound-state formation.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
Unclear: relation between the paper passage and the cited Recognition theorem.
Passage: "the formation of a bound state in the energy spectrum of the total agent-noise system restores the QRL performance to that in the noiseless case"
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean: absolute_floor_iff_bare_distinguishability (unclear)
Unclear: relation between the paper passage and the cited Recognition theorem.
Passage: $Y(\bar{E}) \equiv \omega_0 - \int \frac{J(\omega)}{\omega - \bar{E}}\,d\omega = \bar{E}$
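This bound-state condition can be derived under a standard single-excitation model. The model is assumed here and is not confirmed from the paper text. An agent with transition frequency ω0 couples to bosonic noise modes via H = ω0 σ+σ− + Σ_k ω_k b_k†b_k + Σ_k (g_k σ+ b_k + h.c.), with spectral density J(ω) = Σ_k |g_k|² δ(ω − ω_k) supported on ω ≥ 0. The agent's excited-state amplitude x(t) then obeys a memory-kernel equation. A real pole of its Laplace transform below the band produces a non-decaying component:

```latex
\dot{x}(t) + i\omega_0 x(t) + \int_0^t f(t-\tau)\,x(\tau)\,d\tau = 0,
\qquad f(t) = \int_0^\infty J(\omega)\,e^{-i\omega t}\,d\omega .

% Laplace transform with x(0) = 1:
\tilde{x}(p) = \Bigl[p + i\omega_0 + \int_0^\infty \frac{J(\omega)}{p + i\omega}\,d\omega\Bigr]^{-1}.

% A pole at p = -i\bar{E} with real \bar{E} < 0 (outside the band \omega \ge 0) requires
-i\bar{E} + i\omega_0 - i\int_0^\infty \frac{J(\omega)}{\omega - \bar{E}}\,d\omega = 0
\;\Longleftrightarrow\;
Y(\bar{E}) \equiv \omega_0 - \int_0^\infty \frac{J(\omega)}{\omega - \bar{E}}\,d\omega = \bar{E}.

% The residue at this pole survives at long times, while the branch-cut part decays:
x(t) \to Z\,e^{-i\bar{E} t} \quad (t \to \infty),
\qquad Z = \Bigl[1 + \int_0^\infty \frac{J(\omega)}{(\omega - \bar{E})^2}\,d\omega\Bigr]^{-1}.
```

Without such a pole only the branch-cut contribution remains, and x(t) decays to zero. This is the sense in which the bound state limits decoherence in this assumed model.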
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.