Quantum framework for Reinforcement Learning: Integrating Markov decision process, quantum arithmetic, and trajectory search

Masaaki Kondo; Shaswot Shresthamali; Thet Htar Su

arXiv: 2412.18208 · v3 · submitted 2024-12-24 · quant-ph · cs.LG

Thet Htar Su, Shaswot Shresthamali, Masaaki Kondo

Pith reviewed 2026-05-23 06:34 UTC · model grok-4.3

classification quant-ph · cs.LG

keywords quantum reinforcement learning · Markov decision process · quantum arithmetic · trajectory search · quantum superposition · fully quantum RL

The pith

A fully quantum Markov decision process model enables reinforcement learning entirely within the quantum domain.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a quantum framework for reinforcement learning that creates a fully quantum version of the Markov decision process. It implements state transitions, return calculations, and trajectory searches using quantum arithmetic and a quantum search algorithm. The central aim is to perform all agent-environment interactions inside the quantum domain with no classical computations required. A sympathetic reader would care because this setup leverages superposition to potentially improve efficiency in RL decision-making tasks.

Core claim

The paper establishes that by modeling the MDP with quantum states and using quantum principles for transitions, rewards, and searching trajectories, the entire RL process can be realized through quantum phenomena without classical intervention, demonstrating quantum enhancement via superposition.

What carries the argument

Quantum model of the Markov decision process that uses quantum arithmetic for state transitions and return calculations together with quantum search for trajectory optimization.
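
As a point of comparison, a purely classical reference for what the quantum circuit is meant to compute can be sketched in Python. The toy MDP below is hypothetical: it has four states (as in the paper's Fig. 3) and three time steps (as in Fig. 7), but the actions, transitions, rewards, and all function names are invented for illustration and are not the authors' code. The sketch enumerates every action sequence, sums the rewards into a return, and picks the best trajectory. The paper's proposal is to replace the enumeration with superposition and the maximization with a quantum search.

```python
from itertools import product

# Hypothetical toy MDP (NOT taken from the paper): 4 states, 2 actions, 3 steps.
N_STATES, N_ACTIONS, HORIZON = 4, 2, 3

def step(state, action):
    """Toy deterministic transition: action 0 stays, action 1 moves to the next state."""
    return (state + action) % N_STATES

def reward(next_state):
    """Toy reward: +10 for arriving at (or staying in) state 3, otherwise a -1 step cost."""
    return 10 if next_state == 3 else -1

def rollout(start, actions):
    """Follow one action sequence; return the visited states and the total reward."""
    state, total, path = start, 0, [start]
    for action in actions:
        state = step(state, action)
        total += reward(state)
        path.append(state)
    return path, total

def best_trajectory(start):
    """Brute-force search over all N_ACTIONS**HORIZON action sequences."""
    best = None
    for actions in product(range(N_ACTIONS), repeat=HORIZON):
        path, total = rollout(start, actions)
        if best is None or total > best[2]:
            best = (path, actions, total)
    return best

if __name__ == "__main__":
    print(best_trajectory(0))
```

The brute-force loop visits all 2^3 = 8 action sequences here. Its cost grows exponentially with the horizon, which is the cost a superposition-based approach aims to reduce.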

Load-bearing premise

A quantum model of the MDP can be realized with quantum arithmetic and search such that the full RL loop runs without any classical post-processing or measurement that would collapse the claimed advantage.

What would settle it

An implementation of a simple RL task on quantum hardware that requires intermediate measurements to extract actions or rewards, showing that the loop cannot complete without classical steps.

Figures

Figures reproduced from arXiv: 2412.18208 by Masaaki Kondo, Shaswot Shresthamali, Thet Htar Su.

**Figure 2.** Quantum circuit for Grover’s algorithm on 2 qubits, search …

**Figure 3.** Graphical representation of a classical MDP with four states …

**Figure 4.** Quantum circuit of the quantum Markov decision process (QMDP) simulating a single interaction between the agent and the environ…

**Figure 5.** State transition heat-map representing the probabilities of …

**Figure 6.** Quantum sample distribution of the QMDP circuit, display…

**Figure 7.** Quantum circuit implementation of agent-environment interactions across 3 time steps (t = 0, 1, 2). Each colored block represents a …

**Figure 8.** Quantum circuit for return calculation in the QMDP. The process simulates the overall outcome of the agent-environment interactions …

**Figure 9.** Distribution of quantum trajectories in the QMDP for 3 time steps. The x-axis shows trajectory numbers (see Table …

**Figure 10.** Distribution of quantum trajectories after executing Grover’s algorithm to search the trajectories starting at …

**Figure 11.** Comparison of total rewards for 4 unique trajectories from …

**Figure 12.** Optimal trajectory plot for an agent transitioning from …

**Figure 13.** Distribution of quantum trajectories after executing Grover’s algorithm to search the trajectories starting from any state and termi…

**Figure 14.** Comparison of total reward for each unique trajectory …

**Figure 15.** Optimal trajectories from 3 different starting states (1, 2, …

Original abstract

This paper introduces a quantum framework for addressing reinforcement learning (RL) tasks, grounded in the quantum principles and leveraging a fully quantum model of the classical Markov decision process (MDP). By employing quantum concepts and a quantum search algorithm, this work presents the implementation and optimization of the agent-environment interactions entirely within the quantum domain, eliminating reliance on classical computations. Key contributions include the quantum-based state transitions, return calculation, and trajectory search mechanism that utilize quantum principles to demonstrate the realization of RL processes through quantum phenomena. The implementation emphasizes the fundamental role of quantum superposition in enhancing computational efficiency for RL tasks. Results demonstrate the capacity of a quantum model to achieve quantum enhancement in RL, highlighting the potential of fully quantum implementations in decision-making tasks. This work not only underscores the applicability of quantum computing in machine learning but also contributes to the field of quantum reinforcement learning (QRL) by offering a robust framework for understanding and exploiting quantum computing in RL systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper claims a fully quantum RL loop but the MDP encoding step is unavoidably classical, so the central advantage is not shown.

The main thing here is that the paper asserts an end-to-end quantum RL system with no classical computations remaining in the loop, yet the construction they describe cannot actually achieve that. They outline quantum arithmetic for state transitions and returns plus a quantum search step for trajectories, all meant to run coherently inside superposition. That direction is worth thinking about in quantum RL, but the execution does not close the loop as claimed. The abstract and the high-level description supply no explicit unitary for an arbitrary transition kernel or reward function. Building such a unitary from a classical MDP definition is classical pre-processing by nature; you have to compute or approximate the operator before you can apply it. The search oracle would then mark states based on the same classical data. Nothing in the text shows a parameter-free or self-contained way around this step. There are also no circuit diagrams, no derivation of how policy extraction avoids measurement collapse, and no verification that the claimed efficiency gain survives the encoding cost. The results section simply states that quantum enhancement is demonstrated without showing the data or the comparison. This leaves the work at the level of a framework sketch rather than a worked-out method. Readers already following quantum machine learning might find the high-level integration of arithmetic and search useful as one more data point, but the absence of concrete constructions means the paper does not contain enough technical substance for a serious referee. I would not send it out for peer review.
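
The pre-processing point in the letter can be illustrated with a minimal Python sketch. It assumes one common encoding, which is not confirmed to be the paper's construction: a single next-state qubit prepared by an RY rotation, where RY(θ)|0⟩ = cos(θ/2)|0⟩ + sin(θ/2)|1⟩. The transition table and the helper names are hypothetical. Under that assumption, every entry of the classical transition kernel has to be converted into a rotation angle on a classical computer before any gate can be applied.

```python
import math

# Hypothetical transition table (NOT from the paper):
# P(next_state_bit = 1 | state, action) for a 2-state, 2-action toy MDP.
P_NEXT_ONE = {(0, 0): 0.1, (0, 1): 0.9, (1, 0): 0.5, (1, 1): 0.25}

def ry_angle(p):
    """Angle theta such that RY(theta)|0> is measured as |1> with probability p."""
    return 2.0 * math.asin(math.sqrt(p))

def ry_on_zero(theta):
    """Amplitudes (on |0>, on |1>) of the state RY(theta)|0>."""
    return (math.cos(theta / 2.0), math.sin(theta / 2.0))

# Classical pre-processing step: one angle per (state, action) pair.
ANGLES = {key: ry_angle(p) for key, p in P_NEXT_ONE.items()}

if __name__ == "__main__":
    for key, theta in ANGLES.items():
        amp0, amp1 = ry_on_zero(theta)
        print(key, round(theta, 4), round(amp1 ** 2, 4))
```

Whether this step counts against a "fully quantum" claim is a matter of definition. The sketch only shows that, in this encoding, the angles are computed from classical data outside the circuit.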

Referee Report

2 major / 0 minor

Summary. The paper proposes a quantum framework for reinforcement learning that models the classical Markov decision process (MDP) using quantum principles. It claims to implement agent-environment interactions, state transitions, return calculations, and trajectory search entirely within the quantum domain via quantum arithmetic and a quantum search algorithm, thereby eliminating classical computations and achieving quantum enhancement through superposition.

Significance. If the central claim of a self-contained quantum MDP realization (with unitary encodings of transitions and rewards, coherent trajectory search, and policy extraction without measurement-induced collapse or classical post-processing) holds, the work would constitute a notable advance in quantum reinforcement learning by providing an end-to-end quantum RL loop. The emphasis on superposition for efficiency and the avoidance of hybrid classical-quantum interfaces would be a distinguishing contribution if demonstrated.

major comments (2)

[Abstract] The assertion that 'the implementation and optimization of the agent-environment interactions [occur] entirely within the quantum domain, eliminating reliance on classical computations' is load-bearing for the central claim but is unsupported; no unitary construction, circuit, or encoding procedure is supplied showing how an arbitrary classical transition kernel P(s'|s,a) and reward function R(s,a) are embedded into quantum arithmetic operations without classical pre-processing to define the oracle or unitary.
[Abstract] The statement that 'results demonstrate the capacity of a quantum model to achieve quantum enhancement' lacks any supporting data, benchmark comparisons, circuit diagrams, or verification steps, rendering the enhancement claim unverifiable and preventing assessment of whether the trajectory search (presumably amplitude amplification) preserves coherence across the full RL loop.
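
For readers unfamiliar with the amplitude amplification step the referee refers to, a small state-vector simulation in plain Python is sketched here. It is a generic textbook Grover iteration (oracle sign flip followed by inversion about the mean), not the paper's circuit. The function names and the choice of marked index are assumptions made for illustration. With 2 qubits (4 items, as in the paper's Fig. 2) and one marked item, a single iteration concentrates all probability on the marked item.

```python
import math

def grover_probabilities(n_items, marked, iterations):
    """Simulate Grover iterations on a uniform superposition; return final probabilities."""
    amps = [1.0 / math.sqrt(n_items)] * n_items
    for _ in range(iterations):
        # Oracle: flip the sign of every marked amplitude.
        amps = [-a if i in marked else a for i, a in enumerate(amps)]
        # Diffusion: reflect every amplitude about the mean amplitude.
        mean = sum(amps) / n_items
        amps = [2.0 * mean - a for a in amps]
    return [a * a for a in amps]

def suggested_iterations(n_items, n_marked):
    """Usual approximation floor(pi/4 * sqrt(N/M)) for the number of iterations."""
    return math.floor(math.pi / 4.0 * math.sqrt(n_items / n_marked))

if __name__ == "__main__":
    # Hypothetical example: 2 qubits, trajectory index 2 marked as the target.
    k = suggested_iterations(4, 1)
    print(k, grover_probabilities(4, {2}, k))
```

Note that the oracle in this sketch needs the marked set as input. In the RL setting, that set would be derived from reward or return data, which is where the referee's question about classical pre-processing applies.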

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and for identifying key points where the manuscript's claims require stronger substantiation. We agree that the abstract assertions about a fully quantum implementation and demonstrated enhancement need explicit support. We will revise the manuscript to address these gaps by adding the requested constructions, circuits, and verification details.

Referee: [Abstract] The assertion that 'the implementation and optimization of the agent-environment interactions [occur] entirely within the quantum domain, eliminating reliance on classical computations' is load-bearing for the central claim but is unsupported; no unitary construction, circuit, or encoding procedure is supplied showing how an arbitrary classical transition kernel P(s'|s,a) and reward function R(s,a) are embedded into quantum arithmetic operations without classical pre-processing to define the oracle or unitary.

Authors: We acknowledge that the current manuscript describes the quantum MDP at a conceptual level using quantum arithmetic for transitions and rewards but does not supply explicit unitary operators or circuits for arbitrary classical kernels P(s'|s,a) and R(s,a). This is a valid observation. In the revision we will add a dedicated section with the encoding procedure, including the unitary construction that embeds the transition kernel via quantum arithmetic without classical pre-processing of the oracle, and circuit diagrams showing how the agent-environment interaction remains coherent. revision: yes

Referee: [Abstract] The statement that 'results demonstrate the capacity of a quantum model to achieve quantum enhancement' lacks any supporting data, benchmark comparisons, circuit diagrams, or verification steps, rendering the enhancement claim unverifiable and preventing assessment of whether the trajectory search (presumably amplitude amplification) preserves coherence across the full RL loop.

Authors: The manuscript argues for quantum enhancement via superposition in the trajectory search step but indeed provides no numerical benchmarks, classical comparisons, or explicit circuit simulations to verify coherence preservation through the full loop. We agree this renders the claim unverifiable in its present form. The revision will incorporate simulation results on small MDPs, runtime comparisons against classical RL, circuit diagrams for the amplitude amplification step, and analysis confirming that measurements do not collapse the superposition before policy extraction. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected; derivation remains self-contained.

The abstract and available description introduce a quantum MDP model via quantum arithmetic and trajectory search but supply no equations, parameter fits, or self-citations that reduce any claimed prediction or result to its own inputs by construction. No load-bearing step matches the enumerated patterns (self-definitional, fitted-input-called-prediction, etc.). The central claim of a fully quantum RL loop is presented at a high level without demonstrated reduction to classical pre-processing or renamed empirical patterns. This is the expected honest non-finding when the manuscript does not exhibit the specific reductions required for a positive circularity flag.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated in the provided text.
