
arxiv: 2509.16002 · v2 · submitted 2025-09-19 · 🪐 quant-ph · cs.LG

Scalable Quantum Reinforcement Learning on NISQ Devices with Dynamic-Circuit Qubit Reuse and Grover Optimization

Pith reviewed 2026-05-18 15:25 UTC · model grok-4.3

classification 🪐 quant-ph cs.LG
keywords quantum reinforcement learning · NISQ devices · dynamic circuits · qubit reuse · Grover optimization · QMDP · mid-circuit measurement · trajectory fidelity

The pith

Dynamic circuits with mid-circuit resets reduce qubit needs for multi-step quantum reinforcement learning from O(T) to O(1) while preserving trajectory fidelity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a quantum reinforcement learning framework that encodes environment dynamics entirely in quantum Hilbert space, allowing coherent superposition over state-action sequences. It proposes a dynamic execution model that uses mid-circuit measurement and reset to recycle a fixed set of seven physical qubits across an arbitrary number of interaction steps, replacing the static unrolled circuit that would otherwise require seven qubits per time step. The method maintains functional equivalence at the level of generated trajectories and applies Grover amplitude amplification to favor high-return sequences. Simulations support the equivalence claim, and runs on an IBM Heron-class processor demonstrate feasibility on current noisy hardware.
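To make the amplification step concrete, here is a minimal sketch of the textbook Grover success-probability formula, sin^2((2k+1)θ) with sin θ = sqrt(M/N). The function names and the 1-in-4 example are illustrative assumptions; the paper's actual oracle, trajectory count, and iteration schedule are not given on this page.

```python
import math


def grover_success_probability(n_marked: int, n_total: int, iterations: int) -> float:
    """Probability of sampling a marked item after `iterations` Grover rounds.

    Textbook formula for an ideal, noise-free search starting from the
    uniform superposition over n_total items.
    """
    if not 0 < n_marked <= n_total:
        raise ValueError("need 0 < n_marked <= n_total")
    theta = math.asin(math.sqrt(n_marked / n_total))
    return math.sin((2 * iterations + 1) * theta) ** 2


def suggested_iterations(n_marked: int, n_total: int) -> int:
    """Common near-optimal iteration count, floor(pi / (4 * theta))."""
    theta = math.asin(math.sqrt(n_marked / n_total))
    return max(0, math.floor(math.pi / (4 * theta)))


# Hypothetical example: 1 high-return trajectory among 4 candidates.
k = suggested_iterations(1, 4)
before = grover_success_probability(1, 4, 0)  # plain sampling
after = grover_success_probability(1, 4, k)   # after k amplification rounds
```

In this toy case a single round lifts the sampling probability of the marked trajectory from 0.25 to (numerically) 1; on hardware, noise would lower that figure.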

Core claim

The central claim is that a dynamic execution model for multi-step QMDPs can use mid-circuit measurement and reset to recycle a fixed physical quantum register across sequential interactions while generating the same state-action sequences as a static unrolled QMDP. The physical qubit requirement drops from 7T to a constant 7, independent of the interaction horizon T, so qubit complexity goes from O(T) to O(1) while trajectory fidelity is maintained.

What carries the argument

The dynamic execution model with mid-circuit measurement and reset that recycles a fixed seven-qubit register across sequential interactions in the QMDP.
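The noise-free equivalence behind register reuse can be illustrated with a purely classical analogue. The sketch below is not the paper's circuit: the two-state MDP, the uniform policy, and all names are hypothetical. It compares an "unrolled" computation, which holds one (state, action) slot per step, with a "reuse" computation that keeps only the current state live and records each step's outcome before moving on.

```python
from itertools import product

# Hypothetical toy MDP (not the paper's): 2 states, 2 actions.
STATES, ACTIONS = (0, 1), (0, 1)
POLICY = {s: {0: 0.5, 1: 0.5} for s in STATES}  # pi(a | s)
TRANSITION = {                                    # P(s' | s, a)
    (0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.2, 1: 0.8},
    (1, 0): {0: 0.6, 1: 0.4}, (1, 1): {0: 0.3, 1: 0.7},
}


def static_distribution(horizon, start=0):
    """Unrolled view: every step has its own slot, all read out at the end."""
    dist = {}
    for actions in product(ACTIONS, repeat=horizon):
        for states in product(STATES, repeat=horizon - 1):
            path, prob, s = [], 1.0, start
            for t, a in enumerate(actions):
                prob *= POLICY[s][a]
                path.append((s, a))
                if t < horizon - 1:
                    prob *= TRANSITION[(s, a)][states[t]]
                    s = states[t]
            dist[tuple(path)] = prob
    return dist


def dynamic_distribution(horizon, start=0):
    """Reuse view: record each step's outcome, carry only the live state."""
    frontier = {((), start): 1.0}  # (recorded prefix, live state) -> prob
    for t in range(horizon):
        nxt = {}
        for (prefix, s), p in frontier.items():
            for a in ACTIONS:
                record = prefix + ((s, a),)
                step_p = p * POLICY[s][a]
                if t == horizon - 1:
                    nxt[(record, None)] = nxt.get((record, None), 0.0) + step_p
                    continue
                for s2, pt in TRANSITION[(s, a)].items():
                    key = (record, s2)
                    nxt[key] = nxt.get(key, 0.0) + step_p * pt
        frontier = nxt
    return {prefix: p for (prefix, _), p in frontier.items()}


static = static_distribution(3)
dynamic = dynamic_distribution(3)
max_gap = max(abs(static[k] - dynamic[k]) for k in static)
```

Both functions return the same distribution over three-step trajectories, which is the classical counterpart of "identical state-action sequences" in the ideal case. The sketch says nothing about hardware noise, where the two executions can differ.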

Load-bearing premise

Mid-circuit measurements and resets can be performed with low enough error that the generated state-action sequences remain functionally equivalent to a static unrolled circuit without cumulative decoherence altering the sampled trajectories.

What would settle it

Execute both the dynamic and static unrolled circuits for successively larger interaction horizons T on the same NISQ device and check whether the distribution of sampled trajectory returns begins to diverge beyond the level expected from hardware noise alone.
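A comparison of this kind could be scored with standard distribution distances. The following is a minimal sketch, assuming the two runs are summarized as shot counts keyed by trajectory return; the counts shown are placeholders, not measured data, and the helper names are illustrative.

```python
import math
from collections import Counter


def normalize(counts):
    """Turn raw shot counts into an empirical probability distribution."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must contain at least one shot")
    return {k: v / total for k, v in counts.items()}


def total_variation(p, q):
    """Total variation distance: half the L1 gap over the joint support."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; q is floored at eps so unseen outcomes stay finite."""
    return sum(
        pv * math.log(pv / max(q.get(k, 0.0), eps))
        for k, pv in p.items()
        if pv > 0
    )


# Placeholder shot counts keyed by trajectory return (hypothetical).
static_counts = Counter({0: 500, 1: 300, 2: 200})
dynamic_counts = Counter({0: 450, 1: 350, 2: 200})

p = normalize(static_counts)
q = normalize(dynamic_counts)
tvd = total_variation(p, q)  # about 0.05 for these placeholder counts
kl = kl_divergence(p, q)
```

To separate genuine divergence from sampling noise, the same distances can be computed between two independent runs of the same circuit and used as a baseline.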

Figures

Figures reproduced from arXiv: 2509.16002 by Masaaki Kondo, Shaswot Shresthamali, Thet Htar Su.

Figure 1. Graphical representation of a classical MDP with four states.
Figure 2. Quantum circuit of the MDP encoding states, actions, transitions, and rewards into qubits.
Figure 3. QMDP circuit utilizing dynamic circuit capability for three time steps of agent–environment interaction. Each interaction applies the …
Figure 4. Visualization of state visitation patterns across three time steps for quantum trajectories generated by a dynamic QMDP circuit, …
Figure 5. Quantum trajectory distribution from the qubit-reuse QMDP circuit executed over three time steps on the 133-qubit IBM Heron …
Figure 6. Sampling distribution of quantum trajectories from Grover’s search. The horizontal axis denotes the unique trajectory identifiers, and …
Figure 7. Static QMDP circuit implementation of agent–environment interactions across three time steps (t = 0, 1, 2). Each colored block …
original abstract

A scalable and resource-efficient quantum reinforcement learning framework is presented that eliminates the linear qubit-scaling barrier in multi-step quantum Markov decision processes (QMDPs). The proposed framework integrates a QMDP formulation, dynamic-circuit execution, and Grover-based amplitude amplification into a unified quantum-native architecture. Environment dynamics are encoded entirely within quantum Hilbert space, enabling coherent superposition over state-action sequences and a direct quantum agent-environment interface without intermediate quantum-to-classical conversion. The central contribution is a dynamic execution model for multi-step QMDPs that employs mid-circuit measurement and reset to recycle a fixed physical quantum register across sequential interactions. This approach preserves trajectory fidelity relative to a static unrolled QMDP, generating identical state-action sequences while reducing the physical qubit requirement from 7×T to a constant 7, independent of the interaction horizon T. Thus, the qubit complexity of multi-step QMDPs is transformed from O(T) to O(1) while maintaining functional equivalence at the level of trajectory generation. Trajectory returns are evaluated via quantum arithmetic, and high-return trajectories are marked and amplified using amplitude amplification to increase their sampling probability. Simulations confirm preservation of trajectory fidelity with a 66% qubit reduction compared to a static design. Experimental execution on an IBM Heron-class processor demonstrates feasibility on noisy intermediate-scale quantum hardware, establishing a scalable and resource-efficient foundation for large-scale quantum-native reinforcement learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a quantum reinforcement learning framework for multi-step QMDPs that combines a quantum-native formulation with dynamic-circuit execution using mid-circuit measurement and reset to reuse a fixed 7-qubit register across T steps, reducing qubit count from 7T to 7 (O(T) to O(1)). It incorporates Grover amplitude amplification to boost sampling of high-return trajectories and claims that the dynamic model produces identical state-action sequences and preserves trajectory fidelity relative to a static unrolled circuit, supported by simulations showing 66% qubit reduction and experimental runs on an IBM Heron processor.

Significance. If the fidelity-preservation claim holds under realistic NISQ noise, the work would remove a major resource barrier for scaling quantum RL to longer horizons, enabling practical quantum-native agents on current hardware. The architectural synthesis of dynamic circuits, QMDP encoding, and Grover optimization is a concrete step toward resource-efficient quantum algorithms.

major comments (2)
  1. [Abstract] Abstract: The central claim that the dynamic execution model 'preserves trajectory fidelity relative to a static unrolled QMDP' and generates 'identical state-action sequences' is presented without any quantitative fidelity metrics, error bounds, or direct distributional comparisons (e.g., total variation distance or KL divergence between trajectory returns). This absence prevents verification that mid-circuit measurement/reset errors do not cumulatively alter sampled trajectories differently from the parallel noise channels in the 7T-qubit static circuit.
  2. [Results] Results/Experimental section: The simulation confirmation of fidelity preservation and the IBM Heron execution are described only qualitatively ('demonstrates feasibility'). No numerical values for per-step reset/measurement error rates, accumulated fidelity after T steps, success probabilities, or baseline comparisons against the static unrolled circuit are supplied, leaving the O(1) scaling claim without load-bearing empirical support.
minor comments (1)
  1. [Abstract] Abstract: The reported 66% qubit reduction is specific to a particular T (presumably T=3); stating the exact horizon used in the simulations would clarify how the general O(1) claim maps to the concrete result.
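The link between the reported percentage and the horizon follows from simple arithmetic: with a 7-qubit register, the static design needs 7T qubits and the dynamic one needs 7, so the saving is 1 - 1/T. A minimal sketch (the function name is illustrative):

```python
def qubit_reduction(horizon: int, register_size: int = 7) -> float:
    """Fractional qubit saving of register reuse versus an unrolled circuit."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    static_qubits = register_size * horizon  # one register per time step
    dynamic_qubits = register_size           # one register, reused
    return 1 - dynamic_qubits / static_qubits


# Savings for a few horizons; T = 3 gives 2/3, i.e. roughly 66.7%.
savings = {T: qubit_reduction(T) for T in (1, 3, 5, 10)}
```

A saving of about 66% is consistent with T = 3, which also matches the three time steps named in the figure captions; the saving grows toward 100% as T increases.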

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments highlight opportunities to strengthen the quantitative presentation of our fidelity-preservation results. We address each point below and will revise the manuscript to incorporate the requested metrics and comparisons.

point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that the dynamic execution model 'preserves trajectory fidelity relative to a static unrolled QMDP' and generates 'identical state-action sequences' is presented without any quantitative fidelity metrics, error bounds, or direct distributional comparisons (e.g., total variation distance or KL divergence between trajectory returns). This absence prevents verification that mid-circuit measurement/reset errors do not cumulatively alter sampled trajectories differently from the parallel noise channels in the 7T-qubit static circuit.

    Authors: We agree that explicit quantitative metrics improve verifiability. By construction, the dynamic-circuit model with mid-circuit measurement and reset replicates the unitary evolution and measurement outcomes of the static unrolled circuit step by step, so the generated state-action sequences and trajectory returns are identical in the absence of hardware noise. In the revised manuscript we will report, in the abstract and in a new results subsection, the total variation distance between the two trajectory distributions (zero under ideal simulation), together with analytic bounds on the cumulative deviation arising from realistic per-step measurement and reset error rates. These bounds will be compared directly against the parallel noise channels present in the 7T-qubit static circuit. revision: yes

  2. Referee: [Results] Results/Experimental section: The simulation confirmation of fidelity preservation and the IBM Heron execution are described only qualitatively ('demonstrates feasibility'). No numerical values for per-step reset/measurement error rates, accumulated fidelity after T steps, success probabilities, or baseline comparisons against the static unrolled circuit are supplied, leaving the O(1) scaling claim without load-bearing empirical support.

    Authors: The referee is correct that the current text is primarily qualitative. We will expand the Results section with concrete numerical data: per-step reset and measurement error rates extracted from the IBM Heron calibration data, accumulated fidelity after T = 5 and T = 10 steps for both dynamic and static circuits, success probabilities of the Grover-amplified high-return trajectories, and side-by-side distributional comparisons (including total variation distance and KL divergence) between the two implementations. Additional figures will display fidelity-decay curves and qubit-count scaling, thereby supplying the load-bearing empirical support for the O(1) claim. revision: yes
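One simple form such an analytic bound could take is sketched below. It is an illustration under a stated assumption, not the authors' derivation: each of the T steps is taken to behave ideally with probability at least 1 - ε, independently of the others, and to do something arbitrary otherwise. Here P_T is the ideal trajectory distribution, \tilde{P}_T the noisy one, Q_T the (unknown) distribution conditioned on at least one fault, and δ_T the probability of at least one fault; all of these symbols are introduced for this sketch.

```latex
\begin{aligned}
\tilde{P}_T &= (1-\delta_T)\,P_T + \delta_T\,Q_T,
  \qquad \delta_T \le 1-(1-\varepsilon)^{T}, \\
d_{\mathrm{TV}}\!\left(\tilde{P}_T, P_T\right)
  &= \delta_T\, d_{\mathrm{TV}}\!\left(Q_T, P_T\right)
  \;\le\; 1-(1-\varepsilon)^{T}
  \;\le\; T\varepsilon .
\end{aligned}
```

Under this fault model the deviation grows at most linearly in the horizon, so per-step error rates from calibration data would translate directly into a ceiling on the total variation distance at T = 5 or T = 10. Correlated errors or idle-time decoherence would need a separate treatment.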

Circularity Check

0 steps flagged

No circularity: qubit scaling is an architectural construction, not a reduction to fitted inputs or self-citations.

full rationale

The paper introduces a dynamic-circuit execution model that recycles a fixed register via mid-circuit measurement and reset, claiming this transforms qubit complexity from O(T) to O(1) while preserving trajectory fidelity by construction of the circuit design. No equations, fitted parameters, or self-citations are presented that reduce the central claim back to its own inputs; the equivalence to the static unrolled QMDP is asserted as a property of the proposed architecture rather than derived from prior results or data fits. The central claim is therefore self-contained: it rests on the circuit construction itself and can be tested against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract does not enumerate explicit free parameters or axioms; the approach implicitly assumes that environment dynamics admit a coherent quantum encoding and that mid-circuit operations preserve sufficient coherence for trajectory equivalence.

pith-pipeline@v0.9.0 · 5789 in / 1245 out tokens · 37552 ms · 2026-05-18T15:25:00.770501+00:00 · methodology


