Scalable Quantum Reinforcement Learning on NISQ Devices with Dynamic-Circuit Qubit Reuse and Grover Optimization

Masaaki Kondo; Shaswot Shresthamali; Thet Htar Su

arXiv: 2509.16002 · v2 · submitted 2025-09-19 · 🪐 quant-ph · cs.LG

Thet Htar Su, Shaswot Shresthamali, Masaaki Kondo

Pith reviewed 2026-05-18 15:25 UTC · model grok-4.3

classification 🪐 quant-ph cs.LG

keywords quantum reinforcement learning · NISQ devices · dynamic circuits · qubit reuse · Grover optimization · QMDP · mid-circuit measurement · trajectory fidelity

The pith

Dynamic circuits with mid-circuit resets reduce qubit needs for multi-step quantum reinforcement learning from O(T) to O(1) while preserving trajectory fidelity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a quantum reinforcement learning framework that encodes environment dynamics entirely in quantum Hilbert space for coherent superposition over state-action sequences. It proposes a dynamic execution model that uses mid-circuit measurement and reset to recycle a fixed set of seven physical qubits across an arbitrary number of interaction steps. This replaces the static unrolled circuit that would otherwise require seven qubits per time step. The method maintains functional equivalence at the level of generated trajectories and applies Grover amplitude amplification to favor high-return sequences. Simulations and hardware runs on an IBM processor confirm the approach works on current noisy devices.
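As a classical point of reference for what "favoring high-return sequences" means, the sketch below enumerates every action sequence of a tiny MDP and keeps the trajectories with maximal return, which is the set an amplification oracle would mark. The two-state, two-action MDP, its transition table, and its rewards are hypothetical and chosen only for illustration; they are not the environment used in the paper.

```python
from itertools import product

# Hypothetical toy MDP (NOT from the paper): 2 states, 2 actions,
# deterministic transitions and rewards, used only for illustration.
TRANSITION = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1}
REWARD = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 2.0}


def rollout(start_state, actions, gamma=1.0):
    """Return (trajectory, discounted return) for a fixed action sequence."""
    state, total, trajectory = start_state, 0.0, []
    for t, action in enumerate(actions):
        trajectory.append((state, action))
        total += (gamma ** t) * REWARD[(state, action)]
        state = TRANSITION[(state, action)]
    return tuple(trajectory), total


def best_trajectories(start_state, horizon, gamma=1.0):
    """Enumerate all action sequences and keep those with maximal return."""
    results = [
        rollout(start_state, actions, gamma)
        for actions in product((0, 1), repeat=horizon)
    ]
    top = max(ret for _, ret in results)
    return [traj for traj, ret in results if ret == top], top
```

The enumeration grows exponentially with the horizon, which is the cost the quantum formulation aims to sidestep by holding all trajectories in superposition.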

Core claim

The central claim is that the dynamic execution model for multi-step QMDPs uses mid-circuit measurement and reset to recycle a fixed physical quantum register across sequential interactions. The model generates state-action sequences identical to those of a static unrolled QMDP while reducing the physical qubit requirement from 7×T to a constant 7, independent of the interaction horizon T. Qubit complexity thus drops from O(T) to O(1) while trajectory fidelity is maintained.
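The scaling arithmetic behind this claim is small enough to check directly. The sketch below assumes the seven-qubit register stated in the paper; the horizon T = 3 is an inference from the three-time-step figures, not a value the abstract states.

```python
QUBITS_PER_STEP = 7  # register size reported in the paper


def static_qubits(horizon: int) -> int:
    """Qubits for the static unrolled circuit: one register per time step."""
    return QUBITS_PER_STEP * horizon


def dynamic_qubits(horizon: int) -> int:
    """Qubits for the dynamic circuit: one register reused at every step."""
    return QUBITS_PER_STEP


def qubit_reduction(horizon: int) -> float:
    """Fraction of qubits saved by register reuse, equal to 1 - 1/T."""
    return 1.0 - dynamic_qubits(horizon) / static_qubits(horizon)


if __name__ == "__main__":
    for horizon in (3, 5, 10):
        print(horizon, static_qubits(horizon), f"{qubit_reduction(horizon):.1%}")
```

At T = 3 the saving is two thirds (21 qubits down to 7), which is consistent with the 66% figure quoted in the abstract; the saving approaches 100% as T grows.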

What carries the argument

The dynamic execution model with mid-circuit measurement and reset that recycles a fixed seven-qubit register across sequential interactions in the QMDP.

Load-bearing premise

Mid-circuit measurements and resets can be performed with low enough error that the generated state-action sequences remain functionally equivalent to a static unrolled circuit without cumulative decoherence altering the sampled trajectories.
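One rough way to see how this premise could fail is a toy error model. The sketch below assumes that each measure-and-reset round faults independently with probability `p_step`, and that a run with no fault reproduces the ideal trajectory distribution exactly. Under those assumptions the total variation distance between the noisy and ideal distributions is at most 1 - (1 - p)^T, which is itself at most T·p. The model and the 1% error rate are hypothetical; gate errors and idle decoherence are ignored, and none of this comes from the paper.

```python
def clean_run_probability(p_step: float, horizon: int) -> float:
    """Probability that none of `horizon` measure/reset rounds faults,
    assuming independent faults with probability `p_step` each."""
    return (1.0 - p_step) ** horizon


def tvd_upper_bound(p_step: float, horizon: int) -> float:
    """Upper bound on the total variation distance from the ideal
    trajectory distribution under the independent-fault assumption."""
    return 1.0 - clean_run_probability(p_step, horizon)


if __name__ == "__main__":
    # p_step = 0.01 is a placeholder, not a calibrated hardware value.
    for horizon in (3, 5, 10):
        print(horizon, round(tvd_upper_bound(0.01, horizon), 4))
```

The bound grows roughly linearly in T for small error rates, so constant qubit count does not by itself imply constant trajectory fidelity.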

What would settle it

Execute both the dynamic and static unrolled circuits for successively larger interaction horizons T on the same NISQ device and check whether the distribution of sampled trajectory returns begins to diverge beyond the level expected from hardware noise alone.
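A minimal sketch of the comparison this test calls for: given shot counts from the two circuits, compute the total variation distance between the empirical distributions. The dictionary format (outcome label mapped to shot count) and the example numbers are assumptions for illustration, not data from the paper.

```python
def total_variation_distance(counts_a: dict, counts_b: dict) -> float:
    """Total variation distance between two empirical distributions,
    each given as a mapping from outcome label to shot count."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both count dictionaries need at least one shot")
    outcomes = set(counts_a) | set(counts_b)
    return 0.5 * sum(
        abs(counts_a.get(k, 0) / total_a - counts_b.get(k, 0) / total_b)
        for k in outcomes
    )


if __name__ == "__main__":
    # Hypothetical return histograms (return value -> shots).
    static_counts = {"5": 600, "3": 400}
    dynamic_counts = {"5": 500, "3": 400, "0": 100}
    print(total_variation_distance(static_counts, dynamic_counts))
```

To separate register-reuse effects from ordinary hardware noise, the dynamic-versus-static distance would need to be compared against a baseline such as the distance between two independent runs of the static circuit on the same device.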

Figures

Figures reproduced from arXiv: 2509.16002 by Masaaki Kondo, Shaswot Shresthamali, Thet Htar Su.

**Figure 2.** Quantum circuit of the MDP encoding states, actions, transitions, and rewards into qubits.

**Figure 3.** QMDP circuit utilizing dynamic circuit capability for three time steps of agent–environment interaction. Each interaction applies the ...

**Figure 4.** Visualization of state visitation patterns across three time steps for quantum trajectories generated by a dynamic QMDP circuit, ...

**Figure 5.** Quantum trajectory distribution from the qubit-reuse QMDP circuit executed over three time steps on the 133-qubit IBM Heron ...

**Figure 6.** Sampling distribution of quantum trajectories from Grover’s search. The horizontal axis denotes the unique trajectory identifiers, and ...

**Figure 7.** Static QMDP circuit implementation of agent–environment interactions across three time steps (t = 0, 1, 2). Each colored block ...

Original abstract

A scalable and resource-efficient quantum reinforcement learning framework is presented that eliminates the linear qubit-scaling barrier in multi-step quantum Markov decision processes (QMDPs). The proposed framework integrates a QMDP formulation, dynamic-circuit execution, and Grover-based amplitude amplification into a unified quantum-native architecture. Environment dynamics are encoded entirely within quantum Hilbert space, enabling coherent superposition over state-action sequences and a direct quantum agent-environment interface without intermediate quantum-to-classical conversion. The central contribution is a dynamic execution model for multi-step QMDPs that employs mid-circuit measurement and reset to recycle a fixed physical quantum register across sequential interactions. This approach preserves trajectory fidelity relative to a static unrolled QMDP, generating identical state-action sequences while reducing the physical qubit requirement from 7xT to a constant 7, independent of the interaction horizon T. Thus, the qubit complexity of multi-step QMDPs is transformed from O(T) to O(1) while maintaining functional equivalence at the level of trajectory generation. Trajectory returns are evaluated via quantum arithmetic, and high-return trajectories are marked and amplified using amplitude amplification to increase their sampling probability. Simulations confirm preservation of trajectory fidelity with a 66% qubit reduction compared to a static design. Experimental execution on an IBM Heron-class processor demonstrates feasibility on noisy intermediate-scale quantum hardware, establishing a scalable and resource-efficient foundation for large-scale quantum-native reinforcement learning.
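The amplification step in the abstract follows the standard Grover analysis, which can be sketched numerically. With M marked items among N equally weighted candidates, the success probability after k iterations is sin²((2k+1)θ) with θ = arcsin(√(M/N)). The example of one marked trajectory among eight is hypothetical; the paper's actual trajectory counts and iteration numbers are not given in the abstract, and a uniform initial superposition is an assumption here.

```python
import math


def grover_angle(num_marked: int, num_total: int) -> float:
    """Rotation angle theta with sin(theta)^2 = num_marked / num_total."""
    if not 0 < num_marked <= num_total:
        raise ValueError("need 0 < num_marked <= num_total")
    return math.asin(math.sqrt(num_marked / num_total))


def grover_success_probability(num_marked: int, num_total: int, iterations: int) -> float:
    """Probability of measuring a marked item after `iterations` Grover steps,
    starting from a uniform superposition."""
    theta = grover_angle(num_marked, num_total)
    return math.sin((2 * iterations + 1) * theta) ** 2


def suggested_iterations(num_marked: int, num_total: int) -> int:
    """A common iteration count choice: floor(pi / (4 * theta))."""
    return math.floor(math.pi / (4 * grover_angle(num_marked, num_total)))


if __name__ == "__main__":
    k = suggested_iterations(1, 8)
    print(k, grover_success_probability(1, 8, k))
```

Running more iterations than suggested rotates past the marked subspace and lowers the success probability again, so the iteration count matters as much as the oracle.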

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows a practical qubit-reuse trick for longer-horizon quantum RL on NISQ hardware, but the fidelity-equivalence claim needs tighter numbers than the abstract supplies.

This paper gets multi-step quantum RL working with a fixed handful of qubits instead of a count that scales linearly with the horizon. The authors encode the environment in a QMDP, run it as a dynamic circuit, and recycle the same 7-qubit register across steps via mid-circuit measurement and reset. The claim is that the sampled trajectories stay functionally the same as those of a large static unrolled circuit while the qubit count drops from 7T to 7. They add Grover amplification to boost the chance of drawing high-return paths.

Simulations report a 66% qubit saving with preserved fidelity, and the authors actually ran the circuit on an IBM Heron processor. That hardware demo is the most concrete part of the work so far. The combination of dynamic-circuit recycling with Grover amplification for trajectory selection is the clearest new element. It turns a resource bottleneck into something that could fit on existing machines for longer sequences.

The soft spot is exactly the one the stress-test flagged. Repeated mid-circuit resets on NISQ hardware add noise at each step, and the error channels are sequential rather than parallel. The abstract asserts functional equivalence and fidelity preservation but gives no quantitative distance between the dynamic and static return distributions, no error bars on sampled trajectories, and no breakdown of how much the reset errors actually shift the amplified probabilities. The Heron run is described as demonstrating feasibility, not as a controlled comparison. Without those numbers the central scaling claim stays provisional.

This is for people already working on quantum RL or hybrid control who need to stretch limited hardware further. A reader looking for concrete resource-reduction techniques and early hardware results will find something usable here. It deserves a serious referee because the architecture is concrete and the hardware angle is worth checking, even if the fidelity data need strengthening.

Send it to review and ask for direct trajectory-distribution comparisons plus reset-error budgets.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a quantum reinforcement learning framework for multi-step QMDPs that combines a quantum-native formulation with dynamic-circuit execution using mid-circuit measurement and reset to reuse a fixed 7-qubit register across T steps, reducing qubit count from 7T to 7 (O(T) to O(1)). It incorporates Grover amplitude amplification to boost sampling of high-return trajectories and claims that the dynamic model produces identical state-action sequences and preserves trajectory fidelity relative to a static unrolled circuit, supported by simulations showing 66% qubit reduction and experimental runs on an IBM Heron processor.

Significance. If the fidelity-preservation claim holds under realistic NISQ noise, the work would remove a major resource barrier for scaling quantum RL to longer horizons, enabling practical quantum-native agents on current hardware. The architectural synthesis of dynamic circuits, QMDP encoding, and Grover optimization is a concrete step toward resource-efficient quantum algorithms.

major comments (2)

[Abstract] The central claim that the dynamic execution model 'preserves trajectory fidelity relative to a static unrolled QMDP' and generates 'identical state-action sequences' is presented without any quantitative fidelity metrics, error bounds, or direct distributional comparisons (e.g., total variation distance or KL divergence between trajectory returns). This absence prevents verification that mid-circuit measurement/reset errors do not cumulatively alter sampled trajectories differently from the parallel noise channels in the 7T-qubit static circuit.

[Results/Experimental section] The simulation confirmation of fidelity preservation and the IBM Heron execution are described only qualitatively ('demonstrates feasibility'). No numerical values for per-step reset/measurement error rates, accumulated fidelity after T steps, success probabilities, or baseline comparisons against the static unrolled circuit are supplied, leaving the O(1) scaling claim without load-bearing empirical support.

minor comments (1)

[Abstract] The reported 66% qubit reduction is specific to a particular T (presumably T=3); stating the exact horizon used in the simulations would clarify how the general O(1) claim maps to the concrete result.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments highlight opportunities to strengthen the quantitative presentation of our fidelity-preservation results. We address each point below and will revise the manuscript to incorporate the requested metrics and comparisons.

Referee: [Abstract] The central claim that the dynamic execution model 'preserves trajectory fidelity relative to a static unrolled QMDP' and generates 'identical state-action sequences' is presented without any quantitative fidelity metrics, error bounds, or direct distributional comparisons (e.g., total variation distance or KL divergence between trajectory returns). This absence prevents verification that mid-circuit measurement/reset errors do not cumulatively alter sampled trajectories differently from the parallel noise channels in the 7T-qubit static circuit.

Authors: We agree that explicit quantitative metrics improve verifiability. By construction, the dynamic-circuit model with mid-circuit measurement and reset replicates the exact unitary evolution and measurement outcomes of the static unrolled circuit on a per-step basis; therefore the generated state-action sequences and trajectory returns are identical in the absence of hardware noise. In the revised manuscript we will add to the abstract and a new results subsection the total variation distance (which equals zero under ideal simulation) together with analytic error bounds on the cumulative deviation arising from realistic per-step measurement/reset error rates. These bounds will be compared directly against the parallel noise channels present in the 7T-qubit static circuit.

revision: yes

Referee: [Results/Experimental section] The simulation confirmation of fidelity preservation and the IBM Heron execution are described only qualitatively ('demonstrates feasibility'). No numerical values for per-step reset/measurement error rates, accumulated fidelity after T steps, success probabilities, or baseline comparisons against the static unrolled circuit are supplied, leaving the O(1) scaling claim without load-bearing empirical support.

Authors: The referee is correct that the current text is primarily qualitative. We will expand the Results section with concrete numerical data: per-step reset and measurement error rates extracted from the IBM Heron calibration data, accumulated fidelity after T = 5 and T = 10 steps for both dynamic and static circuits, success probabilities of the Grover-amplified high-return trajectories, and side-by-side distributional comparisons (including total variation distance and KL divergence) between the two implementations. Additional figures will display fidelity-decay curves and qubit-count scaling, thereby supplying the load-bearing empirical support for the O(1) claim.

revision: yes
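For the KL-divergence comparison mentioned in this exchange, one practical wrinkle is that empirical histograms often contain outcomes seen in one run but not the other, which makes the raw divergence infinite. The sketch below handles that with additive smoothing. The pseudo-count of 0.5 and the count-dictionary format are assumptions for illustration; this is not a procedure taken from the manuscript or the rebuttal.

```python
import math


def kl_divergence(counts_p: dict, counts_q: dict, pseudo_count: float = 0.5) -> float:
    """Smoothed KL divergence D(P || Q) in nats between two empirical
    distributions given as outcome -> shot count mappings.

    `pseudo_count` is added to every outcome in both histograms so that
    outcomes missing from one run do not produce an infinite value."""
    outcomes = sorted(set(counts_p) | set(counts_q))
    if not outcomes:
        raise ValueError("at least one outcome is required")
    total_p = sum(counts_p.values()) + pseudo_count * len(outcomes)
    total_q = sum(counts_q.values()) + pseudo_count * len(outcomes)
    divergence = 0.0
    for outcome in outcomes:
        p = (counts_p.get(outcome, 0) + pseudo_count) / total_p
        q = (counts_q.get(outcome, 0) + pseudo_count) / total_q
        divergence += p * math.log(p / q)
    return divergence


if __name__ == "__main__":
    # Hypothetical trajectory histograms from a static and a dynamic run.
    static_counts = {"traj_a": 700, "traj_b": 300}
    dynamic_counts = {"traj_a": 650, "traj_b": 300, "traj_c": 50}
    print(kl_divergence(dynamic_counts, static_counts))
```

The divergence is asymmetric, so a report should state which circuit plays the role of the reference distribution.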

Circularity Check

0 steps flagged

No circularity: qubit scaling is an architectural construction, not a reduction to fitted inputs or self-citations.

The paper introduces a dynamic-circuit execution model that recycles a fixed register via mid-circuit measurement and reset, claiming this transforms qubit complexity from O(T) to O(1) while preserving trajectory fidelity by construction of the circuit design. No equations, fitted parameters, or self-citations are presented that reduce the central claim back to its own inputs; the equivalence to the static unrolled QMDP is asserted as a property of the proposed architecture rather than derived from prior results or data fits. The framework remains self-contained against external benchmarks as a novel hardware-efficient implementation for QMDPs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract does not enumerate explicit free parameters or axioms; the approach implicitly assumes that environment dynamics admit a coherent quantum encoding and that mid-circuit operations preserve sufficient coherence for trajectory equivalence.

pith-pipeline@v0.9.0 · 5789 in / 1245 out tokens · 37552 ms · 2026-05-18T15:25:00.770501+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat recovery; no qubit-reuse or dynamic-circuit theorem · unclear

Paper passage: "reducing the physical qubit requirement from 7xT to a constant 7, independent of the interaction horizon T, transforming qubit complexity from O(T) to O(1)"

IndisputableMonolith/Foundation/AlphaDerivationExplicit.lean · phi-fixed-point and 44-slot structure · unclear

Paper passage: "Grover’s search is applied to the superposition of these evaluated trajectories to amplify the probability of measuring those with the highest return"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages · 1 internal anchor

[1]

The return register|g⟩ occupies a Hilbert spaceG that contains a sufficient number of qubits to represent all possible return values

Quantum Return Calculation To evaluate trajectory performance in the quantum domain, the classical concept of return, defined as the discounted sum of rewards, is encoded into a dedicated quantum register |g⟩. The return register|g⟩ occupies a Hilbert spaceG that contains a sufficient number of qubits to represent all possible return values. Initially, th...

work page
[2]

Optimal Policy Search via Grover’s Algorithm In classical RL, the objective is to learn an optimal policy, a strategy that prescribes the best action for each state to max- imize return. In quantum reinforcement learning (QRL), this objective can be reformulated as a search problem, in which the ensemble of length- T quantum trajectories, generated by the...

work page
[3]

1 is implemented in the quantum domain by encoding its dynamics into quantum states

Quantum encoding of the classical MDP The classical MDP described in Fig. 1 is implemented in the quantum domain by encoding its dynamics into quantum states. This quantum realization, shown in Fig. 2, preserves the structure of the classical model while exploiting superpo- sition to evaluate all state–action transitions in parallel. The implementation us...

work page
[4]

Implementation of multiple interactions on quantum hardware To evaluate the practicality of the proposed dynamic- circuit-based reusable QMDP for multiple interactions, we deployed the full three-timestep circuit on the 133-qubit IBM Heron-class quantum processor (ibm torino). This device rep- resents IBM quantum’s latest generation of superconducting har...

work page 2000
[5]

These differ- ences clarify the respective strengths and limitations of each method when deployed on near-term quantum devices

Trade-offs between static and dynamic implementations To better understand the practical implications of adopting a dynamic-circuit approach, we outline the trade-offs between static and dynamic implementations of QMDP. These differ- ences clarify the respective strengths and limitations of each method when deployed on near-term quantum devices. Table I s...

work page
[6]

Each run was configured for 32K shots, and the circuit was executed 30 times to obtain a statistically meaningful dis- tribution of outcomes

Implementation and validation of optimal trajectory search on quantum hardware To further evaluate the proposed QRL framework, Grover’s search–based optimal trajectory identification was executed entirely on a quantum device (IBM’s ibm torino processor), without reliance on quantum simulators or classical subrou- tines. Each run was configured for 32K sho...

work page
[7]

Dulac-Arnold, N

G. Dulac-Arnold, N. Levine, D. J. Mankowitz, J. Li, C. Padu- raru, S. Gowal, and T. Hester, Challenges of real-world re- inforcement learning: definitions, benchmarks and analysis, Mach. Learn. 110, 2419–2468 (2021)

work page 2021
[8]

Jerbi, L

S. Jerbi, L. M. Trenkwalder, H. Poulsen Nautrup, H. J. Briegel, and V . Dunjko, Quantum enhancements for deep reinforcement learning in large spaces, PRX Quantum 2, 010328 (2021)

work page 2021
[9]

S. Y .-C. Chen, An introduction to quantum reinforcement learn- ing (QRL), arXiv:2409.05846

work page arXiv
[10]

Meyer, C

N. Meyer, C. Ufrecht, M. Periyasamy, D. D. Scherer, A. Plinge, and C. Mutschler, A survey on quantum reinforcement learning, arXiv:2211.03464

work page arXiv
[11]

Preskill, Quantum Computing in the NISQ era and beyond, Quantum 2, 79 (2018)

J. Preskill, Quantum Computing in the NISQ era and beyond, Quantum 2, 79 (2018)

work page 2018
[12]

Wiedemann, D

S. Wiedemann, D. Hein, S. Udluft, and C. B. Mendl, Quantum policy iteration via amplitude estimation and grover search – towards quantum advantage for reinforcement learning, Trans- actions on Machine Learning Research (2023)

work page 2023
[13]

T. H. Su, S. Shresthamali, and M. Kondo, Quantum framework for reinforcement learning: Integrating the markov decision process, quantum arithmetic, and trajectory search, Phys. Rev. A 111, 062421 (2025)

work page 2025
[14]

Pawar, Y

A. Pawar, Y . Li, Z. Mo, Y . Guo, X. Tang, Y . Zhang, and J. Yang, QRCC: Evaluating large quantum circuits on small quantum computers through integrated qubit reuse and circuit cutting, in Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Oper- ating Systems, ASPLOS ‘24 (Association for Computing M...

work page 2025
[15]

Nation, Dynamic Bernstein–Vazirani using mid-circuit re- set and measurement, https://nonhermitian.org/ posts/2021/2021-10-27-dynamic_BV.html

P. Nation, Dynamic Bernstein–Vazirani using mid-circuit re- set and measurement, https://nonhermitian.org/ posts/2021/2021-10-27-dynamic_BV.html

work page 2021
[16]

J. M. Pino, J. M. Dreiling, C. Figgatt, J. P. Gaebler, S. A. Moses, M. S. Allman, C. H. Baldwin, M. Foss-Feig, D. Hayes, K. Mayer, C. Ryan-Anderson, and B. Neyenhuis, Demonstra- tion of the trapped-ion quantum CCD computer architecture, Nature 592, 209 (2021)

work page 2021
[17]

Johnson, Bringing the full power of dynamic circuits to Qiskit runtime, https://www.ibm.com/quantum/ blog/quantum-dynamic-circuits

B. Johnson, Bringing the full power of dynamic circuits to Qiskit runtime, https://www.ibm.com/quantum/ blog/quantum-dynamic-circuits

work page
[18]

D. Dong, C. Chen, H. Li, and T.-J. Tarn, Quantum reinforce- ment learning, IEEE Transactions on Systems, Man, and Cy- bernetics, Part B (Cybernetics) 38, 1207 (2008)

work page 2008
[19]

Dao-Yi, C

D. Dao-Yi, C. Chun-Lin, C. Zong-Hai, and Z. Chen-Bin, Quan- tum mechanics helps in learning for more intelligent robots, Chinese Physics Letters 23, 1691 (2006)

work page 2006
[20]

Chen and D.-Y

C.-L. Chen and D.-Y . Dong, Superposition-inspired reinforce- ment learning and quantum reinforcement learning, in Rein- forcement Learning, edited by C. Weber, M. Elshaw, and N. M. Mayer (IntechOpen, Rijeka, 2008), Chap. 4

work page 2008
[21]

C. L. CHEN, D. Y . DONG, and Z. H. CHEN, Quantum compu- tation for action selection using reinforcement learning, Inter- national Journal of Quantum Information 04, 1071 (2006)

work page 2006
[22]

D. Dong, C. Chen, J. Chu, and T.-J. Tarn, Robust quantum-inspired reinforcement learning for robot navigation, IEEE/ASME Transactions on Mechatronics 17, 86 (2012)

work page 2012
[23]

Ganger and W

M. Ganger and W. Hu, Quantum multiple q-learning, Interna- tional Journal of Intelligence Science 9, 1 (2019)

work page 2019
[24]

B. Cho, Y . Xiao, P. Hui, and D. Dong, Quantum bandit with amplitude amplification exploration in an adversarial environ- ment, IEEE Transactions on Knowledge and Data Engineering 36, 311 (2024)

work page 2024
[25]

Q. Wei, H. Ma, C. Chen, and D. Dong, Deep reinforcement learning with quantum-inspired experience replay, IEEE Trans- actions on Cybernetics 52, 9326 (2022)

work page 2022
[26]

Y . Li, A. H. Aghvami, and D. Dong, Intelligent trajectory plan- ning in UA V-mounted wireless networks: A quantum-inspired reinforcement learning perspective, IEEE Wireless Communi- cations Letters 10, 1994 (2021)

work page 1994
[27]

J.-A. Li, D. Dong, Z. Wei, Y . Liu, Y . Pan, F. Nori, and X. Zhang, Quantum reinforcement learning during human decision-making, Nature Human Behaviour 4, 294 (2020)

work page 2020
[28]

Niraula, J

D. Niraula, J. Jamaluddin, M. M. Matuszak, R. K. T. Haken, and I. E. Naqa, Quantum deep reinforcement learning for clini- cal decision support in oncology: application to adaptive radio- therapy, Scientific reports 11, 23545 (2021)

work page 2021
[29]

Sequeira, L

A. Sequeira, L. P. Santos, and L. S. Barbosa, Policy gradients using variational quantum circuits, Quantum Machine Intelli- gence 5, 18 (2023)

work page 2023
[30]

S. Y .-C. Chen, C.-H. H. Yang, J. Qi, P.-Y . Chen, X. Ma, and H.- S. Goan, Variational quantum circuits for deep reinforcement learning, IEEE Access 8, 141007 (2020)

work page 2020
[31]

Lockwood and M

O. Lockwood and M. Si, Reinforcement learning with quantum variational circuits, in Proceedings of the Sixteenth AAAI Con- ference on Artificial Intelligence and Interactive Digital Enter- tainment, AIIDE’20 (AAAI Press, USA, 2020), V ol. 16, pp. 245-251

work page 2020
[32]

Lockwood and M

O. Lockwood and M. Si, Playing Atari with hybrid quantum- classical reinforcement learning, in NeurIPS 2020 Workshop on Pre-registration in Machine Learning (PMLR, USA, 2021), V ol. 148, pp. 285–301

work page 2020
[33]

S. Wu, S. Jin, D. Wen, D. Han, and X. Wang, Quantum re- inforcement learning in continuous action space, Quantum 9, 1660 (2025)

work page 2025
[34]

Skolik, S

A. Skolik, S. Jerbi, and V . Dunjko, Quantum agents in the Gym: a variational quantum algorithm for deep Q-learning, Quantum 6, 720 (2022)

work page 2022
[35]

Jerbi, C

S. Jerbi, C. Gyurik, S. C. Marshall, H. J. Briegel, and V . Dun- jko, Parametrized quantum policies for reinforcement learning, in Proceedings of the 35th International Conference on Neural Information Processing Systems, NIPS ‘21 (Curran Associates Inc., Red Hook, NY , 2021), pp. 28362–28375

work page 2021
[36]

Y . Kwak, W. J. Yun, S. Jung, J.-K. Kim, and J. Kim, In- troduction to quantum reinforcement learning: Theory and pennylane-based implementation, in 2021 International Con- ference on Information and Communication Technology Con- vergence (ICTC) (IEEE, Korea, 2021), pp. 416–420

work page 2021
[37]

Lan, Variational quantum soft actor-critic, arXiv:2112.11921

Q. Lan, Variational quantum soft actor-critic, arXiv:2112.11921

work page arXiv
[38]

D. Wang, A. Sundaram, R. Kothari, A. Kapoor, and M. Roet- teler, Quantum algorithms for reinforcement learning with a generative model, inProceedings of the 38th International Con- ference on Machine Learning, Proceedings of Machine Learn- ing Research, V ol. 139, edited by M. Meila and T. Zhang (PMLR, 2021), pp. 10916–10926

work page 2021
[39]

E. A. Cherrat, I. Kerenidis, and A. Prakash, Quantum reinforce- ment learning via policy iteration, Quantum Machine Intelli- gence 5, 30 (2023)

work page 2023
[40]

F. Hua, Y . Jin, Y . Chen, S. Vittal, K. Krsulich, L. S. Bishop, J. Lapeyre, A. Javadi-Abhari, and E. Z. Zhang, CaQR: A compiler-assisted approach for qubit reuse through dynamic circuit, in Proceedings of the 28th ACM International Confer- 17 ence on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2023 (Association for Compu...

work page 2023
[41]

DeCross, E

M. DeCross, E. Chertkov, M. Kohagen, and M. Foss-Feig, Qubit-reuse compilation with mid-circuit measurement and re- set, Phys. Rev. X 13, 041057 (2023)

work page 2023
[42]

R. S. Sutton and A. G. Barto, Reinforcement Learning: An In- troduction (The MIT Press, Cambridge, 2018)

work page 2018
[43]

Graesser and W

L. Graesser and W. Keng, Foundations of Deep Reinforcement Learning: Theory and Practice in Python (Addison-Wesley, USA, 2020)

work page 2020
[44]

Goodfellow, Y

I. Goodfellow, Y . Bengio, and A. Courville, Deep Learning (The MIT Press, Cambridge, 2016)

work page 2016
[45]

Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving

S. Shalev-Shwartz, S. Shammah, and A. Shashua, Safe, multi-agent, reinforcement learning for autonomous driving, arXiv:1610.03295

work page internal anchor Pith review Pith/arXiv arXiv
[46]

Kober, J

J. Kober, J. A. Bagnell, and J. Peters, Reinforcement learning in robotics: A survey, The International Journal of Robotics Research 32, 1238 (2013)

work page 2013
[47]

Silver, J

D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y . Chen, T. P. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel, and D. Hassabis, Mastering the game of go without human knowledge, Nature (London) 550, 354 (2017)

work page 2017
[48]

Brown and T

N. Brown and T. Sandholm, Superhuman AI for multiplayer poker, Science 365, 885 (2019)

work page 2019
[49]

Plaat, Deep Reinforcement Learning (Springer Nature, Sin- gapore, 2022)

A. Plaat, Deep Reinforcement Learning (Springer Nature, Sin- gapore, 2022)

work page 2022
[50]

Rieffel and W

E. Rieffel and W. Polak, Quantum Computing: A Gentle Intro- duction (The MIT Press, Cambridge, 2011)

work page 2011
[51]

M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information (Cambridge University Press, UK, 2011)

work page 2011
[52]

P. W. Shor, Algorithms for quantum computation: discrete log- arithms and factoring, in Proceedings 35th Annual Symposium on Foundations of Computer Science (IEEE, Piscataway, NJ,

work page
[53]

P. W. Shor, Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer, SIAM Journal on Computing 26, 1484 (1997)

work page 1997
[54]

Ekert and R

A. Ekert and R. Jozsa, Quantum computation and Shor’s factor- ing algorithm, Rev. Mod. Phys. 68, 733 (1996)

work page 1996
[55]

L. K. Grover, A fast quantum mechanical algorithm for database search, in Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing (ACM, New York, 1996), pp. 212–219

work page 1996
[56]

P. Das, A. Locharla, and C. Jones, LILLIPUT: a lightweight low-latency lookup-table decoder for near-term quantum error correction, in Proceedings of the 27th ACM International Con- ference on Architectural Support for Programming Languages and Operating Systems , ASPLOS ‘22 (Association for Com- puting Machinery, USA, 2022), pp. 541–553

work page 2022
[57]

M. R. Jokar, R. Rines, G. Pasandi, H. Cong, A. Holmes, Y . Shi, M. Pedram, and F. T. Chong, DigiQ: A scalable digital con- troller for quantum computers using SFQ logic, in 2022 IEEE International Symposium on High-Performance Computer Ar- chitecture (HPCA) (IEEE Computer Society, USA, 2022), pp. 400-414

work page 2022
[58]

A. Wu, G. Li, H. Zhang, G. G. Guerreschi, Y . Ding, and Y . Xie, A synthesis framework for stitching surface code with super- conducting quantum devices, in Proceedings of the 49th An- nual International Symposium on Computer Architecture, ISCA ‘22 (Association for Computing Machinery, USA, 2022), pp. 337–350

work page 2022
[59]

Huang and M

Y . Huang and M. Martonosi, QDB: From Quantum Algorithms Towards Correct Quantum Programs, in9th Workshop on Eval- uation and Usability of Programming Languages and Tools (PLATEAU 2018), Open Access Series in Informatics (OA- SIcs), V ol. 67, edited by T. Barik, J. Sunshine, and S. Chasins (Schloss Dagstuhl – Leibniz-Zentrum f¨ur Informatik, Dagstuhl, Ger...

work page 2018
[60]

J. Liu, G. T. Byrd, and H. Zhou, Quantum circuits for dy- namic runtime assertions in quantum computation, in Proceed- ings of the Twenty-Fifth International Conference on Archi- tectural Support for Programming Languages and Operating Systems, ASPLOS ‘20 (Association for Computing Machinery, USA, 2020), pp. 1017–1030

work page 2020
[61]

P. Kaye, R. Laflamme, and M. Mosca,An Introduction to Quan- tum Computing (Oxford University Press, New York, 2007)

work page 2007
[62]

C. Guo, Grover’s algorithm – implementations and implications, Highlights in Science, Engineering and Technology 38, 1071 (2023)

[63]

M. AbuGhanem, IBM quantum computers: evolution, performance, and future directions, J. Supercomput. 81, 687 (2025)

[64]

K. Rudinger, G. J. Ribeill, L. C. Govia, M. Ware, E. Nielsen, K. Young, T. A. Ohki, R. Blume-Kohout, and T. Proctor, Characterizing midcircuit measurements on a superconducting qubit using gate set tomography, Phys. Rev. Appl. 17, 014014 (2022)

[65]

L. C. G. Govia, P. Jurcevic, C. J. Wood, N. Kanazawa, S. T. Merkel, and D. C. McKay, A randomized benchmarking suite for mid-circuit measurements, New Journal of Physics 25, 123016 (2023)

[66]

A. Hashim, A. Carignan-Dugas, L. Chen, C. Jünger, N. Fruitwala, Y. Xu, G. Huang, J. J. Wallman, and I. Siddiqi, Quasiprobabilistic readout correction of midcircuit measurements for adaptive feedback via measurement randomized compiling, PRX Quantum 6, 010307 (2025)

[1]

Quantum Return Calculation. To evaluate trajectory performance in the quantum domain, the classical concept of return, defined as the discounted sum of rewards, is encoded into a dedicated quantum register |g⟩. The return register |g⟩ occupies a Hilbert space G that contains a sufficient number of qubits to represent all possible return values. Initially, th...

[2]

Optimal Policy Search via Grover’s Algorithm. In classical RL, the objective is to learn an optimal policy, a strategy that prescribes the best action for each state to maximize return. In quantum reinforcement learning (QRL), this objective can be reformulated as a search problem, in which the ensemble of length-T quantum trajectories, generated by the...

[3]

Quantum encoding of the classical MDP. The classical MDP described in Fig. 1 is implemented in the quantum domain by encoding its dynamics into quantum states. This quantum realization, shown in Fig. 2, preserves the structure of the classical model while exploiting superposition to evaluate all state–action transitions in parallel. The implementation us...

[4]

Implementation of multiple interactions on quantum hardware. To evaluate the practicality of the proposed dynamic-circuit-based reusable QMDP for multiple interactions, we deployed the full three-timestep circuit on the 133-qubit IBM Heron-class quantum processor (ibm_torino). This device represents IBM Quantum’s latest generation of superconducting har...

[5]

Trade-offs between static and dynamic implementations. To better understand the practical implications of adopting a dynamic-circuit approach, we outline the trade-offs between static and dynamic implementations of QMDP. These differences clarify the respective strengths and limitations of each method when deployed on near-term quantum devices. Table I s...

[6]

Implementation and validation of optimal trajectory search on quantum hardware. To further evaluate the proposed QRL framework, Grover’s search-based optimal trajectory identification was executed entirely on a quantum device (IBM’s ibm_torino processor), without reliance on quantum simulators or classical subroutines. Each run was configured for 32K sho...

[7]

G. Dulac-Arnold, N. Levine, D. J. Mankowitz, J. Li, C. Paduraru, S. Gowal, and T. Hester, Challenges of real-world reinforcement learning: definitions, benchmarks and analysis, Mach. Learn. 110, 2419–2468 (2021)

[8]

S. Jerbi, L. M. Trenkwalder, H. Poulsen Nautrup, H. J. Briegel, and V. Dunjko, Quantum enhancements for deep reinforcement learning in large spaces, PRX Quantum 2, 010328 (2021)

[9]

S. Y.-C. Chen, An introduction to quantum reinforcement learning (QRL), arXiv:2409.05846

[10]

N. Meyer, C. Ufrecht, M. Periyasamy, D. D. Scherer, A. Plinge, and C. Mutschler, A survey on quantum reinforcement learning, arXiv:2211.03464

[11]

J. Preskill, Quantum Computing in the NISQ era and beyond, Quantum 2, 79 (2018)

[12]

S. Wiedemann, D. Hein, S. Udluft, and C. B. Mendl, Quantum policy iteration via amplitude estimation and Grover search – towards quantum advantage for reinforcement learning, Transactions on Machine Learning Research (2023)

[13]

T. H. Su, S. Shresthamali, and M. Kondo, Quantum framework for reinforcement learning: Integrating the Markov decision process, quantum arithmetic, and trajectory search, Phys. Rev. A 111, 062421 (2025)

[14]

A. Pawar, Y. Li, Z. Mo, Y. Guo, X. Tang, Y. Zhang, and J. Yang, QRCC: Evaluating large quantum circuits on small quantum computers through integrated qubit reuse and circuit cutting, in Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’24 (Association for Computing M...

[15]

P. Nation, Dynamic Bernstein–Vazirani using mid-circuit reset and measurement, https://nonhermitian.org/posts/2021/2021-10-27-dynamic_BV.html

[16]

J. M. Pino, J. M. Dreiling, C. Figgatt, J. P. Gaebler, S. A. Moses, M. S. Allman, C. H. Baldwin, M. Foss-Feig, D. Hayes, K. Mayer, C. Ryan-Anderson, and B. Neyenhuis, Demonstration of the trapped-ion quantum CCD computer architecture, Nature 592, 209 (2021)

[17]

B. Johnson, Bringing the full power of dynamic circuits to Qiskit runtime, https://www.ibm.com/quantum/blog/quantum-dynamic-circuits

[18]

D. Dong, C. Chen, H. Li, and T.-J. Tarn, Quantum reinforcement learning, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 38, 1207 (2008)

[19]

D. Dao-Yi, C. Chun-Lin, C. Zong-Hai, and Z. Chen-Bin, Quantum mechanics helps in learning for more intelligent robots, Chinese Physics Letters 23, 1691 (2006)

[20]

C.-L. Chen and D.-Y. Dong, Superposition-inspired reinforcement learning and quantum reinforcement learning, in Reinforcement Learning, edited by C. Weber, M. Elshaw, and N. M. Mayer (IntechOpen, Rijeka, 2008), Chap. 4

[21]

C. L. Chen, D. Y. Dong, and Z. H. Chen, Quantum computation for action selection using reinforcement learning, International Journal of Quantum Information 04, 1071 (2006)

[22]

D. Dong, C. Chen, J. Chu, and T.-J. Tarn, Robust quantum-inspired reinforcement learning for robot navigation, IEEE/ASME Transactions on Mechatronics 17, 86 (2012)

[23]

M. Ganger and W. Hu, Quantum multiple Q-learning, International Journal of Intelligence Science 9, 1 (2019)

[24]

B. Cho, Y. Xiao, P. Hui, and D. Dong, Quantum bandit with amplitude amplification exploration in an adversarial environment, IEEE Transactions on Knowledge and Data Engineering 36, 311 (2024)

[25]

Q. Wei, H. Ma, C. Chen, and D. Dong, Deep reinforcement learning with quantum-inspired experience replay, IEEE Transactions on Cybernetics 52, 9326 (2022)

[26]

Y. Li, A. H. Aghvami, and D. Dong, Intelligent trajectory planning in UAV-mounted wireless networks: A quantum-inspired reinforcement learning perspective, IEEE Wireless Communications Letters 10, 1994 (2021)

[27]

J.-A. Li, D. Dong, Z. Wei, Y. Liu, Y. Pan, F. Nori, and X. Zhang, Quantum reinforcement learning during human decision-making, Nature Human Behaviour 4, 294 (2020)

[28]

D. Niraula, J. Jamaluddin, M. M. Matuszak, R. K. T. Haken, and I. E. Naqa, Quantum deep reinforcement learning for clinical decision support in oncology: application to adaptive radiotherapy, Scientific Reports 11, 23545 (2021)

[29]

A. Sequeira, L. P. Santos, and L. S. Barbosa, Policy gradients using variational quantum circuits, Quantum Machine Intelligence 5, 18 (2023)

[30]

S. Y.-C. Chen, C.-H. H. Yang, J. Qi, P.-Y. Chen, X. Ma, and H.-S. Goan, Variational quantum circuits for deep reinforcement learning, IEEE Access 8, 141007 (2020)

[31]

O. Lockwood and M. Si, Reinforcement learning with quantum variational circuits, in Proceedings of the Sixteenth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, AIIDE’20 (AAAI Press, USA, 2020), Vol. 16, pp. 245–251

[32]

O. Lockwood and M. Si, Playing Atari with hybrid quantum-classical reinforcement learning, in NeurIPS 2020 Workshop on Pre-registration in Machine Learning (PMLR, USA, 2021), Vol. 148, pp. 285–301

[33]

S. Wu, S. Jin, D. Wen, D. Han, and X. Wang, Quantum reinforcement learning in continuous action space, Quantum 9, 1660 (2025)

[34]

A. Skolik, S. Jerbi, and V. Dunjko, Quantum agents in the Gym: a variational quantum algorithm for deep Q-learning, Quantum 6, 720 (2022)

[35]

S. Jerbi, C. Gyurik, S. C. Marshall, H. J. Briegel, and V. Dunjko, Parametrized quantum policies for reinforcement learning, in Proceedings of the 35th International Conference on Neural Information Processing Systems, NIPS ’21 (Curran Associates Inc., Red Hook, NY, 2021), pp. 28362–28375

[36]

Y. Kwak, W. J. Yun, S. Jung, J.-K. Kim, and J. Kim, Introduction to quantum reinforcement learning: Theory and PennyLane-based implementation, in 2021 International Conference on Information and Communication Technology Convergence (ICTC) (IEEE, Korea, 2021), pp. 416–420

[37]

Q. Lan, Variational quantum soft actor-critic, arXiv:2112.11921

[38]

D. Wang, A. Sundaram, R. Kothari, A. Kapoor, and M. Roetteler, Quantum algorithms for reinforcement learning with a generative model, in Proceedings of the 38th International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 139, edited by M. Meila and T. Zhang (PMLR, 2021), pp. 10916–10926

[39]

E. A. Cherrat, I. Kerenidis, and A. Prakash, Quantum reinforcement learning via policy iteration, Quantum Machine Intelligence 5, 30 (2023)

[40]

F. Hua, Y. Jin, Y. Chen, S. Vittal, K. Krsulich, L. S. Bishop, J. Lapeyre, A. Javadi-Abhari, and E. Z. Zhang, CaQR: A compiler-assisted approach for qubit reuse through dynamic circuit, in Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2023 (Association for Compu...

[41]

M. DeCross, E. Chertkov, M. Kohagen, and M. Foss-Feig, Qubit-reuse compilation with mid-circuit measurement and reset, Phys. Rev. X 13, 041057 (2023)

[42]

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (The MIT Press, Cambridge, 2018)

[43]

L. Graesser and W. Keng, Foundations of Deep Reinforcement Learning: Theory and Practice in Python (Addison-Wesley, USA, 2020)

[44]

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (The MIT Press, Cambridge, 2016)

[45]

S. Shalev-Shwartz, S. Shammah, and A. Shashua, Safe, multi-agent, reinforcement learning for autonomous driving, arXiv:1610.03295

[46]

J. Kober, J. A. Bagnell, and J. Peters, Reinforcement learning in robotics: A survey, The International Journal of Robotics Research 32, 1238 (2013)

[47]

D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. P. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel, and D. Hassabis, Mastering the game of Go without human knowledge, Nature (London) 550, 354 (2017)

[48]

N. Brown and T. Sandholm, Superhuman AI for multiplayer poker, Science 365, 885 (2019)

[49]

A. Plaat, Deep Reinforcement Learning (Springer Nature, Singapore, 2022)

[50]

E. Rieffel and W. Polak, Quantum Computing: A Gentle Introduction (The MIT Press, Cambridge, 2011)

[51]

M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information (Cambridge University Press, UK, 2011)

[52]

P. W. Shor, Algorithms for quantum computation: discrete logarithms and factoring, in Proceedings 35th Annual Symposium on Foundations of Computer Science (IEEE, Piscataway, NJ,

[53]

P. W. Shor, Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer, SIAM Journal on Computing 26, 1484 (1997)

[54]

A. Ekert and R. Jozsa, Quantum computation and Shor’s factoring algorithm, Rev. Mod. Phys. 68, 733 (1996)

[55]

L. K. Grover, A fast quantum mechanical algorithm for database search, in Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing (ACM, New York, 1996), pp. 212–219

[56]

P. Das, A. Locharla, and C. Jones, LILLIPUT: a lightweight low-latency lookup-table decoder for near-term quantum error correction, in Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’22 (Association for Computing Machinery, USA, 2022), pp. 541–553

[57]

M. R. Jokar, R. Rines, G. Pasandi, H. Cong, A. Holmes, Y. Shi, M. Pedram, and F. T. Chong, DigiQ: A scalable digital controller for quantum computers using SFQ logic, in 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (IEEE Computer Society, USA, 2022), pp. 400–414

[58]

A. Wu, G. Li, H. Zhang, G. G. Guerreschi, Y. Ding, and Y. Xie, A synthesis framework for stitching surface code with superconducting quantum devices, in Proceedings of the 49th Annual International Symposium on Computer Architecture, ISCA ’22 (Association for Computing Machinery, USA, 2022), pp. 337–350

[59]

Y. Huang and M. Martonosi, QDB: From Quantum Algorithms Towards Correct Quantum Programs, in 9th Workshop on Evaluation and Usability of Programming Languages and Tools (PLATEAU 2018), Open Access Series in Informatics (OASIcs), Vol. 67, edited by T. Barik, J. Sunshine, and S. Chasins (Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Ger...

[60]

J. Liu, G. T. Byrd, and H. Zhou, Quantum circuits for dynamic runtime assertions in quantum computation, in Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’20 (Association for Computing Machinery, USA, 2020), pp. 1017–1030

[61]

P. Kaye, R. Laflamme, and M. Mosca, An Introduction to Quantum Computing (Oxford University Press, New York, 2007)

[62]

C. Guo, Grover’s algorithm – implementations and implications, Highlights in Science, Engineering and Technology 38, 1071 (2023)

[63]

M. AbuGhanem, IBM quantum computers: evolution, performance, and future directions, J. Supercomput. 81, 687 (2025)

[64]

K. Rudinger, G. J. Ribeill, L. C. Govia, M. Ware, E. Nielsen, K. Young, T. A. Ohki, R. Blume-Kohout, and T. Proctor, Characterizing midcircuit measurements on a superconducting qubit using gate set tomography, Phys. Rev. Appl. 17, 014014 (2022)

[65]

L. C. G. Govia, P. Jurcevic, C. J. Wood, N. Kanazawa, S. T. Merkel, and D. C. McKay, A randomized benchmarking suite for mid-circuit measurements, New Journal of Physics 25, 123016 (2023)

[66]

A. Hashim, A. Carignan-Dugas, L. Chen, C. Jünger, N. Fruitwala, Y. Xu, G. Huang, J. J. Wallman, and I. Siddiqi, Quasiprobabilistic readout correction of midcircuit measurements for adaptive feedback via measurement randomized compiling, PRX Quantum 6, 010307 (2025)