Adaptive Shot Allocation for Recursive QAOA via Reinforcement Learning

Euimin Lee; Shiho Kim

arxiv: 2605.26544 · v1 · pith:R2XFFLBOnew · submitted 2026-05-26 · 🪐 quant-ph

Adaptive Shot Allocation for Recursive QAOA via Reinforcement Learning

Euimin Lee , Shiho Kim This is my paper

Pith reviewed 2026-06-29 17:34 UTC · model grok-4.3

classification 🪐 quant-ph

keywords recursive QAOAshot allocationreinforcement learningMax-Cutadaptive measurementcombinatorial optimizationNISQ algorithms

0 comments

The pith

A reinforcement learning policy for shot allocation in recursive QAOA reduces total measurements by 36 percent relative to uniform allocation on weighted Max-Cut instances.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Recursive QAOA estimates pairwise correlations with shallow circuits and removes variables step by step until a classical solver can finish the remaining problem. Each elimination round consumes a variable number of measurement shots, and the cumulative cost rises with the depth of recursion. The work casts the choice of shot count at every step as a sequential decision task. A Double Q-learning agent learns to adjust allocations on the basis of local graph features while respecting an overall budget limit. On a collection of weighted graphs the learned policy lowers total shots by 36 percent, improves the shots-per-success ratio, and continues to work on graph sizes absent from its training data.

Core claim

A tabular Double Q-learning agent learns a residual policy that adjusts a hand-crafted heuristic baseline for the number of shots used at each recursive elimination step of depth-1 RQAOA. When evaluated under a fixed per-step budget cap that equalizes total allocation across strategies, the policy reduces aggregate shot count by 36 percent on diverse weighted Max-Cut instances, outperforming both uniform allocation and the static heuristic while also lowering the effective shots-per-success ratio; the improvement persists on problem sizes outside the training distribution.

What carries the argument

The tabular Double Q-learning agent that outputs per-step shot adjustments conditioned on local graph indicators and recursion depth, subject to a Lagrangian budget constraint.

Load-bearing premise

The benefit of adaptive shot allocation can be isolated because the variable-elimination rule remains unchanged and every strategy receives the same per-step budget cap.

What would settle it

Applying the trained policy to a fresh collection of larger weighted graphs and observing that total shots or success rates do not improve over the uniform baseline.

read the original abstract

Recursive QAOA (RQAOA) solves combinatorial optimization problems by using shallow quantum circuits to estimate pairwise correlations and recursively eliminate variables until a classical solver can handle the residual instance. Each elimination step requires measurement shots, and the total shot cost grows with the number of recursive stages. On near-term quantum devices, increasing shot counts can translate directly into greater exposure to hardware-level noise sources such as readout errors and decoherence, making shot-efficient execution not merely a cost-reduction measure but a factor with direct implications for solution reliability. While shot reduction has been studied broadly across NISQ algorithms, step-wise measurement control inside the recursive loop of RQAOA has received little attention. We formulate this step-wise allocation as a sequential decision problem and propose two strategies for depth-1 RQAOA on weighted Max-Cut instances. A hand-crafted heuristic assigns shots based on local indicators of step difficulty, and a tabular Double Q-learning agent learns a residual policy that adjusts this baseline under a Lagrangian-constrained objective. Both methods are evaluated under a fixed-cap fairness protocol that equalizes the per-step budget across all strategies, and the elimination rule itself is kept unchanged so that the contribution of adaptive measurement control can be isolated. On a diverse set of weighted graph instances spanning a range of sizes and structures, the heuristic reduces total shots by approximately 23% relative to uniform allocation, and the RL policy achieves a 36% reduction with a lower effective shots per success ratio than both baselines. The improvement persists on problem sizes not seen during training, suggesting that reinforcement learning can discover efficient, instance-adaptive measurement strategies in recursive quantum optimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

RL cuts shots 36% in depth-1 RQAOA and generalizes to unseen sizes, but the numbers lack any statistical backing and the isolation claim is questionable.

read the letter

The main takeaway is that applying tabular Double Q-learning to decide shot counts step by step inside RQAOA recursion gives a 36% shot reduction on weighted Max-Cut, beating the 23% from a simple heuristic, and the policy transfers to larger instances. That's the concrete result.

What the paper does well is set up the problem as a sequential decision with a Lagrangian constraint and test generalization. Keeping the elimination rule fixed and using a fixed per-step budget is a reasonable way to try to isolate the allocation effect.

The soft spots are bigger than minor. The abstract gives the percentage reductions but skips any information on the number of graphs tested, variance, or definition of success. Without that the numbers are hard to trust. The isolation also looks shaky because fewer shots raise the variance of the pairwise correlation estimates that drive elimination. Different paths could explain part of the difference in total shots and success ratio, not just the allocation policy itself.

This paper is for researchers focused on practical shot reduction in recursive quantum algorithms for combinatorial problems. It won't move the needle for general quantum computing or RL theory.

I would not recommend peer review at this stage. The empirical claims need statistical support and a check on whether the recursion trajectories differ before it is worth a referee's time.

Referee Report

2 major / 0 minor

Summary. The manuscript formulates step-wise shot allocation in depth-1 Recursive QAOA as a sequential decision problem and introduces a hand-crafted heuristic plus a tabular Double Q-learning policy that adjusts shot counts under a Lagrangian constraint. Both are tested against uniform allocation on weighted Max-Cut instances under a fixed-cap fairness protocol that equalizes per-step budgets while leaving the elimination rule unchanged; the abstract reports that the heuristic yields an approximately 23% total-shot reduction and the RL policy a 36% reduction together with a lower shots-per-success ratio, with the gains persisting on problem sizes unseen during training.

Significance. If the reported reductions are statistically robust and the attribution to adaptive allocation can be cleanly isolated, the work would demonstrate a practical route to lowering shot overhead in recursive quantum optimization algorithms, directly relevant to noise exposure on NISQ hardware. The persistence of gains on out-of-distribution sizes and the explicit use of an RL residual policy are positive features that, if substantiated, would strengthen the case for learned measurement strategies in recursive quantum routines.

major comments (2)

[Abstract] Abstract: the central performance claims (23% and 36% reductions, improved shots-per-success ratio) are stated without any accompanying information on the number of graph instances, statistical significance tests, error bars, variance across runs, or the precise operational definition of 'success'. This absence prevents verification that the empirical data support the headline numbers.
[Abstract / methods (fixed-cap protocol)] Fixed-cap fairness protocol and isolation argument (stated in the abstract and presumably elaborated in the methods): the manuscript asserts that keeping the elimination rule fixed and equalizing per-step budgets isolates the effect of adaptive shot counts. However, allocating fewer shots necessarily raises the sampling variance of the pairwise correlation estimates that are the direct inputs to the elimination rule; noisier estimates can therefore produce different elimination sequences even under an identical rule, so differences in total shots and success ratio may partly reflect altered recursion trajectories rather than measurement efficiency alone.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments on our manuscript. We address each major comment below and indicate the corresponding revisions.

read point-by-point responses

Referee: [Abstract] Abstract: the central performance claims (23% and 36% reductions, improved shots-per-success ratio) are stated without any accompanying information on the number of graph instances, statistical significance tests, error bars, variance across runs, or the precise operational definition of 'success'. This absence prevents verification that the empirical data support the headline numbers.

Authors: We agree that the abstract would benefit from additional context on the supporting empirical evidence. In the revised manuscript we will expand the abstract to state the number of weighted Max-Cut instances evaluated, the number of independent runs per instance, the use of error bars, and a concise definition of success (reaching the target cut value within the recursion). The full statistical analysis, including variance measures and any significance tests, remains in the Results section. revision: yes
Referee: [Abstract / methods (fixed-cap protocol)] Fixed-cap fairness protocol and isolation argument (stated in the abstract and presumably elaborated in the methods): the manuscript asserts that keeping the elimination rule fixed and equalizing per-step budgets isolates the effect of adaptive shot counts. However, allocating fewer shots necessarily raises the sampling variance of the pairwise correlation estimates that are the direct inputs to the elimination rule; noisier estimates can therefore produce different elimination sequences even under an identical rule, so differences in total shots and success ratio may partly reflect altered recursion trajectories rather than measurement efficiency alone.

Authors: We thank the referee for this observation. It is correct that lower per-step shot counts increase the variance of the correlation estimates that feed the elimination rule, and therefore can alter the recursion trajectory even when the rule itself is unchanged. Our fixed-cap protocol equalizes the per-step budget across strategies while preserving the elimination rule, but does not constrain the resulting path. We will revise the manuscript to explicitly acknowledge this limitation of the isolation argument and to clarify that the reported reductions reflect the net effect of adaptive allocation on both measurement cost and the induced elimination sequences. Should the referee consider it necessary, we can add a controlled experiment that fixes the elimination order to isolate the pure measurement-efficiency contribution. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical performance metrics from RL training and testing

full rationale

The paper's central claims are measured shot reductions (23% heuristic, 36% RL) on held-out graph instances under a fixed experimental protocol. These are direct outputs of running the described allocation policies against uniform baselines; they do not reduce by any equation in the manuscript to quantities defined in terms of themselves or to parameters fitted on the same test data. The fixed-cap fairness protocol and unchanged elimination rule are explicit experimental controls, not a derivation step. No self-citation chain, ansatz smuggling, or self-definitional loop appears in the provided text. The result is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are identifiable from the provided text.

axioms (1)

domain assumption Standard quantum mechanics and NISQ noise models apply to the device behavior during measurement.
Implicit background for any discussion of shot counts and hardware noise in quantum optimization.

pith-pipeline@v0.9.1-grok · 5816 in / 1239 out tokens · 48702 ms · 2026-06-29T17:34:17.741733+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

69 extracted references · 3 canonical work pages

[1]

Quantum Sci

Moll, N., Barkoutsos, P., Bishop, L.S., Chow, J.M., Cross, A., Egger, D.J., Filipp, S., Fuhrer, A., Gambetta, J.M., Ganzhorn, M., Mezzacapo, A., Salis, G., Smolin, J., Tavernelli, I., Temme, K.: Quantum optimization using variational algorithms on near-term quantum devices. Quantum Sci. Technol.3, 030503 (2018)

2018
[2]

Cerezo, M., Arrasmith, A., Babbush, R., Benjamin, S.C., Endo, S., Fujii, K., McClean, J.R., Mitarai, K., Yuan, X., Cincio, L., Coles, P.J.: Variational quantum algorithms. Nat. Rev. Phys.3, 625–644 (2021)

2021
[3]

Lee, E., Lee, S., Kim, S.: Quantum walk based Monte Carlo simulation for photon interaction cross sections. Phys. Rev. D111, 116001 (2025) https://doi.org/10. 1103/PhysRevD.111.116001 21

2025
[4]

Bauer, B., Bravyi, S., Motta, M., Chan, G.K.-L.: Quantum algorithms for quan- tum chemistry and quantum materials science. Chem. Rev.120, 12685–12717 (2020)

2020
[5]

Nature549, 195–202 (2017)

Biamonte, J., Wittek, P., Pancotti, N., Rebentrost, P., Wiebe, N., Lloyd, S.: Quantum machine learning. Nature549, 195–202 (2017)

2017
[6]

Cambridge University Press, Cambridge (2010)

Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information. Cambridge University Press, Cambridge (2010)

2010
[7]

Quantum2, 79 (2018)

Preskill, J.: Quantum computing in the NISQ era and beyond. Quantum2, 79 (2018)

2018
[8]

Peruzzo, A., McClean, J., Shadbolt, P., Yung, M.-H., Zhou, X.-Q., Love, P.J., Aspuru-Guzik, A., O’Brien, J.L.: A variational eigenvalue solver on a photonic quantum processor. Nat. Commun.5, 4213 (2014)

2014
[9]

McClean, J.R., Romero, J., Babbush, R., Aspuru-Guzik, A.: The theory of variational hybrid quantum-classical algorithms. New J. Phys.18, 023023 (2016)

2016
[10]

Nature549, 242–246 (2017)

Kandala, A., Mezzacapo, A., Temme, K., Takita, M., Brink, M., Chow, J.M., Gambetta, J.M.: Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets. Nature549, 242–246 (2017)

2017
[11]

Bharti, K., Cervera-Lierta, A., Kyaw, T.H., Haug, T., Alperin-Lea, S., Anand, A., Degroote, M., Heimonen, H., Kottmann, J.S., Menke, T., Mok, W.-K., Sim, S., Kwek, L.-C., Aspuru-Guzik, A.: Noisy intermediate-scale quantum algorithms. Rev. Mod. Phys.94, 015004 (2022)

2022
[12]

Tilly, J., Chen, H., Cao, S., Picozzi, D., Setia, K., Li, Y., Grant, E., Wossnig, L., Rungger, I., Booth, G.H., Tennyson, J.: The variational quantum eigensolver: a review of methods and best practices. Phys. Rep.986, 1–128 (2022)

2022
[13]

Farhi, E., Goldstone, J., Gutmann, S.: A quantum approximate optimization algorithm (2014)

2014
[14]

Algorithms12, 34 (2019)

Hadfield, S., Wang, Z., O’Gorman, B., Rieffel, E.G., Venturelli, D., Biswas, R.: From the quantum approximate optimization algorithm to a quantum alternating operator ansatz. Algorithms12, 34 (2019)

2019
[15]

Wang, Z., Hadfield, S., Jiang, Z., Rieffel, E.G.: Quantum approximate opti- mization algorithm for MaxCut: a fermionic view. Phys. Rev. A97, 022304 (2018)

2018
[16]

Zhou, L., Wang, S.-T., Choi, S., Pichler, H., Lukin, M.D.: Quantum approximate optimization algorithm: performance, mechanism, and implementation on near- term devices. Phys. Rev. X10, 021067 (2020) 22

2020
[17]

Crooks, G.E.: Performance of the quantum approximate optimization algorithm on the maximum cut problem (2018)

2018
[18]

Quantum Inf

Willsch, D., Willsch, M., Jin, F., De Raedt, H., Michielsen, K.: Benchmarking the quantum approximate optimization algorithm. Quantum Inf. Process.19, 197 (2020)

2020
[19]

Farhi, E., Gamarnik, D., Gutmann, S.: The quantum approximate optimization algorithm needs to see the whole graph: a typical case (2020)

2020
[20]

Farhi, E., Gamarnik, D., Gutmann, S.: The quantum approximate optimization algorithm needs to see the whole graph: worst case examples (2020)

2020
[21]

Quantum5, 437 (2021)

Marwaha, K.: Local classical MAX-CUT algorithm outperformsp= 2 QAOA on high-girth regular graphs. Quantum5, 437 (2021)

2021
[22]

Hastings, M.B.: Classical and quantum bounded depth approximation algorithms (2019)

2019
[23]

Bravyi, S., Kliesch, A., Koenig, R., Tang, E.: Obstacles to state preparation and variational optimization from symmetry protection. Phys. Rev. Lett.125, 260505 (2020)

2020
[24]

McClean, J.R., Boixo, S., Smelyanskiy, V.N., Babbush, R., Neven, H.: Barren plateaus in quantum neural network training landscapes. Nat. Commun.9, 4812 (2018)

2018
[25]

Cerezo, M., Sone, A., Volkoff, T., Cincio, L., Coles, P.J.: Cost function dependent barren plateaus in shallow parametrized quantum circuits. Nat. Commun.12, 1791 (2021)

2021
[26]

Bittel, L., Kliesch, M.: Training variational quantum algorithms is NP-hard. Phys. Rev. Lett.127, 120502 (2021)

2021
[27]

Quantum6, 678 (2022)

Bravyi, S., Kliesch, A., Koenig, R., Tang, E.: Hybrid quantum-classical algorithms for approximate graph coloring. Quantum6, 678 (2022)

2022
[28]

Bae, E., Lee, S.: Recursive QAOA outperforms the original QAOA for the MAX- CUT problem on complete graphs (2023)

2023
[29]

EPJ Quantum Technol.11, 1 (2024)

Patel, Y.J., Jerbi, S., B¨ ack, T., Dunjko, V.: Reinforcement learning assisted recursive QAOA. EPJ Quantum Technol.11, 1 (2024)

2024
[30]

PRX Quantum5, 020327 (2024) 23

Finˇ zgar, J.R., Kerschbaumer, A., Schuetz, M.J.A., Mendl, C.B., Katzgraber, H.G.: Quantum-informed recursive optimization algorithms. PRX Quantum5, 020327 (2024) 23

2024
[31]

Brady, L.T., Hadfield, S.: Iterative quantum algorithms for maximum indepen- dent set. Phys. Rev. A110, 052435 (2024)

2024
[32]

Temme, K., Bravyi, S., Gambetta, J.M.: Error mitigation for short-depth quantum circuits. Phys. Rev. Lett.119, 180509 (2017)

2017
[33]

Endo, S., Cai, Z., Benjamin, S.C., Yuan, X.: Hybrid quantum-classical algorithms and quantum error mitigation. J. Phys. Soc. Jpn.90, 032001 (2021)

2021
[34]

Quantum4, 263 (2020)

K¨ ubler, J.M., Arrasmith, A., Cincio, L., Coles, P.J.: An adaptive optimizer for measurement-frugal variational algorithms. Quantum4, 263 (2020)

2020
[35]

Arrasmith, A., Cincio, L., Somma, R.D., Coles, P.J.: Operator sampling for shot- frugal optimization in variational algorithms (2020)

2020
[36]

Gu, A., Lowe, A., Dub, P.A., Coles, P.J., Arrasmith, A.: Adaptive shot allocation for fast convergence in variational quantum algorithms (2021)

2021
[37]

Ito, K., Mizukami, W., Fujii, K.: Latency-aware adaptive shot allocation for variational quantum algorithms (2023)

2023
[38]

Quantum4, 314 (2020)

Sweke, R., Wilde, F., Meyer, J., Schuld, M., Faehrmann, P.K., Meynard- Piganeau, B., Eisert, J.: Stochastic gradient descent for hybrid quantum-classical optimization. Quantum4, 314 (2020)

2020
[39]

Quantum 4, 269 (2020)

Stokes, J., Izaac, J., Killoran, N., Carleo, G.: Quantum natural gradient. Quantum 4, 269 (2020)

2020
[40]

IEEE Trans

Spall, J.C.: Implementation of the simultaneous perturbation algorithm for stochastic optimization. IEEE Trans. Aerosp. Electron. Syst.34, 817–823 (1998)

1998
[41]

Zhao, A., Rubin, N.C., Miyake, A.: Measurement reduction in variational quantum algorithms. Phys. Rev. Lett.127, 110504 (2021)

2021
[42]

npj Quantum Inf.7, 23 (2021)

Huggins, W.J., McClean, J.R., Rubin, N., Jiang, Z., Wiebe, N., Whaley, K.B., Babbush, R.: Efficient and noise resilient measurements for quantum chemistry on near-term quantum computers. npj Quantum Inf.7, 23 (2021)

2021
[43]

npj Quantum Inf.6, 34 (2020)

Gokhale, P., Angiuli, O., Ding, Y., Gui, K., Tomesh, T., Suchara, M., Martonosi, M., Chong, F.T.: Minimizing state preparations in variational quantum eigen- solver by partitioning into commuting families. npj Quantum Inf.6, 34 (2020)

2020
[44]

Sci.10, 3746–3755 (2019)

Izmaylov, A.F., Yen, T.-C., Ryabinkin, I.G.: Revising the measurement process in the variational quantum eigensolver: is it possible to reduce the number of separately measured operators? Chem. Sci.10, 3746–3755 (2019)

2019
[45]

Verteletskyi, V., Yen, T.-C., Izmaylov, A.F.: Measurement optimization in the variational quantum eigensolver using a minimum clique cover. J. Chem. Phys. 24 152, 124114 (2020)

2020
[46]

Quantum5, 385 (2021)

Crawford, O., Straaten, B., Wang, D., Parks, T., Campbell, E., Brierley, S.: Effi- cient quantum measurement of Pauli operators in the presence of finite sampling error. Quantum5, 385 (2021)

2021
[47]

Liang, S., Zhu, L., Liu, X., Yang, C., Li, X.: Artificial-intelligence-driven shot reduction in quantum measurement. Chem. Phys. Rev.5, 041403 (2024) https: //doi.org/10.1063/5.0219663

work page doi:10.1063/5.0219663 2024
[48]

Wauters, M.M., El-Araby, E., Amin, M.H.: Reinforcement-learning-assisted quan- tum optimization. Phys. Rev. Res.2, 033446 (2020)

2020
[49]

In: Proceedings of Machine Learning Research, vol

Yao, J., Bukov, M., Lin, L.: Policy gradient based quantum approximate opti- mization algorithm. In: Proceedings of Machine Learning Research, vol. 107, pp. 605–634 (2020)

2020
[50]

Khairy, S., Shaydulin, R., Cincio, L., Alexeev, Y., Balaprakash, P.: Reinforcement- learning-based variational quantum circuits optimization for combinatorial prob- lems (2020)

2020
[51]

In: Advances in Neural Information Processing Systems 34, pp

Ostaszewski, M., Trenkwalder, L.M., Masarczyk, W., Scerri, E., Dunjko, V.: Rein- forcement learning for optimization of variational quantum circuit architectures. In: Advances in Neural Information Processing Systems 34, pp. 18182–18194. Curran Associates, Red Hook, NY (2021)

2021
[52]

EPJ Quantum Technol

Skolik, A., Mangini, S., B¨ ack, T., Macchiavello, C., Dunjko, V.: Robustness of quantum reinforcement learning under hardware errors. EPJ Quantum Technol. 10, 12 (2023)

2023
[53]

Altmann, P., Stein, J., K¨ olle, M., B˘ arligea, A., Gabor, T., Phan, T., Feld, S., Linnhoff-Popien, C.: Challenges for reinforcement learning in quantum circuit design (2024)

2024
[54]

Lucas, A.: Ising formulations of many NP problems. Front. Phys.2, 5 (2014)

2014
[55]

Goemans, M.X., Williamson, D.P.: Improved approximation algorithms for max- imum cut and satisfiability problems using semidefinite programming. J. ACM 42, 1115–1145 (1995)

1995
[56]

Quantum Sci- ence and Technology7(4), 045036 (2022) https://doi.org/10.1088/2058-9565/ ac9f2b

Ozaeta, A., Dam, W., McMahon, P.L.: Expectation values from the single-layer quantum approximate optimization algorithm on Ising problems. Quantum Sci- ence and Technology7(4), 045036 (2022) https://doi.org/10.1088/2058-9565/ ac9f2b

work page doi:10.1088/2058-9565/ 2022
[57]

Kuo, E.-J., Fang, Y.-L.L., Chen, S.Y.-C.: Quantum architecture search via deep reinforcement learning (2021) 25

2021
[58]

F¨ osel, T., Niu, M.Y., Marquardt, F., Li, L.: Quantum circuit optimization with deep reinforcement learning (2021)

2021
[59]

F¨ osel, T., Tighineanu, P., Weiss, T., Marquardt, F.: Reinforcement learning with neural networks for quantum feedback. Phys. Rev. X8, 031084 (2018)

2018
[60]

npj Quantum Inf.5, 33 (2019)

Niu, M.Y., Boixo, S., Smelyanskiy, V.N., Neven, H.: Universal quantum control through deep reinforcement learning. npj Quantum Inf.5, 33 (2019)

2019
[61]

Bukov, M., Day, A.G.R., Sels, D., Weinberg, P., Polkovnikov, A., Mehta, P.: Reinforcement learning in different phases of quantum control. Phys. Rev. X8, 031086 (2018)

2018
[62]

MIT Press, Cambridge, MA (2018)

Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. MIT Press, Cambridge, MA (2018)

2018
[63]

Watkins, C.J.C.H., Dayan, P.: Q-learning. Mach. Learn.8, 279–292 (1992)

1992
[64]

In: Lafferty, J.D., Williams, C.K.I., Shawe-Taylor, J., Zemel, R.S., Culotta, A

Hasselt, H.: Double Q-learning. In: Lafferty, J.D., Williams, C.K.I., Shawe-Taylor, J., Zemel, R.S., Culotta, A. (eds.) Advances in Neural Information Processing Systems 23, pp. 2613–2621. Curran Associates, Red Hook, NY (2010)

2010
[65]

Chapman & Hall/CRC, Boca Raton, FL (1999)

Altman, E.: Constrained Markov Decision Processes. Chapman & Hall/CRC, Boca Raton, FL (1999)

1999
[66]

In: Proceedings of the 34th International Conference on Machine Learning

Achiam, J., Held, D., Tamar, A., Abbeel, P.: Constrained policy optimization. In: Proceedings of the 34th International Conference on Machine Learning. Pro- ceedings of Machine Learning Research, pp. 22–31. PMLR, Sydney, Australia (2017)

2017
[67]

In: International Conference on Learning Representations (2019)

Tessler, C., Mankowitz, D.J., Mannor, S.: Reward constrained policy optimiza- tion. In: International Conference on Learning Representations (2019)

2019
[68]

Zenodo (2019)

Aleksandrowicz, G., et al.: Qiskit: an open-source framework for quantum computing. Zenodo (2019). https://doi.org/10.5281/zenodo.2562111

work page doi:10.5281/zenodo.2562111 2019
[69]

Nature567, 491–495 (2019) 26

Kandala, A., Temme, K., C´ orcoles, A.D., Mezzacapo, A., Chow, J.M., Gam- betta, J.M.: Error mitigation extends the computational reach of a noisy quantum processor. Nature567, 491–495 (2019) 26

2019

[1] [1]

Quantum Sci

Moll, N., Barkoutsos, P., Bishop, L.S., Chow, J.M., Cross, A., Egger, D.J., Filipp, S., Fuhrer, A., Gambetta, J.M., Ganzhorn, M., Mezzacapo, A., Salis, G., Smolin, J., Tavernelli, I., Temme, K.: Quantum optimization using variational algorithms on near-term quantum devices. Quantum Sci. Technol.3, 030503 (2018)

2018

[2] [2]

Cerezo, M., Arrasmith, A., Babbush, R., Benjamin, S.C., Endo, S., Fujii, K., McClean, J.R., Mitarai, K., Yuan, X., Cincio, L., Coles, P.J.: Variational quantum algorithms. Nat. Rev. Phys.3, 625–644 (2021)

2021

[3] [3]

Lee, E., Lee, S., Kim, S.: Quantum walk based Monte Carlo simulation for photon interaction cross sections. Phys. Rev. D111, 116001 (2025) https://doi.org/10. 1103/PhysRevD.111.116001 21

2025

[4] [4]

Bauer, B., Bravyi, S., Motta, M., Chan, G.K.-L.: Quantum algorithms for quan- tum chemistry and quantum materials science. Chem. Rev.120, 12685–12717 (2020)

2020

[5] [5]

Nature549, 195–202 (2017)

Biamonte, J., Wittek, P., Pancotti, N., Rebentrost, P., Wiebe, N., Lloyd, S.: Quantum machine learning. Nature549, 195–202 (2017)

2017

[6] [6]

Cambridge University Press, Cambridge (2010)

Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information. Cambridge University Press, Cambridge (2010)

2010

[7] [7]

Quantum2, 79 (2018)

Preskill, J.: Quantum computing in the NISQ era and beyond. Quantum2, 79 (2018)

2018

[8] [8]

Peruzzo, A., McClean, J., Shadbolt, P., Yung, M.-H., Zhou, X.-Q., Love, P.J., Aspuru-Guzik, A., O’Brien, J.L.: A variational eigenvalue solver on a photonic quantum processor. Nat. Commun.5, 4213 (2014)

2014

[9] [9]

McClean, J.R., Romero, J., Babbush, R., Aspuru-Guzik, A.: The theory of variational hybrid quantum-classical algorithms. New J. Phys.18, 023023 (2016)

2016

[10] [10]

Nature549, 242–246 (2017)

Kandala, A., Mezzacapo, A., Temme, K., Takita, M., Brink, M., Chow, J.M., Gambetta, J.M.: Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets. Nature549, 242–246 (2017)

2017

[11] [11]

Bharti, K., Cervera-Lierta, A., Kyaw, T.H., Haug, T., Alperin-Lea, S., Anand, A., Degroote, M., Heimonen, H., Kottmann, J.S., Menke, T., Mok, W.-K., Sim, S., Kwek, L.-C., Aspuru-Guzik, A.: Noisy intermediate-scale quantum algorithms. Rev. Mod. Phys.94, 015004 (2022)

2022

[12] [12]

Tilly, J., Chen, H., Cao, S., Picozzi, D., Setia, K., Li, Y., Grant, E., Wossnig, L., Rungger, I., Booth, G.H., Tennyson, J.: The variational quantum eigensolver: a review of methods and best practices. Phys. Rep.986, 1–128 (2022)

2022

[13] [13]

Farhi, E., Goldstone, J., Gutmann, S.: A quantum approximate optimization algorithm (2014)

2014

[14] [14]

Algorithms12, 34 (2019)

Hadfield, S., Wang, Z., O’Gorman, B., Rieffel, E.G., Venturelli, D., Biswas, R.: From the quantum approximate optimization algorithm to a quantum alternating operator ansatz. Algorithms12, 34 (2019)

2019

[15] [15]

Wang, Z., Hadfield, S., Jiang, Z., Rieffel, E.G.: Quantum approximate opti- mization algorithm for MaxCut: a fermionic view. Phys. Rev. A97, 022304 (2018)

2018

[16] [16]

Zhou, L., Wang, S.-T., Choi, S., Pichler, H., Lukin, M.D.: Quantum approximate optimization algorithm: performance, mechanism, and implementation on near- term devices. Phys. Rev. X10, 021067 (2020) 22

2020

[17] [17]

Crooks, G.E.: Performance of the quantum approximate optimization algorithm on the maximum cut problem (2018)

2018

[18] [18]

Quantum Inf

Willsch, D., Willsch, M., Jin, F., De Raedt, H., Michielsen, K.: Benchmarking the quantum approximate optimization algorithm. Quantum Inf. Process.19, 197 (2020)

2020

[19] [19]

Farhi, E., Gamarnik, D., Gutmann, S.: The quantum approximate optimization algorithm needs to see the whole graph: a typical case (2020)

2020

[20] [20]

Farhi, E., Gamarnik, D., Gutmann, S.: The quantum approximate optimization algorithm needs to see the whole graph: worst case examples (2020)

2020

[21] [21]

Quantum5, 437 (2021)

Marwaha, K.: Local classical MAX-CUT algorithm outperformsp= 2 QAOA on high-girth regular graphs. Quantum5, 437 (2021)

2021

[22] [22]

Hastings, M.B.: Classical and quantum bounded depth approximation algorithms (2019)

2019

[23] [23]

Bravyi, S., Kliesch, A., Koenig, R., Tang, E.: Obstacles to state preparation and variational optimization from symmetry protection. Phys. Rev. Lett.125, 260505 (2020)

2020

[24] [24]

McClean, J.R., Boixo, S., Smelyanskiy, V.N., Babbush, R., Neven, H.: Barren plateaus in quantum neural network training landscapes. Nat. Commun.9, 4812 (2018)

2018

[25] [25]

Cerezo, M., Sone, A., Volkoff, T., Cincio, L., Coles, P.J.: Cost function dependent barren plateaus in shallow parametrized quantum circuits. Nat. Commun.12, 1791 (2021)

2021

[26] [26]

Bittel, L., Kliesch, M.: Training variational quantum algorithms is NP-hard. Phys. Rev. Lett.127, 120502 (2021)

2021

[27] [27]

Quantum6, 678 (2022)

Bravyi, S., Kliesch, A., Koenig, R., Tang, E.: Hybrid quantum-classical algorithms for approximate graph coloring. Quantum6, 678 (2022)

2022

[28] [28]

Bae, E., Lee, S.: Recursive QAOA outperforms the original QAOA for the MAX- CUT problem on complete graphs (2023)

2023

[29] [29]

EPJ Quantum Technol.11, 1 (2024)

Patel, Y.J., Jerbi, S., B¨ ack, T., Dunjko, V.: Reinforcement learning assisted recursive QAOA. EPJ Quantum Technol.11, 1 (2024)

2024

[30] [30]

PRX Quantum5, 020327 (2024) 23

Finˇ zgar, J.R., Kerschbaumer, A., Schuetz, M.J.A., Mendl, C.B., Katzgraber, H.G.: Quantum-informed recursive optimization algorithms. PRX Quantum5, 020327 (2024) 23

2024

[31] [31]

Brady, L.T., Hadfield, S.: Iterative quantum algorithms for maximum indepen- dent set. Phys. Rev. A110, 052435 (2024)

2024

[32] [32]

Temme, K., Bravyi, S., Gambetta, J.M.: Error mitigation for short-depth quantum circuits. Phys. Rev. Lett.119, 180509 (2017)

2017

[33] [33]

Endo, S., Cai, Z., Benjamin, S.C., Yuan, X.: Hybrid quantum-classical algorithms and quantum error mitigation. J. Phys. Soc. Jpn.90, 032001 (2021)

2021

[34] [34]

Quantum4, 263 (2020)

K¨ ubler, J.M., Arrasmith, A., Cincio, L., Coles, P.J.: An adaptive optimizer for measurement-frugal variational algorithms. Quantum4, 263 (2020)

2020

[35] [35]

Arrasmith, A., Cincio, L., Somma, R.D., Coles, P.J.: Operator sampling for shot- frugal optimization in variational algorithms (2020)

2020

[36] [36]

Gu, A., Lowe, A., Dub, P.A., Coles, P.J., Arrasmith, A.: Adaptive shot allocation for fast convergence in variational quantum algorithms (2021)

2021

[37] [37]

Ito, K., Mizukami, W., Fujii, K.: Latency-aware adaptive shot allocation for variational quantum algorithms (2023)

2023

[38] [38]

Quantum4, 314 (2020)

Sweke, R., Wilde, F., Meyer, J., Schuld, M., Faehrmann, P.K., Meynard- Piganeau, B., Eisert, J.: Stochastic gradient descent for hybrid quantum-classical optimization. Quantum4, 314 (2020)

2020

[39] [39]

Quantum 4, 269 (2020)

Stokes, J., Izaac, J., Killoran, N., Carleo, G.: Quantum natural gradient. Quantum 4, 269 (2020)

2020

[40] [40]

IEEE Trans

Spall, J.C.: Implementation of the simultaneous perturbation algorithm for stochastic optimization. IEEE Trans. Aerosp. Electron. Syst.34, 817–823 (1998)

1998

[41] [41]

Zhao, A., Rubin, N.C., Miyake, A.: Measurement reduction in variational quantum algorithms. Phys. Rev. Lett.127, 110504 (2021)

2021

[42] [42]

npj Quantum Inf.7, 23 (2021)

Huggins, W.J., McClean, J.R., Rubin, N., Jiang, Z., Wiebe, N., Whaley, K.B., Babbush, R.: Efficient and noise resilient measurements for quantum chemistry on near-term quantum computers. npj Quantum Inf.7, 23 (2021)

2021

[43] [43]

npj Quantum Inf.6, 34 (2020)

Gokhale, P., Angiuli, O., Ding, Y., Gui, K., Tomesh, T., Suchara, M., Martonosi, M., Chong, F.T.: Minimizing state preparations in variational quantum eigen- solver by partitioning into commuting families. npj Quantum Inf.6, 34 (2020)

2020

[44] [44]

Sci.10, 3746–3755 (2019)

Izmaylov, A.F., Yen, T.-C., Ryabinkin, I.G.: Revising the measurement process in the variational quantum eigensolver: is it possible to reduce the number of separately measured operators? Chem. Sci.10, 3746–3755 (2019)

2019

[45] [45]

Verteletskyi, V., Yen, T.-C., Izmaylov, A.F.: Measurement optimization in the variational quantum eigensolver using a minimum clique cover. J. Chem. Phys. 24 152, 124114 (2020)

2020

[46] [46]

Quantum5, 385 (2021)

Crawford, O., Straaten, B., Wang, D., Parks, T., Campbell, E., Brierley, S.: Effi- cient quantum measurement of Pauli operators in the presence of finite sampling error. Quantum5, 385 (2021)

2021

[47] [47]

Liang, S., Zhu, L., Liu, X., Yang, C., Li, X.: Artificial-intelligence-driven shot reduction in quantum measurement. Chem. Phys. Rev.5, 041403 (2024) https: //doi.org/10.1063/5.0219663

work page doi:10.1063/5.0219663 2024

[48] [48]

Wauters, M.M., El-Araby, E., Amin, M.H.: Reinforcement-learning-assisted quan- tum optimization. Phys. Rev. Res.2, 033446 (2020)

2020

[49] [49]

In: Proceedings of Machine Learning Research, vol

Yao, J., Bukov, M., Lin, L.: Policy gradient based quantum approximate opti- mization algorithm. In: Proceedings of Machine Learning Research, vol. 107, pp. 605–634 (2020)

2020

[50] [50]

Khairy, S., Shaydulin, R., Cincio, L., Alexeev, Y., Balaprakash, P.: Reinforcement- learning-based variational quantum circuits optimization for combinatorial prob- lems (2020)

2020

[51] [51]

In: Advances in Neural Information Processing Systems 34, pp

Ostaszewski, M., Trenkwalder, L.M., Masarczyk, W., Scerri, E., Dunjko, V.: Rein- forcement learning for optimization of variational quantum circuit architectures. In: Advances in Neural Information Processing Systems 34, pp. 18182–18194. Curran Associates, Red Hook, NY (2021)

2021

[52] [52]

EPJ Quantum Technol

Skolik, A., Mangini, S., B¨ ack, T., Macchiavello, C., Dunjko, V.: Robustness of quantum reinforcement learning under hardware errors. EPJ Quantum Technol. 10, 12 (2023)

2023

[53] [53]

Altmann, P., Stein, J., K¨ olle, M., B˘ arligea, A., Gabor, T., Phan, T., Feld, S., Linnhoff-Popien, C.: Challenges for reinforcement learning in quantum circuit design (2024)

2024

[54] [54]

Lucas, A.: Ising formulations of many NP problems. Front. Phys.2, 5 (2014)

2014

[55] [55]

Goemans, M.X., Williamson, D.P.: Improved approximation algorithms for max- imum cut and satisfiability problems using semidefinite programming. J. ACM 42, 1115–1145 (1995)

1995

[56] [56]

Quantum Sci- ence and Technology7(4), 045036 (2022) https://doi.org/10.1088/2058-9565/ ac9f2b

Ozaeta, A., Dam, W., McMahon, P.L.: Expectation values from the single-layer quantum approximate optimization algorithm on Ising problems. Quantum Sci- ence and Technology7(4), 045036 (2022) https://doi.org/10.1088/2058-9565/ ac9f2b

work page doi:10.1088/2058-9565/ 2022

[57] [57]

Kuo, E.-J., Fang, Y.-L.L., Chen, S.Y.-C.: Quantum architecture search via deep reinforcement learning (2021) 25

2021

[58] [58]

F¨ osel, T., Niu, M.Y., Marquardt, F., Li, L.: Quantum circuit optimization with deep reinforcement learning (2021)

2021

[59] [59]

F¨ osel, T., Tighineanu, P., Weiss, T., Marquardt, F.: Reinforcement learning with neural networks for quantum feedback. Phys. Rev. X8, 031084 (2018)

2018

[60] [60]

npj Quantum Inf.5, 33 (2019)

Niu, M.Y., Boixo, S., Smelyanskiy, V.N., Neven, H.: Universal quantum control through deep reinforcement learning. npj Quantum Inf.5, 33 (2019)

2019

[61] [61]

Bukov, M., Day, A.G.R., Sels, D., Weinberg, P., Polkovnikov, A., Mehta, P.: Reinforcement learning in different phases of quantum control. Phys. Rev. X8, 031086 (2018)

2018

[62] [62]

MIT Press, Cambridge, MA (2018)

Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. MIT Press, Cambridge, MA (2018)

2018

[63] [63]

Watkins, C.J.C.H., Dayan, P.: Q-learning. Mach. Learn.8, 279–292 (1992)

1992

[64] [64]

In: Lafferty, J.D., Williams, C.K.I., Shawe-Taylor, J., Zemel, R.S., Culotta, A

Hasselt, H.: Double Q-learning. In: Lafferty, J.D., Williams, C.K.I., Shawe-Taylor, J., Zemel, R.S., Culotta, A. (eds.) Advances in Neural Information Processing Systems 23, pp. 2613–2621. Curran Associates, Red Hook, NY (2010)

2010

[65] [65]

Chapman & Hall/CRC, Boca Raton, FL (1999)

Altman, E.: Constrained Markov Decision Processes. Chapman & Hall/CRC, Boca Raton, FL (1999)

1999

[66] [66]

In: Proceedings of the 34th International Conference on Machine Learning

Achiam, J., Held, D., Tamar, A., Abbeel, P.: Constrained policy optimization. In: Proceedings of the 34th International Conference on Machine Learning. Pro- ceedings of Machine Learning Research, pp. 22–31. PMLR, Sydney, Australia (2017)

2017

[67] [67]

In: International Conference on Learning Representations (2019)

Tessler, C., Mankowitz, D.J., Mannor, S.: Reward constrained policy optimiza- tion. In: International Conference on Learning Representations (2019)

2019

[68] [68]

Zenodo (2019)

Aleksandrowicz, G., et al.: Qiskit: an open-source framework for quantum computing. Zenodo (2019). https://doi.org/10.5281/zenodo.2562111

work page doi:10.5281/zenodo.2562111 2019

[69] [69]

Nature567, 491–495 (2019) 26

Kandala, A., Temme, K., C´ orcoles, A.D., Mezzacapo, A., Chow, J.M., Gam- betta, J.M.: Error mitigation extends the computational reach of a noisy quantum processor. Nature567, 491–495 (2019) 26

2019