Adaptive Shot Allocation for Recursive QAOA via Reinforcement Learning
Pith reviewed 2026-06-29 17:34 UTC · model grok-4.3
The pith
A reinforcement learning policy for shot allocation in recursive QAOA reduces total measurements by 36 percent relative to uniform allocation on weighted Max-Cut instances.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A tabular Double Q-learning agent learns a residual policy that adjusts a hand-crafted heuristic baseline for the number of shots used at each recursive elimination step of depth-1 RQAOA. When evaluated under a fixed per-step budget cap that equalizes total allocation across strategies, the policy reduces aggregate shot count by 36 percent on diverse weighted Max-Cut instances, outperforming both uniform allocation and the static heuristic while also lowering the effective shots-per-success ratio; the improvement persists on problem sizes outside the training distribution.
What carries the argument
The tabular Double Q-learning agent that outputs per-step shot adjustments conditioned on local graph indicators and recursion depth, subject to a Lagrangian budget constraint.
Load-bearing premise
The benefit of adaptive shot allocation can be isolated because the variable-elimination rule remains unchanged and every strategy receives the same per-step budget cap.
What would settle it
Applying the trained policy to a fresh collection of larger weighted graphs and observing that total shots or success rates do not improve over the uniform baseline.
read the original abstract
Recursive QAOA (RQAOA) solves combinatorial optimization problems by using shallow quantum circuits to estimate pairwise correlations and recursively eliminate variables until a classical solver can handle the residual instance. Each elimination step requires measurement shots, and the total shot cost grows with the number of recursive stages. On near-term quantum devices, increasing shot counts can translate directly into greater exposure to hardware-level noise sources such as readout errors and decoherence, making shot-efficient execution not merely a cost-reduction measure but a factor with direct implications for solution reliability. While shot reduction has been studied broadly across NISQ algorithms, step-wise measurement control inside the recursive loop of RQAOA has received little attention. We formulate this step-wise allocation as a sequential decision problem and propose two strategies for depth-1 RQAOA on weighted Max-Cut instances. A hand-crafted heuristic assigns shots based on local indicators of step difficulty, and a tabular Double Q-learning agent learns a residual policy that adjusts this baseline under a Lagrangian-constrained objective. Both methods are evaluated under a fixed-cap fairness protocol that equalizes the per-step budget across all strategies, and the elimination rule itself is kept unchanged so that the contribution of adaptive measurement control can be isolated. On a diverse set of weighted graph instances spanning a range of sizes and structures, the heuristic reduces total shots by approximately 23% relative to uniform allocation, and the RL policy achieves a 36% reduction with a lower effective shots per success ratio than both baselines. The improvement persists on problem sizes not seen during training, suggesting that reinforcement learning can discover efficient, instance-adaptive measurement strategies in recursive quantum optimization.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript formulates step-wise shot allocation in depth-1 Recursive QAOA as a sequential decision problem and introduces a hand-crafted heuristic plus a tabular Double Q-learning policy that adjusts shot counts under a Lagrangian constraint. Both are tested against uniform allocation on weighted Max-Cut instances under a fixed-cap fairness protocol that equalizes per-step budgets while leaving the elimination rule unchanged; the abstract reports that the heuristic yields an approximately 23% total-shot reduction and the RL policy a 36% reduction together with a lower shots-per-success ratio, with the gains persisting on problem sizes unseen during training.
Significance. If the reported reductions are statistically robust and the attribution to adaptive allocation can be cleanly isolated, the work would demonstrate a practical route to lowering shot overhead in recursive quantum optimization algorithms, directly relevant to noise exposure on NISQ hardware. The persistence of gains on out-of-distribution sizes and the explicit use of an RL residual policy are positive features that, if substantiated, would strengthen the case for learned measurement strategies in recursive quantum routines.
major comments (2)
- [Abstract] Abstract: the central performance claims (23% and 36% reductions, improved shots-per-success ratio) are stated without any accompanying information on the number of graph instances, statistical significance tests, error bars, variance across runs, or the precise operational definition of 'success'. This absence prevents verification that the empirical data support the headline numbers.
- [Abstract / methods (fixed-cap protocol)] Fixed-cap fairness protocol and isolation argument (stated in the abstract and presumably elaborated in the methods): the manuscript asserts that keeping the elimination rule fixed and equalizing per-step budgets isolates the effect of adaptive shot counts. However, allocating fewer shots necessarily raises the sampling variance of the pairwise correlation estimates that are the direct inputs to the elimination rule; noisier estimates can therefore produce different elimination sequences even under an identical rule, so differences in total shots and success ratio may partly reflect altered recursion trajectories rather than measurement efficiency alone.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive comments on our manuscript. We address each major comment below and indicate the corresponding revisions.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central performance claims (23% and 36% reductions, improved shots-per-success ratio) are stated without any accompanying information on the number of graph instances, statistical significance tests, error bars, variance across runs, or the precise operational definition of 'success'. This absence prevents verification that the empirical data support the headline numbers.
Authors: We agree that the abstract would benefit from additional context on the supporting empirical evidence. In the revised manuscript we will expand the abstract to state the number of weighted Max-Cut instances evaluated, the number of independent runs per instance, the use of error bars, and a concise definition of success (reaching the target cut value within the recursion). The full statistical analysis, including variance measures and any significance tests, remains in the Results section. revision: yes
-
Referee: [Abstract / methods (fixed-cap protocol)] Fixed-cap fairness protocol and isolation argument (stated in the abstract and presumably elaborated in the methods): the manuscript asserts that keeping the elimination rule fixed and equalizing per-step budgets isolates the effect of adaptive shot counts. However, allocating fewer shots necessarily raises the sampling variance of the pairwise correlation estimates that are the direct inputs to the elimination rule; noisier estimates can therefore produce different elimination sequences even under an identical rule, so differences in total shots and success ratio may partly reflect altered recursion trajectories rather than measurement efficiency alone.
Authors: We thank the referee for this observation. It is correct that lower per-step shot counts increase the variance of the correlation estimates that feed the elimination rule, and therefore can alter the recursion trajectory even when the rule itself is unchanged. Our fixed-cap protocol equalizes the per-step budget across strategies while preserving the elimination rule, but does not constrain the resulting path. We will revise the manuscript to explicitly acknowledge this limitation of the isolation argument and to clarify that the reported reductions reflect the net effect of adaptive allocation on both measurement cost and the induced elimination sequences. Should the referee consider it necessary, we can add a controlled experiment that fixes the elimination order to isolate the pure measurement-efficiency contribution. revision: partial
Circularity Check
No circularity: empirical performance metrics from RL training and testing
full rationale
The paper's central claims are measured shot reductions (23% heuristic, 36% RL) on held-out graph instances under a fixed experimental protocol. These are direct outputs of running the described allocation policies against uniform baselines; they do not reduce by any equation in the manuscript to quantities defined in terms of themselves or to parameters fitted on the same test data. The fixed-cap fairness protocol and unchanged elimination rule are explicit experimental controls, not a derivation step. No self-citation chain, ansatz smuggling, or self-definitional loop appears in the provided text. The result is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Standard quantum mechanics and NISQ noise models apply to the device behavior during measurement.
Reference graph
Works this paper leans on
-
[1]
Quantum Sci
Moll, N., Barkoutsos, P., Bishop, L.S., Chow, J.M., Cross, A., Egger, D.J., Filipp, S., Fuhrer, A., Gambetta, J.M., Ganzhorn, M., Mezzacapo, A., Salis, G., Smolin, J., Tavernelli, I., Temme, K.: Quantum optimization using variational algorithms on near-term quantum devices. Quantum Sci. Technol.3, 030503 (2018)
2018
-
[2]
Cerezo, M., Arrasmith, A., Babbush, R., Benjamin, S.C., Endo, S., Fujii, K., McClean, J.R., Mitarai, K., Yuan, X., Cincio, L., Coles, P.J.: Variational quantum algorithms. Nat. Rev. Phys.3, 625–644 (2021)
2021
-
[3]
Lee, E., Lee, S., Kim, S.: Quantum walk based Monte Carlo simulation for photon interaction cross sections. Phys. Rev. D111, 116001 (2025) https://doi.org/10. 1103/PhysRevD.111.116001 21
2025
-
[4]
Bauer, B., Bravyi, S., Motta, M., Chan, G.K.-L.: Quantum algorithms for quan- tum chemistry and quantum materials science. Chem. Rev.120, 12685–12717 (2020)
2020
-
[5]
Nature549, 195–202 (2017)
Biamonte, J., Wittek, P., Pancotti, N., Rebentrost, P., Wiebe, N., Lloyd, S.: Quantum machine learning. Nature549, 195–202 (2017)
2017
-
[6]
Cambridge University Press, Cambridge (2010)
Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information. Cambridge University Press, Cambridge (2010)
2010
-
[7]
Quantum2, 79 (2018)
Preskill, J.: Quantum computing in the NISQ era and beyond. Quantum2, 79 (2018)
2018
-
[8]
Peruzzo, A., McClean, J., Shadbolt, P., Yung, M.-H., Zhou, X.-Q., Love, P.J., Aspuru-Guzik, A., O’Brien, J.L.: A variational eigenvalue solver on a photonic quantum processor. Nat. Commun.5, 4213 (2014)
2014
-
[9]
McClean, J.R., Romero, J., Babbush, R., Aspuru-Guzik, A.: The theory of variational hybrid quantum-classical algorithms. New J. Phys.18, 023023 (2016)
2016
-
[10]
Nature549, 242–246 (2017)
Kandala, A., Mezzacapo, A., Temme, K., Takita, M., Brink, M., Chow, J.M., Gambetta, J.M.: Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets. Nature549, 242–246 (2017)
2017
-
[11]
Bharti, K., Cervera-Lierta, A., Kyaw, T.H., Haug, T., Alperin-Lea, S., Anand, A., Degroote, M., Heimonen, H., Kottmann, J.S., Menke, T., Mok, W.-K., Sim, S., Kwek, L.-C., Aspuru-Guzik, A.: Noisy intermediate-scale quantum algorithms. Rev. Mod. Phys.94, 015004 (2022)
2022
-
[12]
Tilly, J., Chen, H., Cao, S., Picozzi, D., Setia, K., Li, Y., Grant, E., Wossnig, L., Rungger, I., Booth, G.H., Tennyson, J.: The variational quantum eigensolver: a review of methods and best practices. Phys. Rep.986, 1–128 (2022)
2022
-
[13]
Farhi, E., Goldstone, J., Gutmann, S.: A quantum approximate optimization algorithm (2014)
2014
-
[14]
Algorithms12, 34 (2019)
Hadfield, S., Wang, Z., O’Gorman, B., Rieffel, E.G., Venturelli, D., Biswas, R.: From the quantum approximate optimization algorithm to a quantum alternating operator ansatz. Algorithms12, 34 (2019)
2019
-
[15]
Wang, Z., Hadfield, S., Jiang, Z., Rieffel, E.G.: Quantum approximate opti- mization algorithm for MaxCut: a fermionic view. Phys. Rev. A97, 022304 (2018)
2018
-
[16]
Zhou, L., Wang, S.-T., Choi, S., Pichler, H., Lukin, M.D.: Quantum approximate optimization algorithm: performance, mechanism, and implementation on near- term devices. Phys. Rev. X10, 021067 (2020) 22
2020
-
[17]
Crooks, G.E.: Performance of the quantum approximate optimization algorithm on the maximum cut problem (2018)
2018
-
[18]
Quantum Inf
Willsch, D., Willsch, M., Jin, F., De Raedt, H., Michielsen, K.: Benchmarking the quantum approximate optimization algorithm. Quantum Inf. Process.19, 197 (2020)
2020
-
[19]
Farhi, E., Gamarnik, D., Gutmann, S.: The quantum approximate optimization algorithm needs to see the whole graph: a typical case (2020)
2020
-
[20]
Farhi, E., Gamarnik, D., Gutmann, S.: The quantum approximate optimization algorithm needs to see the whole graph: worst case examples (2020)
2020
-
[21]
Quantum5, 437 (2021)
Marwaha, K.: Local classical MAX-CUT algorithm outperformsp= 2 QAOA on high-girth regular graphs. Quantum5, 437 (2021)
2021
-
[22]
Hastings, M.B.: Classical and quantum bounded depth approximation algorithms (2019)
2019
-
[23]
Bravyi, S., Kliesch, A., Koenig, R., Tang, E.: Obstacles to state preparation and variational optimization from symmetry protection. Phys. Rev. Lett.125, 260505 (2020)
2020
-
[24]
McClean, J.R., Boixo, S., Smelyanskiy, V.N., Babbush, R., Neven, H.: Barren plateaus in quantum neural network training landscapes. Nat. Commun.9, 4812 (2018)
2018
-
[25]
Cerezo, M., Sone, A., Volkoff, T., Cincio, L., Coles, P.J.: Cost function dependent barren plateaus in shallow parametrized quantum circuits. Nat. Commun.12, 1791 (2021)
2021
-
[26]
Bittel, L., Kliesch, M.: Training variational quantum algorithms is NP-hard. Phys. Rev. Lett.127, 120502 (2021)
2021
-
[27]
Quantum6, 678 (2022)
Bravyi, S., Kliesch, A., Koenig, R., Tang, E.: Hybrid quantum-classical algorithms for approximate graph coloring. Quantum6, 678 (2022)
2022
-
[28]
Bae, E., Lee, S.: Recursive QAOA outperforms the original QAOA for the MAX- CUT problem on complete graphs (2023)
2023
-
[29]
EPJ Quantum Technol.11, 1 (2024)
Patel, Y.J., Jerbi, S., B¨ ack, T., Dunjko, V.: Reinforcement learning assisted recursive QAOA. EPJ Quantum Technol.11, 1 (2024)
2024
-
[30]
PRX Quantum5, 020327 (2024) 23
Finˇ zgar, J.R., Kerschbaumer, A., Schuetz, M.J.A., Mendl, C.B., Katzgraber, H.G.: Quantum-informed recursive optimization algorithms. PRX Quantum5, 020327 (2024) 23
2024
-
[31]
Brady, L.T., Hadfield, S.: Iterative quantum algorithms for maximum indepen- dent set. Phys. Rev. A110, 052435 (2024)
2024
-
[32]
Temme, K., Bravyi, S., Gambetta, J.M.: Error mitigation for short-depth quantum circuits. Phys. Rev. Lett.119, 180509 (2017)
2017
-
[33]
Endo, S., Cai, Z., Benjamin, S.C., Yuan, X.: Hybrid quantum-classical algorithms and quantum error mitigation. J. Phys. Soc. Jpn.90, 032001 (2021)
2021
-
[34]
Quantum4, 263 (2020)
K¨ ubler, J.M., Arrasmith, A., Cincio, L., Coles, P.J.: An adaptive optimizer for measurement-frugal variational algorithms. Quantum4, 263 (2020)
2020
-
[35]
Arrasmith, A., Cincio, L., Somma, R.D., Coles, P.J.: Operator sampling for shot- frugal optimization in variational algorithms (2020)
2020
-
[36]
Gu, A., Lowe, A., Dub, P.A., Coles, P.J., Arrasmith, A.: Adaptive shot allocation for fast convergence in variational quantum algorithms (2021)
2021
-
[37]
Ito, K., Mizukami, W., Fujii, K.: Latency-aware adaptive shot allocation for variational quantum algorithms (2023)
2023
-
[38]
Quantum4, 314 (2020)
Sweke, R., Wilde, F., Meyer, J., Schuld, M., Faehrmann, P.K., Meynard- Piganeau, B., Eisert, J.: Stochastic gradient descent for hybrid quantum-classical optimization. Quantum4, 314 (2020)
2020
-
[39]
Quantum 4, 269 (2020)
Stokes, J., Izaac, J., Killoran, N., Carleo, G.: Quantum natural gradient. Quantum 4, 269 (2020)
2020
-
[40]
IEEE Trans
Spall, J.C.: Implementation of the simultaneous perturbation algorithm for stochastic optimization. IEEE Trans. Aerosp. Electron. Syst.34, 817–823 (1998)
1998
-
[41]
Zhao, A., Rubin, N.C., Miyake, A.: Measurement reduction in variational quantum algorithms. Phys. Rev. Lett.127, 110504 (2021)
2021
-
[42]
npj Quantum Inf.7, 23 (2021)
Huggins, W.J., McClean, J.R., Rubin, N., Jiang, Z., Wiebe, N., Whaley, K.B., Babbush, R.: Efficient and noise resilient measurements for quantum chemistry on near-term quantum computers. npj Quantum Inf.7, 23 (2021)
2021
-
[43]
npj Quantum Inf.6, 34 (2020)
Gokhale, P., Angiuli, O., Ding, Y., Gui, K., Tomesh, T., Suchara, M., Martonosi, M., Chong, F.T.: Minimizing state preparations in variational quantum eigen- solver by partitioning into commuting families. npj Quantum Inf.6, 34 (2020)
2020
-
[44]
Sci.10, 3746–3755 (2019)
Izmaylov, A.F., Yen, T.-C., Ryabinkin, I.G.: Revising the measurement process in the variational quantum eigensolver: is it possible to reduce the number of separately measured operators? Chem. Sci.10, 3746–3755 (2019)
2019
-
[45]
Verteletskyi, V., Yen, T.-C., Izmaylov, A.F.: Measurement optimization in the variational quantum eigensolver using a minimum clique cover. J. Chem. Phys. 24 152, 124114 (2020)
2020
-
[46]
Quantum5, 385 (2021)
Crawford, O., Straaten, B., Wang, D., Parks, T., Campbell, E., Brierley, S.: Effi- cient quantum measurement of Pauli operators in the presence of finite sampling error. Quantum5, 385 (2021)
2021
-
[47]
Liang, S., Zhu, L., Liu, X., Yang, C., Li, X.: Artificial-intelligence-driven shot reduction in quantum measurement. Chem. Phys. Rev.5, 041403 (2024) https: //doi.org/10.1063/5.0219663
-
[48]
Wauters, M.M., El-Araby, E., Amin, M.H.: Reinforcement-learning-assisted quan- tum optimization. Phys. Rev. Res.2, 033446 (2020)
2020
-
[49]
In: Proceedings of Machine Learning Research, vol
Yao, J., Bukov, M., Lin, L.: Policy gradient based quantum approximate opti- mization algorithm. In: Proceedings of Machine Learning Research, vol. 107, pp. 605–634 (2020)
2020
-
[50]
Khairy, S., Shaydulin, R., Cincio, L., Alexeev, Y., Balaprakash, P.: Reinforcement- learning-based variational quantum circuits optimization for combinatorial prob- lems (2020)
2020
-
[51]
In: Advances in Neural Information Processing Systems 34, pp
Ostaszewski, M., Trenkwalder, L.M., Masarczyk, W., Scerri, E., Dunjko, V.: Rein- forcement learning for optimization of variational quantum circuit architectures. In: Advances in Neural Information Processing Systems 34, pp. 18182–18194. Curran Associates, Red Hook, NY (2021)
2021
-
[52]
EPJ Quantum Technol
Skolik, A., Mangini, S., B¨ ack, T., Macchiavello, C., Dunjko, V.: Robustness of quantum reinforcement learning under hardware errors. EPJ Quantum Technol. 10, 12 (2023)
2023
-
[53]
Altmann, P., Stein, J., K¨ olle, M., B˘ arligea, A., Gabor, T., Phan, T., Feld, S., Linnhoff-Popien, C.: Challenges for reinforcement learning in quantum circuit design (2024)
2024
-
[54]
Lucas, A.: Ising formulations of many NP problems. Front. Phys.2, 5 (2014)
2014
-
[55]
Goemans, M.X., Williamson, D.P.: Improved approximation algorithms for max- imum cut and satisfiability problems using semidefinite programming. J. ACM 42, 1115–1145 (1995)
1995
-
[56]
Quantum Sci- ence and Technology7(4), 045036 (2022) https://doi.org/10.1088/2058-9565/ ac9f2b
Ozaeta, A., Dam, W., McMahon, P.L.: Expectation values from the single-layer quantum approximate optimization algorithm on Ising problems. Quantum Sci- ence and Technology7(4), 045036 (2022) https://doi.org/10.1088/2058-9565/ ac9f2b
-
[57]
Kuo, E.-J., Fang, Y.-L.L., Chen, S.Y.-C.: Quantum architecture search via deep reinforcement learning (2021) 25
2021
-
[58]
F¨ osel, T., Niu, M.Y., Marquardt, F., Li, L.: Quantum circuit optimization with deep reinforcement learning (2021)
2021
-
[59]
F¨ osel, T., Tighineanu, P., Weiss, T., Marquardt, F.: Reinforcement learning with neural networks for quantum feedback. Phys. Rev. X8, 031084 (2018)
2018
-
[60]
npj Quantum Inf.5, 33 (2019)
Niu, M.Y., Boixo, S., Smelyanskiy, V.N., Neven, H.: Universal quantum control through deep reinforcement learning. npj Quantum Inf.5, 33 (2019)
2019
-
[61]
Bukov, M., Day, A.G.R., Sels, D., Weinberg, P., Polkovnikov, A., Mehta, P.: Reinforcement learning in different phases of quantum control. Phys. Rev. X8, 031086 (2018)
2018
-
[62]
MIT Press, Cambridge, MA (2018)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. MIT Press, Cambridge, MA (2018)
2018
-
[63]
Watkins, C.J.C.H., Dayan, P.: Q-learning. Mach. Learn.8, 279–292 (1992)
1992
-
[64]
In: Lafferty, J.D., Williams, C.K.I., Shawe-Taylor, J., Zemel, R.S., Culotta, A
Hasselt, H.: Double Q-learning. In: Lafferty, J.D., Williams, C.K.I., Shawe-Taylor, J., Zemel, R.S., Culotta, A. (eds.) Advances in Neural Information Processing Systems 23, pp. 2613–2621. Curran Associates, Red Hook, NY (2010)
2010
-
[65]
Chapman & Hall/CRC, Boca Raton, FL (1999)
Altman, E.: Constrained Markov Decision Processes. Chapman & Hall/CRC, Boca Raton, FL (1999)
1999
-
[66]
In: Proceedings of the 34th International Conference on Machine Learning
Achiam, J., Held, D., Tamar, A., Abbeel, P.: Constrained policy optimization. In: Proceedings of the 34th International Conference on Machine Learning. Pro- ceedings of Machine Learning Research, pp. 22–31. PMLR, Sydney, Australia (2017)
2017
-
[67]
In: International Conference on Learning Representations (2019)
Tessler, C., Mankowitz, D.J., Mannor, S.: Reward constrained policy optimiza- tion. In: International Conference on Learning Representations (2019)
2019
-
[68]
Aleksandrowicz, G., et al.: Qiskit: an open-source framework for quantum computing. Zenodo (2019). https://doi.org/10.5281/zenodo.2562111
-
[69]
Nature567, 491–495 (2019) 26
Kandala, A., Temme, K., C´ orcoles, A.D., Mezzacapo, A., Chow, J.M., Gam- betta, J.M.: Error mitigation extends the computational reach of a noisy quantum processor. Nature567, 491–495 (2019) 26
2019
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.