pith. sign in

arxiv: 2509.05890 · v3 · submitted 2025-09-07 · 🪐 quant-ph · cs.AI· cs.LG· math-ph· math.MP

Quantum spatial best-arm identification via quantum walks

Pith reviewed 2026-05-18 18:57 UTC · model grok-4.3

classification 🪐 quant-ph cs.AIcs.LGmath-phmath.MP
keywords quantum walksbest-arm identificationgraph banditsSzegedy quantum walkamplitude amplificationspatial constraintsquantum reinforcement learning
0
0 comments X

The pith

Quantum walks on graphs identify the best arm in spatially constrained bandit problems by reaching maximal success probabilities at derived time steps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper formulates best-arm identification when arms are accessible only according to graph edges. It introduces QSBAI, a framework that encodes superpositions over these graph-constrained actions using quantum walks. This extends amplitude amplification by generalizing earlier quantum best-arm methods through Szegedy's walk construction. The analysis supplies exact expressions for the highest probability of correctly naming the best arm and the time step at which that probability occurs, at least when the underlying graph is complete or bipartite. A reader would care because the work shows how quantum search techniques can be carried over to decision problems whose action sets have spatial or network structure.

Core claim

By adapting Szegedy's quantum-walk operator to the graph bandit setting, superpositions over allowed actions can be prepared so that amplitude amplification yields an explicit maximal success probability for identifying the optimal arm together with the precise time step at which this maximum is attained for complete and bipartite graphs.

What carries the argument

Szegedy's quantum-walk operator, which maps the graph's connectivity into a unitary evolution that supports superposition over neighboring arms and enables the amplitude-amplification step for best-arm selection.

If this is right

  • The same walk-based construction applies directly to any graph, not only complete or bipartite ones.
  • The derived success probability and optimal time step become explicit functions of graph size and connectivity for the studied families.
  • Quantum best-arm identification is thereby placed on the same footing as quantum search on graphs.
  • The approach supplies a concrete route to incorporate spatial constraints into existing quantum reinforcement-learning algorithms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the walk operator remains efficient on sparse graphs, quadratic speedups could appear in networked decision tasks such as routing or sensor selection.
  • The framework may be combined with quantum oracles for reward estimation to handle stochastic graphs.
  • Small-graph implementations could be tested on current quantum hardware to check whether the predicted time steps remain short enough to be useful.

Load-bearing premise

The Szegedy walk operator can be realized on the graph-constrained action space without extra costs that would remove any quadratic advantage over classical methods.

What would settle it

A numerical simulation or small-scale quantum experiment on a complete graph that fails to reach the paper's predicted maximal success probability at the stated time step.

Figures

Figures reproduced from arXiv: 2509.05890 by Andr\'e R\"ohm, Atsushi Uchida, Etsuo Segawa, Ryoichi Horisaki, Takatomo Mihana, Tomoki Yamagami.

Figure 1
Figure 1. Figure 1: FIG. 1. Conceptual illustration of a multi-armed bandit [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: FIG. 2. Conceptual illustration of the difference between ran [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: FIG. 3. Example of arm selection on a graph [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: FIG. 4. Possible state transitions in arm selection and envi [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: FIG. 5. Example of constructing the executive graph [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: FIG. 6. Example of the recommendation probability [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: FIG. 7. Variation of the recommendation probability [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: FIG. 8. Variation of the recommendation probability [PITH_FULL_IMAGE:figures/full_fig_p010_8.png] view at source ↗
read the original abstract

Quantum reinforcement learning has emerged as a framework combining quantum computation with sequential decision-making, and applications to the multi-armed bandit (MAB) problem have been reported. The graph bandit problem extends the MAB setting by introducing spatial constraints, where the accessibility of arms is restricted by graph connectivity, yet quantum approaches to this setting remain limited. In this paper, we formulate best-arm identification in graph bandits and propose a quantum algorithmic framework, termed Quantum Spatial Best-Arm Identification (QSBAI), which is applicable to general graph structures. This framework uses quantum walks to encode superpositions over graph-constrained actions, thereby extending amplitude amplification and generalizing the quantum BAI algorithm via Szegedy's walk framework. We focus our theoretical analysis on complete and bipartite graphs, deriving the maximal success probability of identifying the best arm and the time step at which it is achieved. Our results clarify how quantum-walk-based search can be adapted to structurally constrained decision problems and provide a foundation for quantum best-arm identification in graph-structured environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript formulates best-arm identification in graph bandits and proposes the Quantum Spatial Best-Arm Identification (QSBAI) framework. This uses Szegedy quantum walks to encode superpositions over graph-constrained actions, extending amplitude amplification to generalize prior quantum BAI results. Explicit derivations of maximal success probability and the achieving time step are provided for complete and bipartite graphs, with the framework claimed applicable to general graphs.

Significance. If the derivations are correct and the no-overhead premise holds, the work would establish a concrete quantum-walk approach to spatially constrained best-arm identification, offering potential quadratic speedups over classical methods in graph bandits. The explicit results for complete and bipartite cases provide a verifiable foundation and clarify adaptation of quantum search to decision problems with connectivity constraints. This strengthens the link between quantum walks and quantum reinforcement learning.

major comments (2)
  1. [Abstract and theoretical analysis sections] The central claim that QSBAI is applicable to general graph structures relies on Szegedy's walk operator encoding graph-constrained actions without overhead that erases the quadratic speedup. However, explicit derivations and analysis are restricted to complete and bipartite graphs (where the coin and shift operators are uniform and trivial). For arbitrary graphs the marking oracle and walk operator construction may require additional queries or depth scaling with degree or diameter; this load-bearing premise for the generalization is not verified.
  2. [Theoretical analysis (complete and bipartite cases)] The derivations of maximal success probability and optimal time step are presented for the two special graphs, but the manuscript does not supply error bounds, proof sketches, or explicit equations showing how the Szegedy operator is adapted when the best arm is unknown. Without these, it is impossible to confirm that post-hoc assumptions do not affect the claimed probabilities.
minor comments (2)
  1. [Introduction and preliminaries] Notation for the graph bandit setting and the definition of the marking oracle could be introduced with a small illustrative example or diagram to improve readability for readers unfamiliar with quantum walks on graphs.
  2. [Results section] A brief comparison table of classical vs. quantum success probabilities and time steps for the complete and bipartite cases would help highlight the claimed advantage.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive feedback on our manuscript. We address each major comment in turn below, indicating where we agree revisions are warranted and providing our reasoning on points of clarification.

read point-by-point responses
  1. Referee: [Abstract and theoretical analysis sections] The central claim that QSBAI is applicable to general graph structures relies on Szegedy's walk operator encoding graph-constrained actions without overhead that erases the quadratic speedup. However, explicit derivations and analysis are restricted to complete and bipartite graphs (where the coin and shift operators are uniform and trivial). For arbitrary graphs the marking oracle and walk operator construction may require additional queries or depth scaling with degree or diameter; this load-bearing premise for the generalization is not verified.

    Authors: We appreciate the referee highlighting the distinction between the general framework and the explicit cases analyzed. The QSBAI construction defines the Szegedy walk operator directly from the graph's adjacency structure via the standard coin and shift operators, which encodes the spatial constraints without introducing query overhead beyond the inherent cost of implementing the walk on that graph. Any scaling with degree or diameter is a property of the underlying graph representation and does not erase the quadratic speedup relative to classical sampling on the same graph. While closed-form derivations of maximal success probability and optimal time steps are provided only for complete and bipartite graphs (to obtain tractable expressions), the framework itself applies to general graphs by the same operator construction. We will revise the manuscript to include an expanded discussion clarifying this point and noting the graph-dependent aspects of implementation. revision: partial

  2. Referee: [Theoretical analysis (complete and bipartite cases)] The derivations of maximal success probability and optimal time step are presented for the two special graphs, but the manuscript does not supply error bounds, proof sketches, or explicit equations showing how the Szegedy operator is adapted when the best arm is unknown. Without these, it is impossible to confirm that post-hoc assumptions do not affect the claimed probabilities.

    Authors: We agree that additional technical details will strengthen the presentation. The derivations apply the Szegedy walk to the graph with a marking oracle that applies a phase flip to the state corresponding to the best arm, identified via reward queries. The operator is adapted by composing the standard walk with this marking step in the usual amplitude amplification manner. We will add explicit equations for this adaptation, along with proof sketches and error bounds on the success probability, in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper formulates QSBAI by adapting Szegedy's quantum-walk operator to graph-constrained actions and derives maximal success probability plus optimal time step explicitly for complete and bipartite graphs. These derivations rest on standard amplitude amplification and walk operators that are well-defined for the chosen topologies without reduction to fitted parameters or self-referential definitions. No equations in the provided text equate a claimed prediction to its own input by construction, and the generalization to arbitrary graphs is stated as an extension rather than a load-bearing self-citation chain. The framework is therefore self-contained against external quantum-walk literature.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that Szegedy's walk can be lifted to the graph-constrained action space while preserving quadratic speedup; no free parameters, new particles, or ad-hoc axioms are named in the abstract.

axioms (1)
  • domain assumption Szegedy's quantum-walk operator extends directly to graph-constrained action spaces without destroying the amplitude-amplification advantage.
    Invoked when the paper states that QSBAI 'extends amplitude amplification and generalizes the quantum BAI algorithm via Szegedy's walk framework.'

pith-pipeline@v0.9.0 · 5735 in / 1266 out tokens · 31824 ms · 2026-05-18T18:57:00.071970+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages

  1. [1]

    standard

    also presented a quantum-amplitude-amplification- based BAI algorithm with a different approach, showing that the best arm can be found with a fixed confidence. Moreover, Cho et al. [34] proposed a decision-making al- gorithm in the framework of adversarial MAB problems [35], which is also exploration-based with quantum am- plitude amplification. This pap...

  2. [2]

    R. S. Sutton and A. G. Barto,Reinforcement Learning: An Introduction(MIT press, 2018)

  3. [3]

    Hwangbo, J

    J. Hwangbo, J. Lee, A. Dosovitskiy, D. Bellicoso, V. Tsounis, V. Koltun, and M. Hutter, Learning agile and dynamic motor skills for legged robots, Science Robotics 4, eaau5872 (2019)

  4. [4]

    Kim and H

    T. Kim and H. Y. Kim, Optimizing the pairs-trading strategy using deep reinforcement learning with trad- ing and stop-loss boundaries, Complexity2019, 3582516 (2019)

  5. [5]

    J. Zhao, T. Qu, and F. Xu, A deep reinforcement learn- ing approach for autonomous highway driving, IFAC- PapersOnLine53, 542 (2020)

  6. [6]

    Robbins, Some aspects of the sequential design of ex- periments, Bulletin of the American Mathematical Soci- ety58, 527 (1952)

    H. Robbins, Some aspects of the sequential design of ex- periments, Bulletin of the American Mathematical Soci- ety58, 527 (1952)

  7. [7]

    Slivkins, Introduction to multi-armed bandits, Foun- dations and Trends®in Machine Learning12, 1 (2019)

    A. Slivkins, Introduction to multi-armed bandits, Foun- dations and Trends®in Machine Learning12, 1 (2019)

  8. [8]

    Bubeck, R

    S. Bubeck, R. Munos, and G. Stoltz, Pure exploration in multi-armed bandits problems, inProceedings of the 20th International Conference on Algorithmic Learning Theory (ALT 2009)(Springer, 2009) pp. 23–37

  9. [9]

    Audibert, S

    J.-Y. Audibert, S. Bubeck, and R. Munos, Best arm identification in multi-armed bandits, inProceedings of the 23rd Conference on Learning Theory (COLT 2010) (2010) pp. 41–53

  10. [10]

    J. G. March, Exploration and exploitation in organiza- tional learning, Organization Science2, 71 (1991)

  11. [11]

    Johansson, Graph bandits: Multi-armed bandits with locality constraints (2022)

    K. Johansson, Graph bandits: Multi-armed bandits with locality constraints (2022)

  12. [12]

    Zhang, K

    T. Zhang, K. Johansson, and N. Li, Multi-armed bandit learning on a graph, inProceedings of 57th Annual IEEE Conference on Information Sciences and Systems (CISS 2023)(2023) pp. 1–6

  13. [13]

    F. Li, D. Yu, H. Yang, J. Yu, H. Karl, and X. Cheng, Multi-armed-bandit-based spectrum scheduling algo- rithms in wireless networks: A survey, IEEE Wireless Communications27, 24 (2020)

  14. [14]

    Takeuchi, M

    S. Takeuchi, M. Hasegawa, K. Kanno, A. Uchida, N. Chauvet, and M. Naruse, Dynamic channel selection in wireless communications via a multi-armed bandit al- gorithm using laser chaos time series, Scientific reports 10, 1574 (2020)

  15. [15]

    Zhang, Y

    J. Zhang, Y. Huang, Y. Zhou, and X. You, Beam align- ment and tracking for millimeter wave communications via bandit learning, IEEE Transactions on Communica- tions68, 5519 (2020)

  16. [16]

    Auer, Using confidence bounds for exploitation- exploration trade-offs, Journal of machine learning re- search3, 397 (2002)

    P. Auer, Using confidence bounds for exploitation- exploration trade-offs, Journal of machine learning re- search3, 397 (2002)

  17. [17]

    H. J. Briegel and G. De las Cuevas, Projective simulation for artificial intelligence, Scientific Reports2, 400 (2012)

  18. [18]

    Wittek,Quantum Machine Learning: What Quan- tum Computing Means to Data Mining(Academic Press, 2014)

    P. Wittek,Quantum Machine Learning: What Quan- tum Computing Means to Data Mining(Academic Press, 2014)

  19. [19]

    Aaronson, Read the fine print, Nature Physics11, 291 (2015)

    S. Aaronson, Read the fine print, Nature Physics11, 291 (2015)

  20. [20]

    Dunjko, J

    V. Dunjko, J. M. Taylor, and H. J. Briegel, Quantum- enhanced machine learning, Physical Review Letters 117, 130501 (2016). 15

  21. [21]

    Lamata, Basic protocols in quantum reinforcement learning with superconducting circuits, Scientific Reports 7, 1609 (2017)

    L. Lamata, Basic protocols in quantum reinforcement learning with superconducting circuits, Scientific Reports 7, 1609 (2017)

  22. [22]

    Biamonte, P

    J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Quantum machine learning, Na- ture549, 195 (2017)

  23. [23]

    D. Dong, C. Chen, H. Li, and T.-J. Tarn, Quantum rein- forcement learning, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)38, 1207 (2008)

  24. [24]

    G. D. Paparo, V. Dunjko, A. Makmal, M. A. Martin- Delgado, and H. J. Briegel, Quantum speedup for active learning agents, Physical Review X4, 031002 (2014)

  25. [25]

    J.-A. Li, D. Dong, Z. Wei, Y. Liu, Y. Pan, F. Nori, and X. Zhang, Quantum reinforcement learning during hu- man decision-making, Nature Human Behaviour4, 294 (2020)

  26. [26]

    M. A. Nielsen and I. L. Chuang,Quantum Computation and Quantum Information(Cambridge university press, 2010)

  27. [27]

    C. H. Bennett and D. P. DiVincenzo, Quantum informa- tion and computation, Nature404, 247 (2000)

  28. [28]

    R. P. Feynman, Simulating physics with computers, International Journal of Theoretical Physics21, 467 (1982)

  29. [29]

    Deutsch, Quantum theory, the church–turing principle and the universal quantum computer, Proceedings of the Royal Society of London

    D. Deutsch, Quantum theory, the church–turing principle and the universal quantum computer, Proceedings of the Royal Society of London. A. Mathematical and Physical Sciences400, 97 (1985)

  30. [30]

    P. W. Shor, Algorithms for quantum computation: dis- crete logarithms and factoring, inProceedings of the 35th Annual IEEE Symposium on Foundations of Computer Science (FOCS 1994)(1994) pp. 124–134

  31. [31]

    L. K. Grover, A fast quantum mechanical algorithm for database search, inProceedings of the 28th Annual ACM Symposium on Theory of Computing (STOC 1996) (1996) pp. 212–219

  32. [32]

    Brassard, P

    G. Brassard, P. Høyer, M. Mosca, and A. Tapp, Quantum amplitude amplification and estimation, Contemporary Mathematics305, 53 (2002)

  33. [33]

    Casal´ e, G

    B. Casal´ e, G. Di Molfetta, H. Kadri, and L. Ralaivola, Quantum bandits, Quantum Machine Intelligence2, 11 (2020)

  34. [34]

    D. Wang, X. You, T. Li, and A. M. Childs, Quantum ex- ploration algorithms for multi-armed bandits, inProceed- ings of the AAAI Conference on Artificial Intelligence, Vol. 35 (2021) pp. 10102–10110

  35. [35]

    B. Cho, Y. Xiao, P. Hui, and D. Dong, Quantum bandit with amplitude amplification exploration in an adversar- ial environment, IEEE Transactions on Knowledge and Data Engineering36, 311 (2023)

  36. [36]

    P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, The nonstochastic multiarmed bandit problem, SIAM Journal on Computing32, 48 (2002)

  37. [37]

    Yamagami,Quantum Walk and Its Application to Decision-Making, Ph.D

    T. Yamagami,Quantum Walk and Its Application to Decision-Making, Ph.D. thesis, The University of Tokyo (2023)

  38. [38]

    Konno, Quantum walks, inQuantum potential theory (Springer, 2008) pp

    N. Konno, Quantum walks, inQuantum potential theory (Springer, 2008) pp. 309–452

  39. [39]

    Portugal,Quantum Walks and Search Algorithms (Springer, 2013)

    R. Portugal,Quantum Walks and Search Algorithms (Springer, 2013)

  40. [40]

    Kempe, Quantum random walks: an introductory overview, Contemporary Physics44, 307 (2003)

    J. Kempe, Quantum random walks: an introductory overview, Contemporary Physics44, 307 (2003)

  41. [41]

    S. E. Venegas-Andraca, Quantum walks: a comprehen- sive review, Quantum Information Processing11, 1015 (2012)

  42. [42]

    Szegedy, Quantum speed-up of Markov chain based algorithms, inProceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2004)(2004) pp

    M. Szegedy, Quantum speed-up of Markov chain based algorithms, inProceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2004)(2004) pp. 32–41

  43. [43]

    Yamagami, E

    T. Yamagami, E. Segawa, T. Mihana, A. R¨ ohm, R. Ho- risaki, and M. Naruse, Bandit algorithm driven by a clas- sical random walk and a quantum walk, Entropy25, 843 (2023)

  44. [44]

    Higuchi, N

    Y. Higuchi, N. Konno, I. Sato, and E. Segawa, Periodic- ity of the discrete-time quantum walk on a finite graph, Interdisciplinary Information Sciences23, 75 (2017)

  45. [45]

    D. E. Knuth,The Art of Computer Programming, Vol. 4A (Addison-Wesley Reading, MA, 2011)

  46. [46]

    M. L. Rhodes and T. G. Wong, Quantum walk search on the complete bipartite graph, Physical Review A99, 032301 (2019)