Quantum spatial best-arm identification via quantum walks

Andr\'e R\"ohm; Atsushi Uchida; Etsuo Segawa; Ryoichi Horisaki; Takatomo Mihana; Tomoki Yamagami

arxiv: 2509.05890 · v3 · submitted 2025-09-07 · 🪐 quant-ph · cs.AI· cs.LG· math-ph· math.MP

Quantum spatial best-arm identification via quantum walks

Tomoki Yamagami , Etsuo Segawa , Takatomo Mihana , Andr\'e R\"ohm , Atsushi Uchida , Ryoichi Horisaki This is my paper

Pith reviewed 2026-05-18 18:57 UTC · model grok-4.3

classification 🪐 quant-ph cs.AIcs.LGmath-phmath.MP

keywords quantum walksbest-arm identificationgraph banditsSzegedy quantum walkamplitude amplificationspatial constraintsquantum reinforcement learning

0 comments

The pith

Quantum walks on graphs identify the best arm in spatially constrained bandit problems by reaching maximal success probabilities at derived time steps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper formulates best-arm identification when arms are accessible only according to graph edges. It introduces QSBAI, a framework that encodes superpositions over these graph-constrained actions using quantum walks. This extends amplitude amplification by generalizing earlier quantum best-arm methods through Szegedy's walk construction. The analysis supplies exact expressions for the highest probability of correctly naming the best arm and the time step at which that probability occurs, at least when the underlying graph is complete or bipartite. A reader would care because the work shows how quantum search techniques can be carried over to decision problems whose action sets have spatial or network structure.

Core claim

By adapting Szegedy's quantum-walk operator to the graph bandit setting, superpositions over allowed actions can be prepared so that amplitude amplification yields an explicit maximal success probability for identifying the optimal arm together with the precise time step at which this maximum is attained for complete and bipartite graphs.

What carries the argument

Szegedy's quantum-walk operator, which maps the graph's connectivity into a unitary evolution that supports superposition over neighboring arms and enables the amplitude-amplification step for best-arm selection.

If this is right

The same walk-based construction applies directly to any graph, not only complete or bipartite ones.
The derived success probability and optimal time step become explicit functions of graph size and connectivity for the studied families.
Quantum best-arm identification is thereby placed on the same footing as quantum search on graphs.
The approach supplies a concrete route to incorporate spatial constraints into existing quantum reinforcement-learning algorithms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the walk operator remains efficient on sparse graphs, quadratic speedups could appear in networked decision tasks such as routing or sensor selection.
The framework may be combined with quantum oracles for reward estimation to handle stochastic graphs.
Small-graph implementations could be tested on current quantum hardware to check whether the predicted time steps remain short enough to be useful.

Load-bearing premise

The Szegedy walk operator can be realized on the graph-constrained action space without extra costs that would remove any quadratic advantage over classical methods.

What would settle it

A numerical simulation or small-scale quantum experiment on a complete graph that fails to reach the paper's predicted maximal success probability at the stated time step.

Figures

Figures reproduced from arXiv: 2509.05890 by Andr\'e R\"ohm, Atsushi Uchida, Etsuo Segawa, Ryoichi Horisaki, Takatomo Mihana, Tomoki Yamagami.

**Figure 2.** Figure 2: FIG. 2. Conceptual illustration of the difference between ran [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: FIG. 3. Example of arm selection on a graph [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: FIG. 4. Possible state transitions in arm selection and envi [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗

**Figure 5.** Figure 5: FIG. 5. Example of constructing the executive graph [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

**Figure 6.** Figure 6: FIG. 6. Example of the recommendation probability [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗

**Figure 7.** Figure 7: FIG. 7. Variation of the recommendation probability [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗

**Figure 8.** Figure 8: FIG. 8. Variation of the recommendation probability [PITH_FULL_IMAGE:figures/full_fig_p010_8.png] view at source ↗

read the original abstract

Quantum reinforcement learning has emerged as a framework combining quantum computation with sequential decision-making, and applications to the multi-armed bandit (MAB) problem have been reported. The graph bandit problem extends the MAB setting by introducing spatial constraints, where the accessibility of arms is restricted by graph connectivity, yet quantum approaches to this setting remain limited. In this paper, we formulate best-arm identification in graph bandits and propose a quantum algorithmic framework, termed Quantum Spatial Best-Arm Identification (QSBAI), which is applicable to general graph structures. This framework uses quantum walks to encode superpositions over graph-constrained actions, thereby extending amplitude amplification and generalizing the quantum BAI algorithm via Szegedy's walk framework. We focus our theoretical analysis on complete and bipartite graphs, deriving the maximal success probability of identifying the best arm and the time step at which it is achieved. Our results clarify how quantum-walk-based search can be adapted to structurally constrained decision problems and provide a foundation for quantum best-arm identification in graph-structured environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper proposes a quantum-walk framework for best-arm identification on graphs but only derives the success probabilities and optimal steps for complete and bipartite cases.

read the letter

The main takeaway is that this work formulates best-arm identification under graph connectivity constraints and uses Szegedy quantum walks to encode the allowed actions, yet the explicit results stay limited to complete and bipartite graphs. What is new is the combination of quantum walks with spatial bandit problems; earlier quantum BAI papers handled unstructured arms, so extending the setting to graphs where only adjacent arms are accessible counts as a real step. The paper does a reasonable job defining the QSBAI framework and showing how amplitude amplification can be adapted via the walk operator for these constrained choices. For the two graph families they treat, they state the maximal success probability and the time step that achieves it, which gives a concrete handle on how the quantum search behaves under simple connectivity rules. This supplies a starting point for quantum methods in routing or sensor networks where decisions respect spatial limits. The soft spots are the absence of any displayed equations or proof outlines in the abstract and the fact that the detailed analysis covers only graphs whose walk operators are trivial. The key premise that the Szegedy operator applies directly without extra overhead for general graphs is not checked beyond those easy cases, so the stress-test worry about degree or diameter costs remains open. If the full derivations show the quadratic speedup survives without hidden query or depth penalties, the claim holds; otherwise the framework is more suggestive than finished. Readers working on quantum reinforcement learning or quantum search on networks will find the most use here. The paper engages the literature honestly enough to deserve a serious referee who can inspect the math and the general-graph construction. I would send it to peer review once the authors supply the missing derivations and address the overhead question for non-trivial graphs.

Referee Report

2 major / 2 minor

Summary. The manuscript formulates best-arm identification in graph bandits and proposes the Quantum Spatial Best-Arm Identification (QSBAI) framework. This uses Szegedy quantum walks to encode superpositions over graph-constrained actions, extending amplitude amplification to generalize prior quantum BAI results. Explicit derivations of maximal success probability and the achieving time step are provided for complete and bipartite graphs, with the framework claimed applicable to general graphs.

Significance. If the derivations are correct and the no-overhead premise holds, the work would establish a concrete quantum-walk approach to spatially constrained best-arm identification, offering potential quadratic speedups over classical methods in graph bandits. The explicit results for complete and bipartite cases provide a verifiable foundation and clarify adaptation of quantum search to decision problems with connectivity constraints. This strengthens the link between quantum walks and quantum reinforcement learning.

major comments (2)

[Abstract and theoretical analysis sections] The central claim that QSBAI is applicable to general graph structures relies on Szegedy's walk operator encoding graph-constrained actions without overhead that erases the quadratic speedup. However, explicit derivations and analysis are restricted to complete and bipartite graphs (where the coin and shift operators are uniform and trivial). For arbitrary graphs the marking oracle and walk operator construction may require additional queries or depth scaling with degree or diameter; this load-bearing premise for the generalization is not verified.
[Theoretical analysis (complete and bipartite cases)] The derivations of maximal success probability and optimal time step are presented for the two special graphs, but the manuscript does not supply error bounds, proof sketches, or explicit equations showing how the Szegedy operator is adapted when the best arm is unknown. Without these, it is impossible to confirm that post-hoc assumptions do not affect the claimed probabilities.

minor comments (2)

[Introduction and preliminaries] Notation for the graph bandit setting and the definition of the marking oracle could be introduced with a small illustrative example or diagram to improve readability for readers unfamiliar with quantum walks on graphs.
[Results section] A brief comparison table of classical vs. quantum success probabilities and time steps for the complete and bipartite cases would help highlight the claimed advantage.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive feedback on our manuscript. We address each major comment in turn below, indicating where we agree revisions are warranted and providing our reasoning on points of clarification.

read point-by-point responses

Referee: [Abstract and theoretical analysis sections] The central claim that QSBAI is applicable to general graph structures relies on Szegedy's walk operator encoding graph-constrained actions without overhead that erases the quadratic speedup. However, explicit derivations and analysis are restricted to complete and bipartite graphs (where the coin and shift operators are uniform and trivial). For arbitrary graphs the marking oracle and walk operator construction may require additional queries or depth scaling with degree or diameter; this load-bearing premise for the generalization is not verified.

Authors: We appreciate the referee highlighting the distinction between the general framework and the explicit cases analyzed. The QSBAI construction defines the Szegedy walk operator directly from the graph's adjacency structure via the standard coin and shift operators, which encodes the spatial constraints without introducing query overhead beyond the inherent cost of implementing the walk on that graph. Any scaling with degree or diameter is a property of the underlying graph representation and does not erase the quadratic speedup relative to classical sampling on the same graph. While closed-form derivations of maximal success probability and optimal time steps are provided only for complete and bipartite graphs (to obtain tractable expressions), the framework itself applies to general graphs by the same operator construction. We will revise the manuscript to include an expanded discussion clarifying this point and noting the graph-dependent aspects of implementation. revision: partial
Referee: [Theoretical analysis (complete and bipartite cases)] The derivations of maximal success probability and optimal time step are presented for the two special graphs, but the manuscript does not supply error bounds, proof sketches, or explicit equations showing how the Szegedy operator is adapted when the best arm is unknown. Without these, it is impossible to confirm that post-hoc assumptions do not affect the claimed probabilities.

Authors: We agree that additional technical details will strengthen the presentation. The derivations apply the Szegedy walk to the graph with a marking oracle that applies a phase flip to the state corresponding to the best arm, identified via reward queries. The operator is adapted by composing the standard walk with this marking step in the usual amplitude amplification manner. We will add explicit equations for this adaptation, along with proof sketches and error bounds on the success probability, in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper formulates QSBAI by adapting Szegedy's quantum-walk operator to graph-constrained actions and derives maximal success probability plus optimal time step explicitly for complete and bipartite graphs. These derivations rest on standard amplitude amplification and walk operators that are well-defined for the chosen topologies without reduction to fitted parameters or self-referential definitions. No equations in the provided text equate a claimed prediction to its own input by construction, and the generalization to arbitrary graphs is stated as an extension rather than a load-bearing self-citation chain. The framework is therefore self-contained against external quantum-walk literature.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that Szegedy's walk can be lifted to the graph-constrained action space while preserving quadratic speedup; no free parameters, new particles, or ad-hoc axioms are named in the abstract.

axioms (1)

domain assumption Szegedy's quantum-walk operator extends directly to graph-constrained action spaces without destroying the amplitude-amplification advantage.
Invoked when the paper states that QSBAI 'extends amplitude amplification and generalizes the quantum BAI algorithm via Szegedy's walk framework.'

pith-pipeline@v0.9.0 · 5735 in / 1266 out tokens · 31824 ms · 2026-05-18T18:57:00.071970+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

This framework uses quantum walks to encode superpositions over graph-constrained actions, thereby extending amplitude amplification and generalizing the quantum BAI algorithm via Szegedy's walk framework.
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We focus our theoretical analysis on complete and bipartite graphs, deriving the maximal success probability of identifying the best arm and the time step at which it is achieved.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages

[1]

standard

also presented a quantum-amplitude-amplification- based BAI algorithm with a different approach, showing that the best arm can be found with a fixed confidence. Moreover, Cho et al. [34] proposed a decision-making al- gorithm in the framework of adversarial MAB problems [35], which is also exploration-based with quantum am- plitude amplification. This pap...

work page 2023
[2]

R. S. Sutton and A. G. Barto,Reinforcement Learning: An Introduction(MIT press, 2018)

work page 2018
[3]

Hwangbo, J

J. Hwangbo, J. Lee, A. Dosovitskiy, D. Bellicoso, V. Tsounis, V. Koltun, and M. Hutter, Learning agile and dynamic motor skills for legged robots, Science Robotics 4, eaau5872 (2019)

work page 2019
[4]

Kim and H

T. Kim and H. Y. Kim, Optimizing the pairs-trading strategy using deep reinforcement learning with trad- ing and stop-loss boundaries, Complexity2019, 3582516 (2019)

work page 2019
[5]

J. Zhao, T. Qu, and F. Xu, A deep reinforcement learn- ing approach for autonomous highway driving, IFAC- PapersOnLine53, 542 (2020)

work page 2020
[6]

Robbins, Some aspects of the sequential design of ex- periments, Bulletin of the American Mathematical Soci- ety58, 527 (1952)

H. Robbins, Some aspects of the sequential design of ex- periments, Bulletin of the American Mathematical Soci- ety58, 527 (1952)

work page 1952
[7]

Slivkins, Introduction to multi-armed bandits, Foun- dations and Trends®in Machine Learning12, 1 (2019)

A. Slivkins, Introduction to multi-armed bandits, Foun- dations and Trends®in Machine Learning12, 1 (2019)

work page 2019
[8]

Bubeck, R

S. Bubeck, R. Munos, and G. Stoltz, Pure exploration in multi-armed bandits problems, inProceedings of the 20th International Conference on Algorithmic Learning Theory (ALT 2009)(Springer, 2009) pp. 23–37

work page 2009
[9]

Audibert, S

J.-Y. Audibert, S. Bubeck, and R. Munos, Best arm identification in multi-armed bandits, inProceedings of the 23rd Conference on Learning Theory (COLT 2010) (2010) pp. 41–53

work page 2010
[10]

J. G. March, Exploration and exploitation in organiza- tional learning, Organization Science2, 71 (1991)

work page 1991
[11]

Johansson, Graph bandits: Multi-armed bandits with locality constraints (2022)

K. Johansson, Graph bandits: Multi-armed bandits with locality constraints (2022)

work page 2022
[12]

Zhang, K

T. Zhang, K. Johansson, and N. Li, Multi-armed bandit learning on a graph, inProceedings of 57th Annual IEEE Conference on Information Sciences and Systems (CISS 2023)(2023) pp. 1–6

work page 2023
[13]

F. Li, D. Yu, H. Yang, J. Yu, H. Karl, and X. Cheng, Multi-armed-bandit-based spectrum scheduling algo- rithms in wireless networks: A survey, IEEE Wireless Communications27, 24 (2020)

work page 2020
[14]

Takeuchi, M

S. Takeuchi, M. Hasegawa, K. Kanno, A. Uchida, N. Chauvet, and M. Naruse, Dynamic channel selection in wireless communications via a multi-armed bandit al- gorithm using laser chaos time series, Scientific reports 10, 1574 (2020)

work page 2020
[15]

Zhang, Y

J. Zhang, Y. Huang, Y. Zhou, and X. You, Beam align- ment and tracking for millimeter wave communications via bandit learning, IEEE Transactions on Communica- tions68, 5519 (2020)

work page 2020
[16]

Auer, Using confidence bounds for exploitation- exploration trade-offs, Journal of machine learning re- search3, 397 (2002)

P. Auer, Using confidence bounds for exploitation- exploration trade-offs, Journal of machine learning re- search3, 397 (2002)

work page 2002
[17]

H. J. Briegel and G. De las Cuevas, Projective simulation for artificial intelligence, Scientific Reports2, 400 (2012)

work page 2012
[18]

Wittek,Quantum Machine Learning: What Quan- tum Computing Means to Data Mining(Academic Press, 2014)

P. Wittek,Quantum Machine Learning: What Quan- tum Computing Means to Data Mining(Academic Press, 2014)

work page 2014
[19]

Aaronson, Read the fine print, Nature Physics11, 291 (2015)

S. Aaronson, Read the fine print, Nature Physics11, 291 (2015)

work page 2015
[20]

Dunjko, J

V. Dunjko, J. M. Taylor, and H. J. Briegel, Quantum- enhanced machine learning, Physical Review Letters 117, 130501 (2016). 15

work page 2016
[21]

Lamata, Basic protocols in quantum reinforcement learning with superconducting circuits, Scientific Reports 7, 1609 (2017)

L. Lamata, Basic protocols in quantum reinforcement learning with superconducting circuits, Scientific Reports 7, 1609 (2017)

work page 2017
[22]

Biamonte, P

J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Quantum machine learning, Na- ture549, 195 (2017)

work page 2017
[23]

D. Dong, C. Chen, H. Li, and T.-J. Tarn, Quantum rein- forcement learning, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)38, 1207 (2008)

work page 2008
[24]

G. D. Paparo, V. Dunjko, A. Makmal, M. A. Martin- Delgado, and H. J. Briegel, Quantum speedup for active learning agents, Physical Review X4, 031002 (2014)

work page 2014
[25]

J.-A. Li, D. Dong, Z. Wei, Y. Liu, Y. Pan, F. Nori, and X. Zhang, Quantum reinforcement learning during hu- man decision-making, Nature Human Behaviour4, 294 (2020)

work page 2020
[26]

M. A. Nielsen and I. L. Chuang,Quantum Computation and Quantum Information(Cambridge university press, 2010)

work page 2010
[27]

C. H. Bennett and D. P. DiVincenzo, Quantum informa- tion and computation, Nature404, 247 (2000)

work page 2000
[28]

R. P. Feynman, Simulating physics with computers, International Journal of Theoretical Physics21, 467 (1982)

work page 1982
[29]

Deutsch, Quantum theory, the church–turing principle and the universal quantum computer, Proceedings of the Royal Society of London

D. Deutsch, Quantum theory, the church–turing principle and the universal quantum computer, Proceedings of the Royal Society of London. A. Mathematical and Physical Sciences400, 97 (1985)

work page 1985
[30]

P. W. Shor, Algorithms for quantum computation: dis- crete logarithms and factoring, inProceedings of the 35th Annual IEEE Symposium on Foundations of Computer Science (FOCS 1994)(1994) pp. 124–134

work page 1994
[31]

L. K. Grover, A fast quantum mechanical algorithm for database search, inProceedings of the 28th Annual ACM Symposium on Theory of Computing (STOC 1996) (1996) pp. 212–219

work page 1996
[32]

Brassard, P

G. Brassard, P. Høyer, M. Mosca, and A. Tapp, Quantum amplitude amplification and estimation, Contemporary Mathematics305, 53 (2002)

work page 2002
[33]

Casal´ e, G

B. Casal´ e, G. Di Molfetta, H. Kadri, and L. Ralaivola, Quantum bandits, Quantum Machine Intelligence2, 11 (2020)

work page 2020
[34]

D. Wang, X. You, T. Li, and A. M. Childs, Quantum ex- ploration algorithms for multi-armed bandits, inProceed- ings of the AAAI Conference on Artificial Intelligence, Vol. 35 (2021) pp. 10102–10110

work page 2021
[35]

B. Cho, Y. Xiao, P. Hui, and D. Dong, Quantum bandit with amplitude amplification exploration in an adversar- ial environment, IEEE Transactions on Knowledge and Data Engineering36, 311 (2023)

work page 2023
[36]

P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, The nonstochastic multiarmed bandit problem, SIAM Journal on Computing32, 48 (2002)

work page 2002
[37]

Yamagami,Quantum Walk and Its Application to Decision-Making, Ph.D

T. Yamagami,Quantum Walk and Its Application to Decision-Making, Ph.D. thesis, The University of Tokyo (2023)

work page 2023
[38]

Konno, Quantum walks, inQuantum potential theory (Springer, 2008) pp

N. Konno, Quantum walks, inQuantum potential theory (Springer, 2008) pp. 309–452

work page 2008
[39]

Portugal,Quantum Walks and Search Algorithms (Springer, 2013)

R. Portugal,Quantum Walks and Search Algorithms (Springer, 2013)

work page 2013
[40]

Kempe, Quantum random walks: an introductory overview, Contemporary Physics44, 307 (2003)

J. Kempe, Quantum random walks: an introductory overview, Contemporary Physics44, 307 (2003)

work page 2003
[41]

S. E. Venegas-Andraca, Quantum walks: a comprehen- sive review, Quantum Information Processing11, 1015 (2012)

work page 2012
[42]

Szegedy, Quantum speed-up of Markov chain based algorithms, inProceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2004)(2004) pp

M. Szegedy, Quantum speed-up of Markov chain based algorithms, inProceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2004)(2004) pp. 32–41

work page 2004
[43]

Yamagami, E

T. Yamagami, E. Segawa, T. Mihana, A. R¨ ohm, R. Ho- risaki, and M. Naruse, Bandit algorithm driven by a clas- sical random walk and a quantum walk, Entropy25, 843 (2023)

work page 2023
[44]

Higuchi, N

Y. Higuchi, N. Konno, I. Sato, and E. Segawa, Periodic- ity of the discrete-time quantum walk on a finite graph, Interdisciplinary Information Sciences23, 75 (2017)

work page 2017
[45]

D. E. Knuth,The Art of Computer Programming, Vol. 4A (Addison-Wesley Reading, MA, 2011)

work page 2011
[46]

M. L. Rhodes and T. G. Wong, Quantum walk search on the complete bipartite graph, Physical Review A99, 032301 (2019)

work page 2019

[1] [1]

standard

also presented a quantum-amplitude-amplification- based BAI algorithm with a different approach, showing that the best arm can be found with a fixed confidence. Moreover, Cho et al. [34] proposed a decision-making al- gorithm in the framework of adversarial MAB problems [35], which is also exploration-based with quantum am- plitude amplification. This pap...

work page 2023

[2] [2]

R. S. Sutton and A. G. Barto,Reinforcement Learning: An Introduction(MIT press, 2018)

work page 2018

[3] [3]

Hwangbo, J

J. Hwangbo, J. Lee, A. Dosovitskiy, D. Bellicoso, V. Tsounis, V. Koltun, and M. Hutter, Learning agile and dynamic motor skills for legged robots, Science Robotics 4, eaau5872 (2019)

work page 2019

[4] [4]

Kim and H

T. Kim and H. Y. Kim, Optimizing the pairs-trading strategy using deep reinforcement learning with trad- ing and stop-loss boundaries, Complexity2019, 3582516 (2019)

work page 2019

[5] [5]

J. Zhao, T. Qu, and F. Xu, A deep reinforcement learn- ing approach for autonomous highway driving, IFAC- PapersOnLine53, 542 (2020)

work page 2020

[6] [6]

Robbins, Some aspects of the sequential design of ex- periments, Bulletin of the American Mathematical Soci- ety58, 527 (1952)

H. Robbins, Some aspects of the sequential design of ex- periments, Bulletin of the American Mathematical Soci- ety58, 527 (1952)

work page 1952

[7] [7]

Slivkins, Introduction to multi-armed bandits, Foun- dations and Trends®in Machine Learning12, 1 (2019)

A. Slivkins, Introduction to multi-armed bandits, Foun- dations and Trends®in Machine Learning12, 1 (2019)

work page 2019

[8] [8]

Bubeck, R

S. Bubeck, R. Munos, and G. Stoltz, Pure exploration in multi-armed bandits problems, inProceedings of the 20th International Conference on Algorithmic Learning Theory (ALT 2009)(Springer, 2009) pp. 23–37

work page 2009

[9] [9]

Audibert, S

J.-Y. Audibert, S. Bubeck, and R. Munos, Best arm identification in multi-armed bandits, inProceedings of the 23rd Conference on Learning Theory (COLT 2010) (2010) pp. 41–53

work page 2010

[10] [10]

J. G. March, Exploration and exploitation in organiza- tional learning, Organization Science2, 71 (1991)

work page 1991

[11] [11]

Johansson, Graph bandits: Multi-armed bandits with locality constraints (2022)

K. Johansson, Graph bandits: Multi-armed bandits with locality constraints (2022)

work page 2022

[12] [12]

Zhang, K

T. Zhang, K. Johansson, and N. Li, Multi-armed bandit learning on a graph, inProceedings of 57th Annual IEEE Conference on Information Sciences and Systems (CISS 2023)(2023) pp. 1–6

work page 2023

[13] [13]

F. Li, D. Yu, H. Yang, J. Yu, H. Karl, and X. Cheng, Multi-armed-bandit-based spectrum scheduling algo- rithms in wireless networks: A survey, IEEE Wireless Communications27, 24 (2020)

work page 2020

[14] [14]

Takeuchi, M

S. Takeuchi, M. Hasegawa, K. Kanno, A. Uchida, N. Chauvet, and M. Naruse, Dynamic channel selection in wireless communications via a multi-armed bandit al- gorithm using laser chaos time series, Scientific reports 10, 1574 (2020)

work page 2020

[15] [15]

Zhang, Y

J. Zhang, Y. Huang, Y. Zhou, and X. You, Beam align- ment and tracking for millimeter wave communications via bandit learning, IEEE Transactions on Communica- tions68, 5519 (2020)

work page 2020

[16] [16]

Auer, Using confidence bounds for exploitation- exploration trade-offs, Journal of machine learning re- search3, 397 (2002)

P. Auer, Using confidence bounds for exploitation- exploration trade-offs, Journal of machine learning re- search3, 397 (2002)

work page 2002

[17] [17]

H. J. Briegel and G. De las Cuevas, Projective simulation for artificial intelligence, Scientific Reports2, 400 (2012)

work page 2012

[18] [18]

Wittek,Quantum Machine Learning: What Quan- tum Computing Means to Data Mining(Academic Press, 2014)

P. Wittek,Quantum Machine Learning: What Quan- tum Computing Means to Data Mining(Academic Press, 2014)

work page 2014

[19] [19]

Aaronson, Read the fine print, Nature Physics11, 291 (2015)

S. Aaronson, Read the fine print, Nature Physics11, 291 (2015)

work page 2015

[20] [20]

Dunjko, J

V. Dunjko, J. M. Taylor, and H. J. Briegel, Quantum- enhanced machine learning, Physical Review Letters 117, 130501 (2016). 15

work page 2016

[21] [21]

Lamata, Basic protocols in quantum reinforcement learning with superconducting circuits, Scientific Reports 7, 1609 (2017)

L. Lamata, Basic protocols in quantum reinforcement learning with superconducting circuits, Scientific Reports 7, 1609 (2017)

work page 2017

[22] [22]

Biamonte, P

J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Quantum machine learning, Na- ture549, 195 (2017)

work page 2017

[23] [23]

D. Dong, C. Chen, H. Li, and T.-J. Tarn, Quantum rein- forcement learning, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)38, 1207 (2008)

work page 2008

[24] [24]

G. D. Paparo, V. Dunjko, A. Makmal, M. A. Martin- Delgado, and H. J. Briegel, Quantum speedup for active learning agents, Physical Review X4, 031002 (2014)

work page 2014

[25] [25]

J.-A. Li, D. Dong, Z. Wei, Y. Liu, Y. Pan, F. Nori, and X. Zhang, Quantum reinforcement learning during hu- man decision-making, Nature Human Behaviour4, 294 (2020)

work page 2020

[26] [26]

M. A. Nielsen and I. L. Chuang,Quantum Computation and Quantum Information(Cambridge university press, 2010)

work page 2010

[27] [27]

C. H. Bennett and D. P. DiVincenzo, Quantum informa- tion and computation, Nature404, 247 (2000)

work page 2000

[28] [28]

R. P. Feynman, Simulating physics with computers, International Journal of Theoretical Physics21, 467 (1982)

work page 1982

[29] [29]

Deutsch, Quantum theory, the church–turing principle and the universal quantum computer, Proceedings of the Royal Society of London

D. Deutsch, Quantum theory, the church–turing principle and the universal quantum computer, Proceedings of the Royal Society of London. A. Mathematical and Physical Sciences400, 97 (1985)

work page 1985

[30] [30]

P. W. Shor, Algorithms for quantum computation: dis- crete logarithms and factoring, inProceedings of the 35th Annual IEEE Symposium on Foundations of Computer Science (FOCS 1994)(1994) pp. 124–134

work page 1994

[31] [31]

L. K. Grover, A fast quantum mechanical algorithm for database search, inProceedings of the 28th Annual ACM Symposium on Theory of Computing (STOC 1996) (1996) pp. 212–219

work page 1996

[32] [32]

Brassard, P

G. Brassard, P. Høyer, M. Mosca, and A. Tapp, Quantum amplitude amplification and estimation, Contemporary Mathematics305, 53 (2002)

work page 2002

[33] [33]

Casal´ e, G

B. Casal´ e, G. Di Molfetta, H. Kadri, and L. Ralaivola, Quantum bandits, Quantum Machine Intelligence2, 11 (2020)

work page 2020

[34] [34]

D. Wang, X. You, T. Li, and A. M. Childs, Quantum ex- ploration algorithms for multi-armed bandits, inProceed- ings of the AAAI Conference on Artificial Intelligence, Vol. 35 (2021) pp. 10102–10110

work page 2021

[35] [35]

B. Cho, Y. Xiao, P. Hui, and D. Dong, Quantum bandit with amplitude amplification exploration in an adversar- ial environment, IEEE Transactions on Knowledge and Data Engineering36, 311 (2023)

work page 2023

[36] [36]

P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, The nonstochastic multiarmed bandit problem, SIAM Journal on Computing32, 48 (2002)

work page 2002

[37] [37]

Yamagami,Quantum Walk and Its Application to Decision-Making, Ph.D

T. Yamagami,Quantum Walk and Its Application to Decision-Making, Ph.D. thesis, The University of Tokyo (2023)

work page 2023

[38] [38]

Konno, Quantum walks, inQuantum potential theory (Springer, 2008) pp

N. Konno, Quantum walks, inQuantum potential theory (Springer, 2008) pp. 309–452

work page 2008

[39] [39]

Portugal,Quantum Walks and Search Algorithms (Springer, 2013)

R. Portugal,Quantum Walks and Search Algorithms (Springer, 2013)

work page 2013

[40] [40]

Kempe, Quantum random walks: an introductory overview, Contemporary Physics44, 307 (2003)

J. Kempe, Quantum random walks: an introductory overview, Contemporary Physics44, 307 (2003)

work page 2003

[41] [41]

S. E. Venegas-Andraca, Quantum walks: a comprehen- sive review, Quantum Information Processing11, 1015 (2012)

work page 2012

[42] [42]

Szegedy, Quantum speed-up of Markov chain based algorithms, inProceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2004)(2004) pp

M. Szegedy, Quantum speed-up of Markov chain based algorithms, inProceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2004)(2004) pp. 32–41

work page 2004

[43] [43]

Yamagami, E

T. Yamagami, E. Segawa, T. Mihana, A. R¨ ohm, R. Ho- risaki, and M. Naruse, Bandit algorithm driven by a clas- sical random walk and a quantum walk, Entropy25, 843 (2023)

work page 2023

[44] [44]

Higuchi, N

Y. Higuchi, N. Konno, I. Sato, and E. Segawa, Periodic- ity of the discrete-time quantum walk on a finite graph, Interdisciplinary Information Sciences23, 75 (2017)

work page 2017

[45] [45]

D. E. Knuth,The Art of Computer Programming, Vol. 4A (Addison-Wesley Reading, MA, 2011)

work page 2011

[46] [46]

M. L. Rhodes and T. G. Wong, Quantum walk search on the complete bipartite graph, Physical Review A99, 032301 (2019)

work page 2019