Analysis of Search Heuristics in the Multi-Armed Bandit Setting

Barbara Hammer; Jasmin Brandt; Jurek Sander; Timo K\"otzing

arxiv: 2604.08109 · v1 · submitted 2026-04-09 · 💻 cs.NE

Analysis of Search Heuristics in the Multi-Armed Bandit Setting

Jasmin Brandt , Barbara Hammer , Timo K\"otzing , Jurek Sander This is my paper

Pith reviewed 2026-05-10 17:56 UTC · model grok-4.3

classification 💻 cs.NE

keywords dueling banditsmulti-armed banditsevolutionary algorithmsestimation of distribution algorithmsCondorcet winnerstationary distributionpairwise comparisonssearch heuristics

0 comments

The pith

The (1+1) EA selects the Condorcet winner only with constant probability when its advantage is Omega(1/n) in dueling bandits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines how search heuristics manage the exploration-exploitation tradeoff when options can only be compared pairwise in a stochastic dueling setting. It shows that the (1+1) evolutionary algorithm reaches a stationary distribution in which it selects the Condorcet winner—the arm that beats every other with probability strictly above one half—only with constant probability as long as the winner's edge p satisfies p = Omega(1/n). By contrast, a basic estimation of distribution algorithm maintains a distribution that selects the same winner with probability 1 minus Theta(p). The authors also establish that repeating each comparison a constant number of times raises the (1+1) EA's selection probability for the winner substantially.

Core claim

In the binary stochastic dueling bandit setting with a Condorcet winner that beats every other arm with probability 1-p, the (1+1) EA chooses the Condorcet winner in its stationary distribution only with constant probability if p = Omega(1/n). By contrast, a simple EDA based on the Max-Min Ant System with iteration-best update will choose the Condorcet winner in its maintained distribution with probability 1-Theta(p). Repeated duels significantly boost the probability of the Condorcet winner in the stationary distribution of the (1+1) EA.

What carries the argument

The stationary distribution of the (1+1) EA and the maintained probability distribution of the EDA, both derived from fixed pairwise winning probabilities in the binary stochastic dueling model.

If this is right

When the advantage p is only linear in 1/n the (1+1) EA will not converge to selecting the superior arm with probability approaching 1.
The EDA reliably favors the Condorcet winner even when p is small.
Allowing repeated comparisons inside the (1+1) EA raises its long-run selection probability for the winner.
Search heuristics that rely on single pairwise comparisons can underperform in settings with probabilistic outcomes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Distribution-based methods may be preferable to simple mutation-based EAs when feedback arrives only through noisy pairwise comparisons.
The performance gap could guide algorithm choice in ranking or recommendation tasks that use dueling feedback.
Variants of the (1+1) EA with larger populations might reduce the need for repeated duels.
The same analysis framework could be applied to other heuristics that use tournament-style selection.

Load-bearing premise

A Condorcet winner exists whose winning probability against every other arm is a fixed value 1-p.

What would settle it

Simulate the (1+1) EA for large n on an instance where the best arm wins each duel with probability 1-1/n and measure whether its long-run selection frequency remains bounded below 1 by a positive constant independent of n.

Figures

Figures reproduced from arXiv: 2604.08109 by Barbara Hammer, Jasmin Brandt, Jurek Sander, Timo K\"otzing.

**Figure 1.** Figure 1: Plot of the upper and lower bounds on the probability that arm 𝑖 is the best in expectation for different arms and instantiations of the Plackett-Luce model. 6 Stochastic Queries: Ant Colony Optimization In previous chapters, we studied the distribution of evolutionary algorithms over the set of arms. A class of algorithms which provides such a distribution explicitly are the estimation of distribution a… view at source ↗

**Figure 2.** Figure 2: Lower bound for the probability that arm [PITH_FULL_IMAGE:figures/full_fig_p014_2.png] view at source ↗

**Figure 3.** Figure 3: Lower bound for the probability that a specific arm wins in expectation at most [PITH_FULL_IMAGE:figures/full_fig_p014_3.png] view at source ↗

**Figure 4.** Figure 4: Lower bound for the probability that a specific arm wins in expectation at least [PITH_FULL_IMAGE:figures/full_fig_p015_4.png] view at source ↗

read the original abstract

We consider the classic Multi-Armed Bandit setting to understand the exploration/exploitation tradeoffs made by different search heuristics. Since many search heuristics work by comparing different options (in evolutionary algorithms called "individuals"; in the Bandit literature called "arms"), we work with the "Dueling Bandits" setting. In each iteration, a comparison between different arms can be made; in the binary stochastic setting, each arm has a fixed winning probability against any other arm. A Condorcet winner is any arm that beats every other arm with a probability strictly higher than $1/2$. We show that evolutionary algorithms are rather bad at identifying the Condorcet winner: Even if the Condorcet winner beats every other arm with a probability $1-p$, the (1+1) EA, in its stationary distribution, chooses the Condorcet winner only with constant probability if $p=\Omega(1/n)$. By contrast, we show that a simple EDA (based on the Max-Min Ant System with iteration-best update) will choose the Condorcet winner in its maintained distribution with probability $1-\Theta(p)$. As a remedy for the (1+1) EA, we show how repeated duels can significantly boost the probability of the Condorcet winner in the stationary distribution.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper derives new stationary probability bounds showing the (1+1) EA picks the Condorcet winner only constantly often when p=Ω(1/n) while a Max-Min Ant System EDA reaches 1-Θ(p), but the EA claim rests on unproven mixing.

read the letter

The core result is that in this dueling bandit model the (1+1) EA reaches only constant probability on the Condorcet winner in stationarity once the advantage p drops to order 1/n, whereas the iteration-best Max-Min Ant System EDA keeps probability 1 minus order p on it. They also show that repeating each duel a few times lifts the EA's stationary mass on the winner. These explicit bounds for the two heuristics in the Condorcet setting look like fresh calculations not already in the cited literature.

Referee Report

2 major / 1 minor

Summary. The manuscript analyzes search heuristics in the dueling multi-armed bandit setting with a Condorcet winner. It claims that the (1+1) EA selects the Condorcet winner only with constant probability in its stationary distribution when the advantage parameter p is Ω(1/n), while a simple EDA based on the Max-Min Ant System with iteration-best update achieves selection probability 1-Θ(p). The paper also proposes repeated duels as a remedy to improve the EA's stationary probability.

Significance. If the results hold, the work is significant for evolutionary computation and bandit research by exposing limitations of standard (1+1) EAs under noisy pairwise comparisons and highlighting potential advantages of distribution-maintaining methods. It provides a theoretical lens on exploration/exploitation tradeoffs that could inform heuristic design, though the lack of mixing-time analysis weakens the practical interpretation of the stationary-distribution claims.

major comments (2)

[Analysis of (1+1) EA stationary distribution] The central claim for the (1+1) EA (constant probability of selecting the Condorcet winner in stationary distribution when p=Ω(1/n)) relies on the Markov chain over arms with transition probabilities governed by fixed duel outcomes (Condorcet winner wins with probability 1-p). No mixing-time or relaxation-time bound is provided. If mixing time scales as Ω(1/p) or worse, the distribution may remain far from stationary after polynomially many iterations, undermining the relevance of the stationary probability as a performance metric.
[Theoretical claims and derivations] The abstract and claims reference specific probability bounds and stationary distributions derived from the binary stochastic dueling model, but the manuscript provides no explicit proofs, error analysis, or verification of the Markov-chain stationary distributions. This gap is load-bearing for assessing whether the constant-probability claim for the EA and the 1-Θ(p) claim for the EDA are correct.

minor comments (1)

[Introduction and model definition] The weakest assumption (binary stochastic dueling with fixed pairwise probabilities and existence of a Condorcet winner) should be stated more explicitly at the outset, including any restrictions on how p scales with n.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our analysis of search heuristics in the dueling bandit setting. The comments correctly identify gaps in our presentation of the stationary-distribution results. We respond to each major comment below and indicate the planned revisions.

read point-by-point responses

Referee: [Analysis of (1+1) EA stationary distribution] The central claim for the (1+1) EA (constant probability of selecting the Condorcet winner in stationary distribution when p=Ω(1/n)) relies on the Markov chain over arms with transition probabilities governed by fixed duel outcomes (Condorcet winner wins with probability 1-p). No mixing-time or relaxation-time bound is provided. If mixing time scales as Ω(1/p) or worse, the distribution may remain far from stationary after polynomially many iterations, undermining the relevance of the stationary probability as a performance metric.

Authors: We agree that the lack of a mixing-time or relaxation-time analysis limits the direct applicability of the stationary-distribution claims to finite-time performance. The stationary distribution is the natural long-run measure for the inherent selection bias of the (1+1) EA under the given duel probabilities, consistent with standard practice in theoretical evolutionary computation. However, without a bound, it is possible that the chain has not mixed after polynomially many steps when p = Ω(1/n). In the revised manuscript we will add an explicit discussion paragraph acknowledging this limitation, noting that the transition structure (Condorcet winner wins with probability 1-p) suggests a polynomial mixing time via standard drift or coupling arguments, while leaving a formal bound for future work. This constitutes a partial revision. revision: partial
Referee: [Theoretical claims and derivations] The abstract and claims reference specific probability bounds and stationary distributions derived from the binary stochastic dueling model, but the manuscript provides no explicit proofs, error analysis, or verification of the Markov-chain stationary distributions. This gap is load-bearing for assessing whether the constant-probability claim for the EA and the 1-Θ(p) claim for the EDA are correct.

Authors: The manuscript derives the stationary probabilities by solving the global balance equations of the respective Markov chains. For the (1+1) EA the state space is the current arm and the transition probabilities are exactly 1-p from any non-Condorcet arm to the Condorcet winner (and p in the reverse direction), yielding a closed-form stationary mass of Θ(1) on the Condorcet winner when p = Ω(1/n). For the Max-Min Ant System EDA the probability vector is updated by the iteration-best rule, producing the claimed 1-Θ(p) mass after a single update. We will include the complete step-by-step derivations, verification that the obtained vector satisfies πP = π, and a short error-bound discussion in an expanded appendix of the revised version, making the claims fully verifiable. revision: yes

Circularity Check

0 steps flagged

No circularity; stationary distributions derived directly from transition probabilities

full rationale

The paper models the (1+1) EA as a Markov chain on n arms with transitions determined by the fixed pairwise win probabilities (Condorcet winner wins with probability 1-p). The stationary distribution is obtained by solving the global balance equations under these transitions, yielding the claimed constant probability when p=Ω(1/n). The EDA result follows from its explicit per-iteration distribution update rule. No step reduces by construction to a fitted parameter, self-citation, or imported ansatz; all expressions are obtained from the standard dueling-bandit assumptions without circular redefinition or renaming of known results.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard assumptions of the dueling bandit model and the existence of a Condorcet winner; no free parameters or invented entities are introduced in the abstract.

axioms (2)

domain assumption Binary stochastic comparisons with fixed winning probabilities between every pair of arms
Stated directly in the abstract as the setting for the analysis.
domain assumption Existence of a Condorcet winner that beats all others with probability 1-p
Central to the probability bounds derived for both algorithms.

pith-pipeline@v0.9.0 · 5540 in / 1386 out tokens · 34719 ms · 2026-05-10T17:56:38.070639+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages

[1]

Arpit Agarwal, Nicholas Johnson, and Shivani Agarwal. 2020. Choice Bandits. InAdvances in Neural Information Processing Systems, H. Larochelle, M. Ran- zato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 18399–18410. https://proceedings.neurips.cc/paper_files/paper/2020/file/ d5fcc35c94879a4afad61cacca56192c-Paper.pdf

work page 2020
[2]

Denis Antipov, Benjamin Doerr, Jiefeng Fang, and Tangi Hetet. 2018. A tight runtime analysis for the (𝜇 + 𝜆) EA. InProceedings of the Genetic and Evolutionary Computation Conference(Kyoto, Japan)(GECCO ’18). Association for Computing Machinery, 1459–1466. doi:10.1145/3205455.3205627

work page doi:10.1145/3205455.3205627 2018
[3]

Viktor Bengs, Robert Busa-Fekete, Adil El Mesaoudi-Paul, and Eyke Hüllermeier

work page
[4]

http://jmlr.org/papers/v22/18- 546.html

Preference-based Online Learning with Dueling Bandits: A Survey.Journal of Machine Learning Research22, 7 (2021), 1–108. http://jmlr.org/papers/v22/18- 546.html

work page 2021
[5]

Jasmin Brandt, Viktor Bengs, Björn Haddenhorst, and Eyke Hüllermeier. 2022. Finding Optimal Arms in Non-stochastic Combinatorial Bandits with Semi-bandit Feedback and Finite Budget. InAdvances in Neural Information Processing Systems, Vol. 35. Curran Associates, Inc., 20621–20634

work page 2022
[6]

Cox, and Christina Lioma

Brian Brost, Yevgeny Seldin, Ingemar J. Cox, and Christina Lioma. 2016. Multi- Dueling Bandits and Their Application to Online Ranker Evaluation. InProceed- ings of the 25th ACM International on Conference on Information and Knowledge Management (CIKM ’16). Association for Computing Machinery, 2161–2166. doi:10.1145/2983323.2983659

work page doi:10.1145/2983323.2983659 2016
[7]

Cox, and Christina Lioma

Brian Brost, Yevgeny Seldin, Ingemar J. Cox, and Christina Lioma. 2016. Multi- Dueling Bandits and Their Application to Online Ranker Evaluation. InProceed- ings of the 25th ACM International on Conference on Information and Knowledge Management (CIKM ’16). Association for Computing Machinery, 2161–2166

work page 2016
[8]

Wei Chen, Yihan Du, Longbo Huang, and Haoyu Zhao. 2020. Combinatorial pure exploration for dueling bandits. InProceedings of the 37th International Conference on Machine Learning (ICML’20). JMLR.org, Article 143, 11 pages

work page 2020
[9]

Benjamin Doerr, Daniel Johannsen, and Carola Winzen. 2012. Multiplicative Drift Analysis.Algorithmica64, 4, 673–697. doi:10.1007/S00453-012-9622-X

work page doi:10.1007/s00453-012-9622-x 2012
[10]

Benjamin Doerr, Bojana Kodric, and Marco Voigt. 2013. Lower bounds for the runtime of a global multi-objective evolutionary algorithm. InProceedings of the IEEE Congress on Evolutionary Computation (CEC ’13). Institute of Electrical and Electronics Engineers Inc., 432–439. doi:10.1109/CEC.2013.6557601

work page doi:10.1109/cec.2013.6557601 2013
[11]

Benjamin Doerr and Frank Neumann. 2020. Theory of Evolutionary Computa- tion: Recent Developments in Discrete Optimization. InTheory of Evolutionary Computation: Recent Developments in Discrete Optimization, Benjamin Doerr and Frank Neumann (Eds.). Springer. doi:10.1007/978-3-030-29414-4

work page doi:10.1007/978-3-030-29414-4 2020
[12]

Tobias Friedrich, Timo Kötzing, Aneta Neumann, Frank Neumann, and Aish- warya Radhakrishnan. 2025. Analysis of the (1+1) EA on LeadingOnes with Constraints.Algorithmica87, 5, 661–689. doi:10.1007/S00453-025-01298-9

work page doi:10.1007/s00453-025-01298-9 2025
[13]

Björn Haddenhorst, Viktor Bengs, and Eyke Hüllermeier. 2021. Identifica- tion of the Generalized Condorcet Winner in Multi-Dueling Bandits. InAd- vances in Neural Information Processing Systems, Vol. 34. Curran Associates, Inc., 25904–25916. https://proceedings.neurips.cc/paper_files/paper/2021/file/ d9de6a144a3cc26cb4b3c47b206a121a-Paper.pdf

work page 2021
[14]

Jens Jägersküpper. 2008. A Blend of Markov-Chain and Drift Analysis. InParallel Problem Solving from Nature (PPSN ’08). Springer, 41–51. doi:10.1007/978-3-540- 87700-4_5

work page doi:10.1007/978-3-540- 2008
[15]

Thomas Jansen and Christine Zarges. 2011. On benefits and drawbacks of aging strategies for randomized search heuristics.Theoretical Computer Science412, 6 (2011), 543–559. doi:10.1016/j.tcs.2010.03.032

work page doi:10.1016/j.tcs.2010.03.032 2011
[16]

Martin S. Krejca. 2019.Theoretical analyses of univariate estimation-of-distribution algorithms. Doctoral Thesis. Universität Potsdam. doi:10.25932/publishup-43487

work page doi:10.25932/publishup-43487 2019
[17]

Krejca and Carsten Witt

Martin S. Krejca and Carsten Witt. 2020. Theory of Estimation-of-Distribution Algorithms. InTheory of Evolutionary Computation - Recent Developments in Discrete Optimization. Springer, 405–442. doi:10.1007/978-3-030-29414-4_9

work page doi:10.1007/978-3-030-29414-4_9 2020
[18]

Timo Kötzing. 2024. Theory of Stochastic Drift. https://arxiv.org/abs/2406.14589 Habilitation Thesis

work page arXiv 2024
[19]

Per Kristian Lehre and Xin Yao. 2012. On the Impact of Mutation-Selection Balance on the Runtime of Evolutionary Algorithms.IEEE Transactions on Evolutionary Computation16, 2 (2012), 225–241. doi:10.1109/TEVC.2011.2112665

work page doi:10.1109/tevc.2011.2112665 2012
[20]

R.D. Luce. 1959.Individual Choice Behavior: A Theoretical Analysis. Wiley. https://books.google.de/books?id=a80DAQAAIAAJ

work page 1959
[21]

2017.Probability and Computing: Random- ization and Probabilistic Techniques in Algorithms and Data Analysis(2nd ed.)

Michael Mitzenmacher and Eli Upfal. 2017.Probability and Computing: Random- ization and Probabilistic Techniques in Algorithms and Data Analysis(2nd ed.). Cambridge University Press

work page 2017
[22]

Soheil Mohajer, Changho Suh, and Adel Elmahdy. 2017. Active Learning for Top-𝐾 Rank Aggregation from Noisy Comparisons. InProceedings of the 34th International Conference on Machine Learning, Vol. 70. Proceedings of Machine Learning Research, 2488–2497. https://proceedings.mlr.press/v70/mohajer17a. html

work page 2017
[23]

Oliveto and Carsten Witt

Pietro S. Oliveto and Carsten Witt. 2011. Simplified Drift Analysis for Proving Lower Bounds in Evolutionary Computation.Algorithmica59, 3 (01 Mar 2011), 369–386. doi:10.1007/s00453-010-9387-z

work page doi:10.1007/s00453-010-9387-z 2011
[24]

R. L. Plackett. 1975. The Analysis of Permutations.Journal of the Royal Statistical Society Series C24, 2 (June 1975), 193–202. doi:10.2307/2346567

work page doi:10.2307/2346567 1975
[25]

Wenbo Ren, Jia Liu, and Ness Shroff. 2020. The Sample Complexity of Best-𝑘 Items Selection from Pairwise Comparisons. InProceedings of the 37th International Conference on Machine Learning, Vol. 119. Proceedings of Machine Learning Research, 8051–8072. https://proceedings.mlr.press/v119/ren20a.html

work page 2020
[26]

Wenbo Ren, Jia Liu, and Ness B. Shroff. 2019.On sample complexity upper and lower bounds for exact ranking from noisy comparisons. Curran Associates Inc

work page 2019
[27]

Günter Rudolph. 1998. Finite Markov Chain Results in Evolutionary Computation: A Tour d’Horizon.Fundamenta Informaticae35, 1–4 (Jan. 1998), 67–89

work page 1998
[28]

Aadirupa Saha and Aditya Gopalan. 2018. Battle of Bandits. InProceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence. AUAI Press, 805–814. http://auai.org/uai2018/proceedings/papers/290.pdf

work page 2018
[29]

Aadirupa Saha and Aditya Gopalan. 2020. From PAC to Instance-Optimal Sample Complexity in the Plackett-Luce Model. InProceedings of the 37th International Conference on Machine Learning, Vol. 119. Proceedings of Machine Learning Research, 8367–8376. https://proceedings.mlr.press/v119/saha20b.html

work page 2020
[30]

Dirk Sudholt. 2009. The impact of parametrization in memetic evolutionary algorithms.Theoretical Computer Science410, 26 (June 2009), 2511–2528. doi:10. 1016/j.tcs.2009.03.003

work page 2009
[31]

Dirk Sudholt. 2011. Using markov-chain mixing time estimates for the analysis of ant colony optimization. InProceedings of the 11th Workshop Proceedings on Foundations of Genetic Algorithms (FOGA ’11). Association for Computing Machinery, 139–150. doi:10.1145/1967654.1967667

work page doi:10.1145/1967654.1967667 2011
[32]

Carsten Witt. 2006. Runtime Analysis of the (𝜇+1) EA on Simple Pseudo-Boolean Functions.Evolutionary Computation14, 1 (March 2006), 65–86. doi:10.1162/ 106365606776022751

work page 2006
[33]

Carsten Witt. 2008. Population size versus runtime of a simple evolutionary algorithm.Theoretical Computer Science403, 1 (2008), 104–120. doi:10.1016/j.tcs. 2008.05.011 Analysis of Search Heuristics in the Multi-Armed Bandit Setting GECCO Companion ’26, July 13–17, 2026, San Jose, Costa Rica

work page doi:10.1016/j.tcs 2008
[34]

Huangjun Zhu, Zihao Li, and Masahito Hayashi. 2022. Nearly tight universal bounds for the binomial tail probabilities. (2022). arXiv:2211.01688 https://arxiv. org/abs/2211.01688 Acknowledgements This research was (partially) funded by the HPI Research School on Foundations of AI (FAI). This research was supported by the Ministry of Culture and Science (NR...

work page arXiv 2022

[1] [1]

Arpit Agarwal, Nicholas Johnson, and Shivani Agarwal. 2020. Choice Bandits. InAdvances in Neural Information Processing Systems, H. Larochelle, M. Ran- zato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 18399–18410. https://proceedings.neurips.cc/paper_files/paper/2020/file/ d5fcc35c94879a4afad61cacca56192c-Paper.pdf

work page 2020

[2] [2]

Denis Antipov, Benjamin Doerr, Jiefeng Fang, and Tangi Hetet. 2018. A tight runtime analysis for the (𝜇 + 𝜆) EA. InProceedings of the Genetic and Evolutionary Computation Conference(Kyoto, Japan)(GECCO ’18). Association for Computing Machinery, 1459–1466. doi:10.1145/3205455.3205627

work page doi:10.1145/3205455.3205627 2018

[3] [3]

Viktor Bengs, Robert Busa-Fekete, Adil El Mesaoudi-Paul, and Eyke Hüllermeier

work page

[4] [4]

http://jmlr.org/papers/v22/18- 546.html

Preference-based Online Learning with Dueling Bandits: A Survey.Journal of Machine Learning Research22, 7 (2021), 1–108. http://jmlr.org/papers/v22/18- 546.html

work page 2021

[5] [5]

Jasmin Brandt, Viktor Bengs, Björn Haddenhorst, and Eyke Hüllermeier. 2022. Finding Optimal Arms in Non-stochastic Combinatorial Bandits with Semi-bandit Feedback and Finite Budget. InAdvances in Neural Information Processing Systems, Vol. 35. Curran Associates, Inc., 20621–20634

work page 2022

[6] [6]

Cox, and Christina Lioma

Brian Brost, Yevgeny Seldin, Ingemar J. Cox, and Christina Lioma. 2016. Multi- Dueling Bandits and Their Application to Online Ranker Evaluation. InProceed- ings of the 25th ACM International on Conference on Information and Knowledge Management (CIKM ’16). Association for Computing Machinery, 2161–2166. doi:10.1145/2983323.2983659

work page doi:10.1145/2983323.2983659 2016

[7] [7]

Cox, and Christina Lioma

Brian Brost, Yevgeny Seldin, Ingemar J. Cox, and Christina Lioma. 2016. Multi- Dueling Bandits and Their Application to Online Ranker Evaluation. InProceed- ings of the 25th ACM International on Conference on Information and Knowledge Management (CIKM ’16). Association for Computing Machinery, 2161–2166

work page 2016

[8] [8]

Wei Chen, Yihan Du, Longbo Huang, and Haoyu Zhao. 2020. Combinatorial pure exploration for dueling bandits. InProceedings of the 37th International Conference on Machine Learning (ICML’20). JMLR.org, Article 143, 11 pages

work page 2020

[9] [9]

Benjamin Doerr, Daniel Johannsen, and Carola Winzen. 2012. Multiplicative Drift Analysis.Algorithmica64, 4, 673–697. doi:10.1007/S00453-012-9622-X

work page doi:10.1007/s00453-012-9622-x 2012

[10] [10]

Benjamin Doerr, Bojana Kodric, and Marco Voigt. 2013. Lower bounds for the runtime of a global multi-objective evolutionary algorithm. InProceedings of the IEEE Congress on Evolutionary Computation (CEC ’13). Institute of Electrical and Electronics Engineers Inc., 432–439. doi:10.1109/CEC.2013.6557601

work page doi:10.1109/cec.2013.6557601 2013

[11] [11]

Benjamin Doerr and Frank Neumann. 2020. Theory of Evolutionary Computa- tion: Recent Developments in Discrete Optimization. InTheory of Evolutionary Computation: Recent Developments in Discrete Optimization, Benjamin Doerr and Frank Neumann (Eds.). Springer. doi:10.1007/978-3-030-29414-4

work page doi:10.1007/978-3-030-29414-4 2020

[12] [12]

Tobias Friedrich, Timo Kötzing, Aneta Neumann, Frank Neumann, and Aish- warya Radhakrishnan. 2025. Analysis of the (1+1) EA on LeadingOnes with Constraints.Algorithmica87, 5, 661–689. doi:10.1007/S00453-025-01298-9

work page doi:10.1007/s00453-025-01298-9 2025

[13] [13]

Björn Haddenhorst, Viktor Bengs, and Eyke Hüllermeier. 2021. Identifica- tion of the Generalized Condorcet Winner in Multi-Dueling Bandits. InAd- vances in Neural Information Processing Systems, Vol. 34. Curran Associates, Inc., 25904–25916. https://proceedings.neurips.cc/paper_files/paper/2021/file/ d9de6a144a3cc26cb4b3c47b206a121a-Paper.pdf

work page 2021

[14] [14]

Jens Jägersküpper. 2008. A Blend of Markov-Chain and Drift Analysis. InParallel Problem Solving from Nature (PPSN ’08). Springer, 41–51. doi:10.1007/978-3-540- 87700-4_5

work page doi:10.1007/978-3-540- 2008

[15] [15]

Thomas Jansen and Christine Zarges. 2011. On benefits and drawbacks of aging strategies for randomized search heuristics.Theoretical Computer Science412, 6 (2011), 543–559. doi:10.1016/j.tcs.2010.03.032

work page doi:10.1016/j.tcs.2010.03.032 2011

[16] [16]

Martin S. Krejca. 2019.Theoretical analyses of univariate estimation-of-distribution algorithms. Doctoral Thesis. Universität Potsdam. doi:10.25932/publishup-43487

work page doi:10.25932/publishup-43487 2019

[17] [17]

Krejca and Carsten Witt

Martin S. Krejca and Carsten Witt. 2020. Theory of Estimation-of-Distribution Algorithms. InTheory of Evolutionary Computation - Recent Developments in Discrete Optimization. Springer, 405–442. doi:10.1007/978-3-030-29414-4_9

work page doi:10.1007/978-3-030-29414-4_9 2020

[18] [18]

Timo Kötzing. 2024. Theory of Stochastic Drift. https://arxiv.org/abs/2406.14589 Habilitation Thesis

work page arXiv 2024

[19] [19]

Per Kristian Lehre and Xin Yao. 2012. On the Impact of Mutation-Selection Balance on the Runtime of Evolutionary Algorithms.IEEE Transactions on Evolutionary Computation16, 2 (2012), 225–241. doi:10.1109/TEVC.2011.2112665

work page doi:10.1109/tevc.2011.2112665 2012

[20] [20]

R.D. Luce. 1959.Individual Choice Behavior: A Theoretical Analysis. Wiley. https://books.google.de/books?id=a80DAQAAIAAJ

work page 1959

[21] [21]

2017.Probability and Computing: Random- ization and Probabilistic Techniques in Algorithms and Data Analysis(2nd ed.)

Michael Mitzenmacher and Eli Upfal. 2017.Probability and Computing: Random- ization and Probabilistic Techniques in Algorithms and Data Analysis(2nd ed.). Cambridge University Press

work page 2017

[22] [22]

Soheil Mohajer, Changho Suh, and Adel Elmahdy. 2017. Active Learning for Top-𝐾 Rank Aggregation from Noisy Comparisons. InProceedings of the 34th International Conference on Machine Learning, Vol. 70. Proceedings of Machine Learning Research, 2488–2497. https://proceedings.mlr.press/v70/mohajer17a. html

work page 2017

[23] [23]

Oliveto and Carsten Witt

Pietro S. Oliveto and Carsten Witt. 2011. Simplified Drift Analysis for Proving Lower Bounds in Evolutionary Computation.Algorithmica59, 3 (01 Mar 2011), 369–386. doi:10.1007/s00453-010-9387-z

work page doi:10.1007/s00453-010-9387-z 2011

[24] [24]

R. L. Plackett. 1975. The Analysis of Permutations.Journal of the Royal Statistical Society Series C24, 2 (June 1975), 193–202. doi:10.2307/2346567

work page doi:10.2307/2346567 1975

[25] [25]

Wenbo Ren, Jia Liu, and Ness Shroff. 2020. The Sample Complexity of Best-𝑘 Items Selection from Pairwise Comparisons. InProceedings of the 37th International Conference on Machine Learning, Vol. 119. Proceedings of Machine Learning Research, 8051–8072. https://proceedings.mlr.press/v119/ren20a.html

work page 2020

[26] [26]

Wenbo Ren, Jia Liu, and Ness B. Shroff. 2019.On sample complexity upper and lower bounds for exact ranking from noisy comparisons. Curran Associates Inc

work page 2019

[27] [27]

Günter Rudolph. 1998. Finite Markov Chain Results in Evolutionary Computation: A Tour d’Horizon.Fundamenta Informaticae35, 1–4 (Jan. 1998), 67–89

work page 1998

[28] [28]

Aadirupa Saha and Aditya Gopalan. 2018. Battle of Bandits. InProceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence. AUAI Press, 805–814. http://auai.org/uai2018/proceedings/papers/290.pdf

work page 2018

[29] [29]

Aadirupa Saha and Aditya Gopalan. 2020. From PAC to Instance-Optimal Sample Complexity in the Plackett-Luce Model. InProceedings of the 37th International Conference on Machine Learning, Vol. 119. Proceedings of Machine Learning Research, 8367–8376. https://proceedings.mlr.press/v119/saha20b.html

work page 2020

[30] [30]

Dirk Sudholt. 2009. The impact of parametrization in memetic evolutionary algorithms.Theoretical Computer Science410, 26 (June 2009), 2511–2528. doi:10. 1016/j.tcs.2009.03.003

work page 2009

[31] [31]

Dirk Sudholt. 2011. Using markov-chain mixing time estimates for the analysis of ant colony optimization. InProceedings of the 11th Workshop Proceedings on Foundations of Genetic Algorithms (FOGA ’11). Association for Computing Machinery, 139–150. doi:10.1145/1967654.1967667

work page doi:10.1145/1967654.1967667 2011

[32] [32]

Carsten Witt. 2006. Runtime Analysis of the (𝜇+1) EA on Simple Pseudo-Boolean Functions.Evolutionary Computation14, 1 (March 2006), 65–86. doi:10.1162/ 106365606776022751

work page 2006

[33] [33]

Carsten Witt. 2008. Population size versus runtime of a simple evolutionary algorithm.Theoretical Computer Science403, 1 (2008), 104–120. doi:10.1016/j.tcs. 2008.05.011 Analysis of Search Heuristics in the Multi-Armed Bandit Setting GECCO Companion ’26, July 13–17, 2026, San Jose, Costa Rica

work page doi:10.1016/j.tcs 2008

[34] [34]

Huangjun Zhu, Zihao Li, and Masahito Hayashi. 2022. Nearly tight universal bounds for the binomial tail probabilities. (2022). arXiv:2211.01688 https://arxiv. org/abs/2211.01688 Acknowledgements This research was (partially) funded by the HPI Research School on Foundations of AI (FAI). This research was supported by the Ministry of Culture and Science (NR...

work page arXiv 2022