Field: cs.NE. Year: 2026.
Analysis of Search Heuristics in the Multi-Armed Bandit Setting
In the dueling bandit setting, the (1+1) EA selects the Condorcet winner with only constant probability when its advantage is Ω(1/n), whereas a Max-Min Ant System EDA selects it with probability 1-Θ(p). Repeated duels improve the EA's performance.
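
The effect of repeated duels can be illustrated with a small simulation. The following is only a toy sketch under stated assumptions, not the model analysed in the paper. It assumes a flat set of arms, a Condorcet winner that beats every other arm with probability 1/2 + eps, fair coin flips between all other pairs, and a (1+1)-style rule that replaces the current arm when a uniformly sampled challenger wins a majority of the duels. The names `duel`, `one_plus_one_duels`, `success_rate` and all parameter values are hypothetical. It does not reproduce the Ω(1/n) regime or the Max-Min Ant System EDA.

```python
import random


def duel(i, j, winner, eps, rng):
    """Return the arm that wins one noisy duel between arms i and j.

    Assumption: the Condorcet winner beats any other arm with
    probability 0.5 + eps; all other pairs are fair coin flips.
    """
    if i == winner:
        p_i = 0.5 + eps
    elif j == winner:
        p_i = 0.5 - eps
    else:
        p_i = 0.5
    return i if rng.random() < p_i else j


def one_plus_one_duels(num_arms, winner, eps, steps, repeats, rng):
    """(1+1)-style search: keep one arm, duel it against a random challenger.

    The challenger replaces the current arm only if it wins a strict
    majority of `repeats` duels (use an odd value to avoid ties).
    """
    current = rng.randrange(num_arms)
    for _ in range(steps):
        challenger = rng.randrange(num_arms)
        if challenger == current:
            continue
        wins = sum(
            duel(challenger, current, winner, eps, rng) == challenger
            for _ in range(repeats)
        )
        if 2 * wins > repeats:
            current = challenger
    return current


def success_rate(repeats, num_arms=10, eps=0.1, steps=100, trials=500, seed=0):
    """Fraction of independent runs that end on the Condorcet winner."""
    rng = random.Random(seed)
    winner = 0  # hypothetical choice: arm 0 is the Condorcet winner
    hits = sum(
        one_plus_one_duels(num_arms, winner, eps, steps, repeats, rng) == winner
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    for r in (1, 21):
        print(f"repeats={r}: success rate {success_rate(r):.3f}")
```

In this toy setting a single duel per comparison often lets a weaker challenger displace the winner, while a majority vote over several duels makes such a loss less likely, which is the intuition behind the improvement from repeated duels.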