RepUCB and RepLinUCB deliver replicable regret bounds O(K² log²T / ρ² ⋅ sum) for MAB and Õ((d + d³/ρ)√T) for linear bandits, improving the prior best by O(d/ρ) via optimistic exploration and a new replicable ridge estimator.
12 A Replicable Multi Armed Bandit Proof. Theorem 3.1. Consider RepUCB (Algorithm
1 Pith paper cites this work.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers
Replicable Bandits with UCB based Exploration