Hierarchical Adversarial Bandits for Online Configuration Optimization

Chen Avin; Gil Shabat; Hanan Shteingart; Roey Yadgar; Shie Mannor; Zvi Lotker

read the original abstract

Motivated by Online Configuration Optimization in large, dynamic parameter spaces, this work studies the nonstochastic multi-armed bandit (MAB) problem in metric action spaces with oblivious Lipschitz adversaries. We propose ABoB (Adversarial Bandit over Bandits), a hierarchical framework that decomposes the configuration space into clusters to accelerate learning and adapt to changing environments. We evaluate ABoB using standard algorithms such as EXP3 and Tsallis-INF on a real-world production storage system, demonstrating significant performance gains of up to $50\%$ compared to state-of-the-art "flat" bandit algorithms. Extensive simulations further confirm that ABoB effectively exploits metric structures, achieving up to $91\%$ improvement in adversarial metric scenarios while significantly reducing computational running time. Theoretical analysis grounds this empirical success: we prove that ABoB maintains a worst-case "safety net" bound of $O(\sqrt{kT})$, matching traditional methods, where $T$ is the number of rounds and $k$ is the number of arms, while capable of accelerating learning to $O(k^{1/4}\sqrt{T})$ under favorable Lipschitz conditions. This combination of operational efficiency and theoretical soundness makes ABoB a practical solution for automated system tuning.

Hierarchical Adversarial Bandits for Online Configuration Optimization

discussion (0)