
arxiv: 1902.01931 · v1 · pith:KOXAKOZVnew · submitted 2019-01-21 · 💻 cs.NI · cs.LG · stat.ML

Parallel Contextual Bandits in Wireless Handover Optimization

classification 💻 cs.NI · cs.LG · stat.ML
keywords contextual · sampling · thompson · bandit · base · methods · optimization · parallel

As cellular networks become denser, scalable and dynamic tuning of wireless base station parameters can only be achieved through automated optimization. Although the contextual bandit framework is a natural candidate for this task, its extension to a parallel setting is not straightforward: existing methods must be carefully adapted to fully leverage the multi-agent structure of the problem. We propose two approaches: one derived from a deterministic UCB-like method and the other relying on Thompson sampling. Thanks to its Bayesian nature, the latter is expected to better preserve the exploration-exploitation balance within the bandit batch. This is verified in toy experiments, where Thompson sampling proves robust to the variability of the contexts. Finally, we apply both methods to a real base station network dataset and show that Thompson sampling outperforms both manual tuning and contextual UCB.
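
The abstract does not spell out either algorithm, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a Bayesian linear reward model shared by all agents in a batch; the feature map `features`, the parameters `lam`, `noise_var`, and `alpha`, and all function names are hypothetical choices made for illustration. It shows why a deterministic UCB-style rule gives every agent with the same context the same action within a batch, while Thompson sampling draws one posterior sample per agent and so spreads exploration across the batch.

```python
import numpy as np

def posterior(X, y, lam=1.0, noise_var=1.0):
    """Gaussian posterior (mean, covariance) of a Bayesian linear reward model."""
    d = X.shape[1]
    precision = lam * np.eye(d) + X.T @ X / noise_var
    cov = np.linalg.inv(precision)
    mean = cov @ (X.T @ y) / noise_var
    return mean, cov

def features(context, arm, n_arms):
    """Hypothetical joint feature map: the context is copied into the arm's block."""
    phi = np.zeros(n_arms * context.size)
    phi[arm * context.size:(arm + 1) * context.size] = context
    return phi

def ucb_batch(contexts, mean, cov, n_arms, alpha=1.0):
    """Deterministic UCB-style choice: one action per agent, model frozen for the batch."""
    choices = []
    for c in contexts:
        scores = []
        for a in range(n_arms):
            phi = features(c, a, n_arms)
            scores.append(phi @ mean + alpha * np.sqrt(phi @ cov @ phi))
        choices.append(int(np.argmax(scores)))
    return choices

def thompson_batch(contexts, mean, cov, n_arms, rng):
    """Thompson sampling: each agent acts greedily on its own posterior sample."""
    choices = []
    for c in contexts:
        theta = rng.multivariate_normal(mean, cov)
        scores = [features(c, a, n_arms) @ theta for a in range(n_arms)]
        choices.append(int(np.argmax(scores)))
    return choices

# Toy batch: 50 agents sharing the same context, 3 arms, no data observed yet.
n_arms, n_agents = 3, 50
context = np.array([1.0, 0.5])
contexts = [context] * n_agents
d = n_arms * context.size
mean, cov = posterior(np.zeros((0, d)), np.zeros(0))

rng = np.random.default_rng(0)
ucb_choices = ucb_batch(contexts, mean, cov, n_arms)
ts_choices = thompson_batch(contexts, mean, cov, n_arms, rng)
print("distinct arms tried by UCB batch:", len(set(ucb_choices)))
print("distinct arms tried by Thompson batch:", len(set(ts_choices)))
```

In this toy setting the model is updated only between batches, which is one simple way to represent the parallel setting; other batching schemes are possible and may be closer to what the paper actually does.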
