Algorithms for Differentially Private Multi-Armed Bandits
read the original abstract
We present differentially private algorithms for the stochastic Multi-Armed Bandit (MAB) problem. This is a problem for applications such as adaptive clinical trials, experiment design, and user-targeted advertising where private information is connected to individual rewards. Our major contribution is to show that there exist $(\epsilon, \delta)$ differentially private variants of Upper Confidence Bound algorithms which have optimal regret, $O(\epsilon^{-1} + \log T)$. This is a significant improvement over previous results, which only achieve poly-log regret $O(\epsilon^{-2} \log^{2} T)$, because of our use of a novel interval-based mechanism. We also substantially improve the bounds of previous family of algorithms which use a continual release mechanism. Experiments clearly validate our theoretical bounds.
This paper has not been read by Pith yet.
Forward citations
Cited by 1 Pith paper
-
CHRONOS: Temporally-Aware Multi-Agent Coordination for Evolving Data Marketplaces
CHRONOS is a three-layer system for evolving data marketplaces that applies neural-ODE temporal decay, changepoint-aware Shapley valuation, and EXP3-IX private coordination to achieve 0.937 recall, 2.74 qps, 161 ms la...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.