arXiv preprint arXiv:2007.03121
4 Pith papers cite this work.
Citing papers
- Unlearning Offline Stochastic Multi-Armed Bandits
  Presents the first study of unlearning in offline stochastic multi-armed bandits: formalizes the privacy constraints and gives adaptive algorithms with performance guarantees, together with lower bounds, for single- and multi-source settings under both fixed-sample and distribution models.
- On the Sample Complexity of Differentially Private Policy Optimization
  Differential privacy adds sample-complexity costs to policy optimization that often appear as lower-order terms rather than dominating the bounds.
- Differentially Private Best-Arm Identification
  Derives privacy-dependent lower bounds for fixed-confidence best-arm identification (BAI) and gives asymptotically optimal differentially private (DP) Top-Two algorithms under both the local and global DP models.
- When Determinants Are Not Enough: Private Rare Switching
  Replaces determinant growth with a generalized Rayleigh quotient as the rare-switching criterion in private linear bandits, controlling the worst-direction volume even though privacy noise makes the design matrices non-monotonic.
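Why a determinant test can fall short once design matrices stop growing monotonically can be shown with a small numeric sketch. It is not the cited paper's algorithm. The 2x2 matrices, the doubling factor of 2, and the function names (`det2`, `generalized_eigs2`, `det_rule_switch`, `rayleigh_rule_switch`) are all illustrative assumptions. The sketch contrasts a determinant-doubling switch rule with a hypothetical direction-wise rule based on the extreme generalized Rayleigh quotients x^T V_now x / x^T V_last x.

```python
import math


def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def generalized_eigs2(a, b):
    """Return (lambda_min, lambda_max) for the matrix pencil (A, B).

    Assumes A is symmetric and B is symmetric positive definite (2x2 only).
    The roots of det(A - lam * B) = 0 are the minimum and maximum of the
    generalized Rayleigh quotient x^T A x / x^T B x over nonzero x.
    """
    c2 = det2(b)
    c1 = -(a[0][0] * b[1][1] + a[1][1] * b[0][0] - 2.0 * a[0][1] * b[0][1])
    c0 = det2(a)
    disc = max(0.0, c1 * c1 - 4.0 * c2 * c0)  # clamp round-off below zero
    root = math.sqrt(disc)
    return (-c1 - root) / (2.0 * c2), (-c1 + root) / (2.0 * c2)


def det_rule_switch(v_now, v_last, factor=2.0):
    """Determinant rule: recompute once det(V_now) > factor * det(V_last)."""
    return det2(v_now) > factor * det2(v_last)


def rayleigh_rule_switch(v_now, v_last, factor=2.0):
    """Hypothetical direction-wise rule: recompute once some direction x has
    grown by more than `factor` or shrunk by more than `factor`, measured by
    x^T V_now x / x^T V_last x."""
    lam_min, lam_max = generalized_eigs2(v_now, v_last)
    return lam_max > factor or lam_min < 1.0 / factor


if __name__ == "__main__":
    v_last = [[1.0, 0.0], [0.0, 1.0]]
    # One direction grew 4x while another shrank, as noisy updates can cause.
    v_now = [[4.0, 0.0], [0.0, 0.5]]
    print(det_rule_switch(v_now, v_last))       # False: det ratio is exactly 2
    print(rayleigh_rule_switch(v_now, v_last))  # True: lambda_max is 4
```

The determinant ratio equals the product of the generalized eigenvalues. Growth in one direction can therefore be cancelled by shrinkage in another, and the determinant rule never fires. A direction-wise test looks at each extreme separately.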