A Jointly Efficient and Optimal Algorithm for Heteroskedastic Generalized Linear Bandits with Adversarial Corruptions
read the original abstract
We consider the problem of heteroskedastic generalized linear bandits (GLBs) with adversarial corruptions, which subsumes heteroskedastic linear bandits and logistic/Poisson bandits, in the presence of adversarial corruptions. We propose HCW-GLB-OMD, which consists of two components: an online mirror descent (OMD)-based estimator and Hessian-based confidence weights to achieve corruption robustness. This is computationally efficient in that it only requires ${O}(1)$ space and time complexity per iteration. Under the self-concordance assumption on the link function, we show a regret bound of $\tilde{{O}}\left( d \sqrt{\sum_t g(\tau_t) \dot{\mu}_{t,\star}} + d^2 g_{\max} \kappa + d (g_{\max} + \kappa) C \right)$, where $\dot{\mu}_{t,\star}$ is the slope of $\mu$ around the optimal arm at time $t$, $g(\tau_t)$'s are potentially exogenously time-varying dispersions (e.g., $g(\tau_t) = \sigma_t^2$ for heteroskedastic linear bandits, $g(\tau_t) = 1$ for Bernoulli and Poisson), $g_{\max} = \max_{t \in [T]} g(\tau_t)$ is the maximum dispersion, and $C \geq 0$ is the total corruption budget of the adversary. We complement this with a lower bound of $\tilde{\Omega}(d \sqrt{\sum_t g(\tau_t) \dot{\mu}_{t,\star}} + d C)$, unifying previous problem-specific lower bounds. Thus, our algorithm achieves, up to a $\kappa$-factor in the corruption term, instance-wise minimax optimality simultaneously across various instances of heteroskedastic GLBs with adversarial corruptions.
This paper has not been read by Pith yet.
Forward citations
Cited by 1 Pith paper
-
Bandits for Efficient Experimentation: Adapting to Control Group, Preferences, and Context Drifts
Dri-MED achieves Õ(κ d² log T / Δ̃) regret and Õ(d) constraint violations for drifting contextual bandits with personalized preferences and baseline constraints under practitioner-friendly assumptions.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.