Dri-MED achieves Õ(κ d² log T / Δ̃) regret and Õ(d) constraint violations for drifting contextual bandits with personalized preferences and baseline constraints under practitioner-friendly assumptions.
A Jointly Efficient and Optimal Algorithm for Heteroskedastic Generalized Linear Bandits with Adversarial Corruptions
1 Pith paper cite this work. Polarity classification is still indexing.
abstract
We consider the problem of heteroskedastic generalized linear bandits (GLBs) with adversarial corruptions, which subsumes heteroskedastic linear bandits and logistic/Poisson bandits, in the presence of adversarial corruptions. We propose HCW-GLB-OMD, which consists of two components: an online mirror descent (OMD)-based estimator and Hessian-based confidence weights to achieve corruption robustness. This is computationally efficient in that it only requires ${O}(1)$ space and time complexity per iteration. Under the self-concordance assumption on the link function, we show a regret bound of $\tilde{{O}}\left( d \sqrt{\sum_t g(\tau_t) \dot{\mu}_{t,\star}} + d^2 g_{\max} \kappa + d (g_{\max} + \kappa) C \right)$, where $\dot{\mu}_{t,\star}$ is the slope of $\mu$ around the optimal arm at time $t$, $g(\tau_t)$'s are potentially exogenously time-varying dispersions (e.g., $g(\tau_t) = \sigma_t^2$ for heteroskedastic linear bandits, $g(\tau_t) = 1$ for Bernoulli and Poisson), $g_{\max} = \max_{t \in [T]} g(\tau_t)$ is the maximum dispersion, and $C \geq 0$ is the total corruption budget of the adversary. We complement this with a lower bound of $\tilde{\Omega}(d \sqrt{\sum_t g(\tau_t) \dot{\mu}_{t,\star}} + d C)$, unifying previous problem-specific lower bounds. Thus, our algorithm achieves, up to a $\kappa$-factor in the corruption term, instance-wise minimax optimality simultaneously across various instances of heteroskedastic GLBs with adversarial corruptions.
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Bandits for Efficient Experimentation: Adapting to Control Group, Preferences, and Context Drifts
Dri-MED achieves Õ(κ d² log T / Δ̃) regret and Õ(d) constraint violations for drifting contextual bandits with personalized preferences and baseline constraints under practitioner-friendly assumptions.