pith. sign in

arxiv: 2602.10971 · v2 · pith:NRU7QQBOnew · submitted 2026-02-11 · 💻 cs.LG · stat.ML

A Jointly Efficient and Optimal Algorithm for Heteroskedastic Generalized Linear Bandits with Adversarial Corruptions

classification 💻 cs.LG stat.ML
keywords banditsheteroskedasticadversarialcorruptionslinearcorruptionkappastar
0
0 comments X
read the original abstract

We consider the problem of heteroskedastic generalized linear bandits (GLBs) with adversarial corruptions, which subsumes heteroskedastic linear bandits and logistic/Poisson bandits, in the presence of adversarial corruptions. We propose HCW-GLB-OMD, which consists of two components: an online mirror descent (OMD)-based estimator and Hessian-based confidence weights to achieve corruption robustness. This is computationally efficient in that it only requires ${O}(1)$ space and time complexity per iteration. Under the self-concordance assumption on the link function, we show a regret bound of $\tilde{{O}}\left( d \sqrt{\sum_t g(\tau_t) \dot{\mu}_{t,\star}} + d^2 g_{\max} \kappa + d (g_{\max} + \kappa) C \right)$, where $\dot{\mu}_{t,\star}$ is the slope of $\mu$ around the optimal arm at time $t$, $g(\tau_t)$'s are potentially exogenously time-varying dispersions (e.g., $g(\tau_t) = \sigma_t^2$ for heteroskedastic linear bandits, $g(\tau_t) = 1$ for Bernoulli and Poisson), $g_{\max} = \max_{t \in [T]} g(\tau_t)$ is the maximum dispersion, and $C \geq 0$ is the total corruption budget of the adversary. We complement this with a lower bound of $\tilde{\Omega}(d \sqrt{\sum_t g(\tau_t) \dot{\mu}_{t,\star}} + d C)$, unifying previous problem-specific lower bounds. Thus, our algorithm achieves, up to a $\kappa$-factor in the corruption term, instance-wise minimax optimality simultaneously across various instances of heteroskedastic GLBs with adversarial corruptions.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Bandits for Efficient Experimentation: Adapting to Control Group, Preferences, and Context Drifts

    cs.LG 2026-06 unverdicted novelty 5.0

    Dri-MED achieves Õ(κ d² log T / Δ̃) regret and Õ(d) constraint violations for drifting contextual bandits with personalized preferences and baseline constraints under practitioner-friendly assumptions.