Improved algorithms for linear stochastic bandits
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (2)
years: 2025 (2)
verdicts: UNVERDICTED (2)
representative citing papers
Variance-aware neural dueling bandit algorithms achieve sublinear regret of order O(d sqrt(sum sigma_t^2) + sqrt(d T)) for wide networks on nonlinear utilities.
citing papers explorer
- Nearly-Optimal Bandit Learning in Stackelberg Games with Side Information
  Algorithms achieve O(T^{1/2}) regret in contextual Stackelberg games via reduction to linear contextual bandits, improving on prior O(T^{2/3}) rates.
- Neural Variance-aware Dueling Bandits with Deep Representation and Shallow Exploration
  Variance-aware neural dueling bandit algorithms achieve sublinear regret of order O(d sqrt(sum sigma_t^2) + sqrt(d T)) for wide networks on nonlinear utilities.