Nearly-Optimal Bandit Learning in Stackelberg Games with Side Information

2025 · cs.LG · arXiv 2502.00204

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

We study the problem of online learning in Stackelberg games with side information between a leader and a sequence of followers. In every round the leader observes contextual information and commits to a mixed strategy, after which the follower best-responds. We provide learning algorithms for the leader which achieve $O(T^{1/2})$ regret under bandit feedback, an improvement over the previously best-known rates of $O(T^{2/3})$. Our algorithms rely on a reduction to linear contextual bandits in the utility space: in each round, a linear contextual bandit algorithm recommends a utility vector, which our algorithm inverts to determine the leader's mixed strategy. We extend our algorithms to the setting in which the leader's utility function is unknown, and also apply them to the problems of bidding in second-price auctions with side information and online Bayesian persuasion with public and private states. Finally, we observe that our algorithms empirically outperform previous results on numerical simulations.
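The round structure described in the abstract (the leader commits to a mixed strategy, then the follower best-responds) can be illustrated with a minimal sketch. This is not the paper's learning algorithm: the payoff matrices `LEADER_PAYOFF` and `FOLLOWER_PAYOFF`, the function names, and the tie-breaking rule (lowest follower action index) are all assumptions made for illustration, and the matrices are assumed to be already fixed by the round's context.

```python
from typing import List, Sequence, Tuple

# Hypothetical 2x2 payoffs for one round (rows: leader actions, columns: follower actions).
# In the contextual setting these would depend on the observed side information.
LEADER_PAYOFF: List[List[float]] = [[2.0, 4.0], [1.0, 3.0]]
FOLLOWER_PAYOFF: List[List[float]] = [[1.0, 0.0], [0.0, 2.0]]


def expected_utility(x: Sequence[float], payoff: Sequence[Sequence[float]], b: int) -> float:
    """Expected payoff when the leader plays mixed strategy x and the follower plays action b."""
    return sum(x[a] * payoff[a][b] for a in range(len(x)))


def best_response(x: Sequence[float], follower_payoff: Sequence[Sequence[float]]) -> int:
    """Follower action maximizing the follower's expected payoff against x.

    Assumption: ties are broken in favor of the lowest action index.
    """
    n_follower_actions = len(follower_payoff[0])
    return max(range(n_follower_actions),
               key=lambda b: expected_utility(x, follower_payoff, b))


def play_round(x: Sequence[float],
               leader_payoff: Sequence[Sequence[float]],
               follower_payoff: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """One round: the leader commits to x, the follower best-responds, the leader is paid."""
    b = best_response(x, follower_payoff)
    return b, expected_utility(x, leader_payoff, b)


if __name__ == "__main__":
    for x in ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0]):
        action, utility = play_round(x, LEADER_PAYOFF, FOLLOWER_PAYOFF)
        print(f"x={x} -> follower action {action}, leader utility {utility}")
```

In this toy game, committing to the mixed strategy `[0.5, 0.5]` earns the leader more than either pure strategy, which is the kind of effect that makes the choice of commitment worth learning. Under bandit feedback the leader would see only its own realized payoff each round, not the follower's payoff matrix.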

representative citing papers

Regret Minimization in Single-Dimensional Contract-Design with Binary Actions

cs.GT · 2026-06-04 · unverdicted · novelty 7.0

Derives tight Θ(T^{2/3}) regret independent of outcome count m for adversarial agent types and Õ(√T) regret via explore-then-commit for fixed hidden type in single-dimensional binary-action contract design.

citing papers explorer

Showing 1 of 1 citing paper.

Regret Minimization in Single-Dimensional Contract-Design with Binary Actions

cs.GT · 2026-06-04 · unverdicted · none · ref 9 · internal anchor

Derives tight Θ(T^{2/3}) regret independent of outcome count m for adversarial agent types and Õ(√T) regret via explore-then-commit for fixed hidden type in single-dimensional binary-action contract design.
