Online Learning with Feedback Graphs: Beyond Bandits

2015 · cs.LG · arXiv 1502.07617

1 Pith paper cites this work. Polarity classification is still indexing.


abstract

We study a general class of online learning problems where the feedback is specified by a graph. This class includes online prediction with expert advice and the multi-armed bandit problem, but also several learning problems where the online player does not necessarily observe his own loss. We analyze how the structure of the feedback graph controls the inherent difficulty of the induced $T$-round learning problem. Specifically, we show that any feedback graph belongs to one of three classes: strongly observable graphs, weakly observable graphs, and unobservable graphs. We prove that the first class induces learning problems with $\widetilde\Theta(\alpha^{1/2} T^{1/2})$ minimax regret, where $\alpha$ is the independence number of the underlying graph; the second class induces problems with $\widetilde\Theta(\delta^{1/3}T^{2/3})$ minimax regret, where $\delta$ is the domination number of a certain portion of the graph; and the third class induces problems with linear minimax regret. Our results subsume much of the previous work on learning with feedback graphs and reveal new connections to partial monitoring games. We also show how the regret is affected if the graphs are allowed to vary with time.
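The three-way classification in the abstract can be illustrated with a small sketch. The abstract does not spell out the definitions, so the ones used here are an assumption taken from the standard formulation in the paper body: a vertex is observable if it has at least one in-neighbor (possibly itself), and strongly observable if it has a self-loop or is observed from every other vertex. The function names `classify_feedback_graph` and `minimax_rate`, and the adjacency-dict representation, are hypothetical and chosen only for illustration. The rate function returns the stated bound with constants and logarithmic factors dropped, and takes $\alpha$ and $\delta$ as inputs rather than computing them.

```python
from typing import Dict, Optional, Set


def classify_feedback_graph(out_edges: Dict[int, Set[int]]) -> str:
    """Classify a directed feedback graph.

    out_edges[i] is the set of actions whose loss is revealed when action i
    is played. Assumption: every vertex appears as a key of out_edges.
    """
    vertices = set(out_edges)
    # Build in-neighborhoods: who observes each vertex.
    in_nbrs: Dict[int, Set[int]] = {v: set() for v in vertices}
    for u, targets in out_edges.items():
        for v in targets:
            in_nbrs[v].add(u)

    all_strong = True
    for v in vertices:
        if not in_nbrs[v]:
            # Some action's loss can never be observed.
            return "unobservable"
        has_self_loop = v in in_nbrs[v]
        seen_by_all_others = (vertices - {v}) <= in_nbrs[v]
        if not (has_self_loop or seen_by_all_others):
            all_strong = False
    return "strongly observable" if all_strong else "weakly observable"


def minimax_rate(kind: str, T: int,
                 alpha: Optional[int] = None,
                 delta: Optional[int] = None) -> float:
    """Regret rate from the abstract, ignoring constants and log factors."""
    if kind == "strongly observable":
        return (alpha * T) ** 0.5          # alpha^{1/2} T^{1/2}
    if kind == "weakly observable":
        return delta ** (1 / 3) * T ** (2 / 3)  # delta^{1/3} T^{2/3}
    return float(T)                        # linear regret


# Bandit feedback: each action reveals only its own loss.
bandit = {0: {0}, 1: {1}, 2: {2}}
# One "revealing" action 0 observes everything; actions 1 and 2 reveal nothing.
revealing = {0: {0, 1, 2}, 1: set(), 2: set()}
# Action 1 is never observed by any action.
blind = {0: {0}, 1: set()}

print(classify_feedback_graph(bandit))
print(classify_feedback_graph(revealing))
print(classify_feedback_graph(blind))
```

Under these assumed definitions, the bandit graph is strongly observable, the revealing-action graph is weakly observable (actions 1 and 2 are seen only by action 0 and have no self-loop), and the last graph is unobservable.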

representative citing papers

Do Not Trust The Auctioneer: Learning to Bid in Feedback-Manipulated Auctions

stat.ML · 2026-05-21 · unverdicted · novelty 7.0

In first-price auctions with feedback-only shilling, an algorithm combining robust interval elimination and optimistic debiasing with racing achieves near-optimal regret rates of $O(T^{2/3})$ or $O(\sqrt{T})$ and matches a lower bound in the single-active-region case.

citing papers explorer

Showing 1 of 1 citing paper.

Do Not Trust The Auctioneer: Learning to Bid in Feedback-Manipulated Auctions

stat.ML · 2026-05-21 · unverdicted · none · ref 3 · internal anchor
