Non-trivial two-armed partial-monitoring games are bandits
classification
💻 cs.LG
keywords
game, games, partial-monitoring, actions, adversary, available, bandit-like, bandits
read the original abstract
We consider online learning in partial-monitoring games against an oblivious adversary. We show that when the learner has two available actions and the game is nontrivial, the game is reducible to a bandit-like game, and thus the minimax regret is $\Theta(\sqrt{T})$.
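For readers unfamiliar with the setting, the quantity bounded in the abstract can be written out as follows. This is a sketch in standard partial-monitoring notation, which is an assumption here and does not come from the abstract: $L$ is a loss matrix, $I_t \in \{1, 2\}$ is the learner's action in round $t$, $J_t$ is the outcome chosen by the adversary, and $\mathcal{A}$ is the learner's algorithm. An oblivious adversary fixes the sequence $J_1, \dots, J_T$ before the game starts.

```latex
% Expected regret of algorithm \mathcal{A} against a fixed outcome sequence
% (notation assumed, not taken from the abstract):
R_T(\mathcal{A}; J_{1:T})
  = \mathbb{E}\left[\sum_{t=1}^{T} L(I_t, J_t)\right]
  - \min_{i \in \{1, 2\}} \sum_{t=1}^{T} L(i, J_t)

% Minimax regret: best algorithm against the worst-case outcome sequence.
R_T^{*} = \inf_{\mathcal{A}} \, \sup_{J_{1:T}} \, R_T(\mathcal{A}; J_{1:T})

% The claim in the abstract, for nontrivial games with two actions:
R_T^{*} = \Theta(\sqrt{T})
```

In a partial-monitoring game the learner does not observe $L(I_t, J_t)$ directly, only a feedback signal that depends on $I_t$ and $J_t$; a bandit game is the special case in which that signal is the incurred loss itself.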