Recognition: unknown
Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret
read the original abstract
Online learning algorithms are designed to learn even when their input is generated by an adversary. The widely-accepted formal definition of an online algorithm's ability to learn is the game-theoretic notion of regret. We argue that the standard definition of regret becomes inadequate if the adversary is allowed to adapt to the online algorithm's actions. We define the alternative notion of policy regret, which attempts to provide a more meaningful way to measure an online algorithm's performance against adaptive adversaries. Focusing on the online bandit setting, we show that no bandit algorithm can guarantee a sublinear policy regret against an adaptive adversary with unbounded memory. On the other hand, if the adversary's memory is bounded, we present a general technique that converts any bandit algorithm with a sublinear regret bound into an algorithm with a sublinear policy regret bound. We extend this result to other variants of regret, such as switching regret, internal regret, and swap regret.
This paper has not been read by Pith yet.
Forward citations
Cited by 4 Pith papers
-
Mechanism Design for Decentralized Risk Detection: Strict Propriety, Network Coalitions, and the Backfiring Mandat
TVA implements truthful Bayes-Nash reporting in large federations and shows competitive pressure amplifies moral hazard such that forced sharing without incentives can lower welfare below autarky.
-
Mechanism Design for Decentralized Risk Detection: Strict Propriety, Network Coalitions, and the Backfiring Mandat
TVA mechanism implements truthful posterior reporting as Bayes-Nash equilibrium in decentralized risk detection, with network Shapley coalition analysis and welfare ordering showing backfiring mandates under certain c...
-
Learning the Preferences of a Learning Agent
Formalizes preference learning from a no-regret or Boltzmann-converging learner with theoretical guarantees or impossibility results for IRL algorithms.
-
Mechanism Design for Decentralized Risk Detection: Strict Propriety, Network Coalitions, and the Backfiring Mandat
A dynamic mechanism design framework with strictly proper scoring rules implements truthful reporting in decentralized risk detection among competing firms and identifies conditions for backfiring regulatory mandates.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.