Prediction, learning, and games
2 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: UNVERDICTED 2
Representative citing papers
ExAUL converts any bandit algorithm's regret into an O(sqrt(T)) FDR bound for online conformal abstention under partial adversarial feedback via a conversion lemma and feedback unlocking.
citing papers explorer
- Non-Asymptotic Pure Exploration by Solving Games
  Game-solving algorithms using no-regret learners achieve non-asymptotic optimality guarantees for pure exploration in exponential family bandits.
- Online Conformal Abstention for Factuality Control Under Adversarial Bandit Feedback
  ExAUL converts any bandit algorithm's regret into an O(sqrt(T)) FDR bound for online conformal abstention under partial adversarial feedback via a conversion lemma and feedback unlocking.