Sharp gap-dependent variance-aware regret bounds for tabular MDPs
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.LG (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
Tighter Regret Bounds for Contextual Action-Set Reinforcement Learning
Extends MVP to contextual action-set RL and derives a minimax regret bound of O~(sqrt(S A H^3 K log L)) for adversarial contexts, plus a gap-dependent bound.