A novel robust asynchronous Q-learning algorithm achieves finite-time convergence rates that match clean-data bounds up to an additive term proportional to the corruption fraction, with a matching information-theoretic lower bound.
A Variant of Azuma's Inequality for Martingales with Subgaussian Tails
2 Pith papers cite this work. Polarity classification is still indexing.
abstract
We provide a variant of Azuma's concentration inequality for martingales, in which the standard boundedness requirement is replaced by the milder requirement of a subgaussian tail.
fields
cs.LG 2verdicts
UNVERDICTED 2representative citing papers
Offline KL-regularized MABs require sample complexity scaling as O(η S A C^π*/ε) for large regularization and Ω(S A C^π*/ε²) for small regularization, with matching lower bounds across the full range.
citing papers explorer
-
Corruption-Tolerant Asynchronous Q-Learning with Near-Optimal Rates
A novel robust asynchronous Q-learning algorithm achieves finite-time convergence rates that match clean-data bounds up to an additive term proportional to the corruption fraction, with a matching information-theoretic lower bound.
-
On the Optimal Sample Complexity of Offline Multi-Armed Bandits with KL Regularization
Offline KL-regularized MABs require sample complexity scaling as O(η S A C^π*/ε) for large regularization and Ω(S A C^π*/ε²) for small regularization, with matching lower bounds across the full range.