Offline KL-regularized MABs require sample complexity scaling as O(η S A C^π*/ε) for large regularization and Ω(S A C^π*/ε²) for small regularization, with matching lower bounds across the full range.
Improved bounds for private and robust alignment
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
On the Optimal Sample Complexity of Offline Multi-Armed Bandits with KL Regularization
Offline KL-regularized MABs require sample complexity scaling as O(η S A C^π*/ε) for large regularization and Ω(S A C^π*/ε²) for small regularization, with matching lower bounds across the full range.