Near-Optimal Time and Sample Complexities for Solving Discounted Markov Decision Process with a Generative Model
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- On the Optimal Sample Complexity of Offline Multi-Armed Bandits with KL Regularization
  Offline KL-regularized MABs require sample complexity scaling as O(η S A C^π*/ε) for large regularization and Ω(S A C^π*/ε²) for small regularization, with matching lower bounds across the full range.
- Sample Complexity for Markov Decision Processes and Stochastic Optimal Control with Static Risk Measures
  State augmentation allows dynamic programming and sample complexity bounds for MDPs and optimal control under static risk measures including CVaR.