Near-optimal time and sample complexities for solving discounted Markov decision process with a generative model

Aaron Sidford, Mengdi Wang, Xian Wu, Lin F Yang, Yinyu Ye · 2018 · arXiv 1806.01492

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

On the Optimal Sample Complexity of Offline Multi-Armed Bandits with KL Regularization

cs.LG · 2026-05-04 · unverdicted · novelty 6.0

Offline KL-regularized MABs require sample complexity scaling as O(η S A C^π*/ε) for large regularization and Ω(S A C^π*/ε²) for small regularization, with matching lower bounds across the full range.

Sample Complexity for Markov Decision Processes and Stochastic Optimal Control with Static Risk Measures

math.OC · 2026-04-06 · unverdicted · novelty 4.0

State augmentation allows dynamic programming and sample complexity bounds for MDPs and optimal control under static risk measures including CVaR.

citing papers explorer

Showing 2 of 2 citing papers.

On the Optimal Sample Complexity of Offline Multi-Armed Bandits with KL Regularization
cs.LG · 2026-05-04 · unverdicted · none · ref 47
Sample Complexity for Markov Decision Processes and Stochastic Optimal Control with Static Risk Measures
math.OC · 2026-04-06 · unverdicted · none · ref 29
