Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Tight Sample Complexity Bounds for Entropic Best Policy Identification
New concentration bounds and stopping rule close the exponential gap to match the lower bound for entropic best policy identification.