Integrating sample-based planning and model-based reinforcement learning. AAAI Conference on Artificial Intelligence.
1 Pith paper cites this work.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers:
Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning
TrailBlazer extends Monte-Carlo sampling to alternating max and expectation steps in MDPs, delivering sample-complexity bounds that scale with the number of near-optimal states rather than the full state space.
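TrailBlazer's own algorithm is not reproduced here. As a point of reference, the max/expectation recursion the summary refers to can be sketched as a plain, uniform sparse-sampling estimator: max nodes choose the best action, and expectation nodes average a fixed number of sampled transitions from a generative model. The `Model` interface, `sampled_value`, and `toy_model` are hypothetical names chosen for illustration, and the toy two-action MDP is an assumption, not something taken from either paper.

```python
import random
from typing import Callable, Hashable, Sequence, Tuple

# Hypothetical generative-model interface: given (state, action, rng),
# return one sampled (next_state, reward) pair.
Model = Callable[[Hashable, int, random.Random], Tuple[Hashable, float]]


def sampled_value(
    state: Hashable,
    depth: int,
    model: Model,
    actions: Sequence[int],
    n_samples: int,
    gamma: float,
    rng: random.Random,
) -> float:
    """Estimate the depth-limited optimal value of `state` by uniform sampling."""
    if depth == 0:
        return 0.0
    best = float("-inf")
    for action in actions:  # max step: keep the best action's estimate
        total = 0.0
        for _ in range(n_samples):  # expectation step: average sampled transitions
            next_state, reward = model(state, action, rng)
            total += reward + gamma * sampled_value(
                next_state, depth - 1, model, actions, n_samples, gamma, rng
            )
        best = max(best, total / n_samples)
    return best


def toy_model(state, action, rng):
    # Hypothetical deterministic MDP: action 1 pays reward 1, action 0 pays 0;
    # the state is just a step counter and does not affect rewards.
    return state + 1, float(action == 1)


rng = random.Random(0)
v = sampled_value(0, depth=3, model=toy_model, actions=[0, 1], n_samples=2, gamma=0.5, rng=rng)
print(v)  # 1.75 = 1 + 0.5 + 0.25
```

Because this version expands every branch, it makes on the order of `(len(actions) * n_samples) ** depth` model calls no matter which states are near-optimal; per the summary above, TrailBlazer's contribution is sample-complexity bounds that instead scale with the number of near-optimal states.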