Parametric MDPs enable PAC uncertainty models for MDPs by projecting empirical frequencies onto parameter space with polytopic outer approximations, yielding tighter estimates than independent interval methods.
Pérez, and Marnix Suilen
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
roles: background (1)
polarities: background (1)
representative citing papers
Shielding the policy improvement process in offline RL yields policies that are safe with high probability while outperforming unshielded baselines in both average and worst-case performance, especially under limited data.
citing papers explorer
- Robust Parameter Learning for Uncertain MDPs
  Parametric MDPs enable PAC uncertainty models for MDPs by projecting empirical frequencies onto parameter space with polytopic outer approximations, yielding tighter estimates than independent interval methods.
- Robust Probabilistic Shielding for Safe Offline Reinforcement Learning
  Shielding the policy improvement process in offline RL yields policies that are safe with high probability while outperforming unshielded baselines in both average and worst-case performance, especially under limited data.