Regularized Policies are Reward Robust
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: math.OC (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Revisiting Subgradient Dominance in Robust MDPs: Counterexamples, Hardness, and Sufficient Conditions
RMDPs lack subgradient dominance in general and admit suboptimal local minima; finding epsilon-optimal policies is NP-hard for finite transition uncertainty sets, but the dominance property holds when worst-case kernels or action-values are unique per policy.