Strongly Polynomial Time Complexity of Policy Iteration for $L_\infty$ Robust MDPs

Ali Asadi; Alipasha Montaseri; Carlo Pagano; Ehsan Goharshady; Krishnendu Chatterjee; Mehrdad Karrabi

arxiv: 2601.23229 · v2 · pith:LCMZXK7Inew · submitted 2026-01-30 · 💻 cs.AI · cs.CC

Strongly Polynomial Time Complexity of Policy Iteration for L_infty Robust MDPs

Ali Asadi , Krishnendu Chatterjee , Ehsan Goharshady , Mehrdad Karrabi , Alipasha Montaseri , Carlo Pagano This is my paper

classification 💻 cs.AI cs.CC

keywords mdpsrmdpstimediscountfactorfundamentalinftymodel

0 comments

read the original abstract

Markov decision processes (MDPs) are a fundamental model in sequential decision making. Robust MDPs (RMDPs) extend this framework by allowing uncertainty in transition probabilities and optimizing against the worst-case realization of that uncertainty. In particular, $(s, a)$-rectangular RMDPs with $L_\infty$ uncertainty sets form a fundamental and expressive model: they subsume classical MDPs and turn-based stochastic games. We consider this model with discounted payoffs. The existence of polynomial and strongly-polynomial time algorithms is a fundamental problem for these optimization models. For MDPs, linear programming yields polynomial-time algorithms for any arbitrary discount factor, and the seminal work of Ye established strongly--polynomial time for a fixed discount factor. The generalization of such results to RMDPs has remained an important open problem. In this work, we show that a robust policy iteration algorithm runs in strongly-polynomial time for $(s, a)$-rectangular $L_\infty$ RMDPs with a constant (fixed) discount factor, resolving an important algorithmic question.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

On the Complexity of Discounted Robust MDPs with $L_p$ Uncertainty Sets
cs.CC 2026-05 unverdicted novelty 7.0

Policy iteration for discounted robust MDPs is strongly polynomial for L1 and L∞ uncertainty sets but hard for other Lp sets.