Robustness Analysis of POMDP Policies to Observation Perturbations
Pith reviewed 2026-05-09 22:00 UTC · model grok-4.3
The pith
The largest observation-model deviation a POMDP policy can tolerate without falling below a value threshold is located by root-finding on a monotonic bi-level optimization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Policy Observation Robustness Problem can be formulated as a bi-level optimization problem in which the inner optimization is monotonic in the size of the observation deviation. This enables efficient solutions using root-finding algorithms in the outer optimization. For the non-sticky variant, when policies are represented with finite-state controllers it is sufficient to consider observations which depend on nodes in the FSC rather than full histories. The resulting Robust Interval Search algorithm is sound and convergent, with polynomial time complexity for the non-sticky case and at most exponential for the sticky case, and scales to POMDPs with tens of thousands of states.
What carries the argument
Bi-level optimization of the Policy Observation Robustness Problem, where the inner level computes worst-case policy value for a fixed deviation magnitude and is monotonic in that magnitude.
If this is right
- Robust Interval Search supplies soundness and convergence guarantees for computing the maximum tolerable deviation in both sticky and non-sticky variants.
- The non-sticky variant admits a polynomial-time algorithm when policies are finite-state controllers.
- The sticky variant requires at most exponential time but remains practical for large state spaces.
- Experimental implementations scale to POMDP models containing tens of thousands of states.
- Case studies in robotics and operations research confirm the problem formulation yields actionable robustness margins.
Where Pith is reading between the lines
- The same monotonicity structure could be used to certify robustness when both transition and observation models are allowed to deviate simultaneously.
- Designers could use the computed tolerance to set explicit sensor-calibration budgets or maintenance schedules before deployment.
- Similar reductions from history-dependent to compact-memory observations may apply to other policy classes such as recurrent neural networks.
- If approximate monotonicity holds for learned or black-box policies, root-finding could still provide practical robustness certificates.
Load-bearing premise
The worst-case value achieved by the policy under observation deviations decreases monotonically as the allowed size of those deviations grows.
What would settle it
A concrete counter-example in which, for some fixed policy and POMDP, increasing the maximum allowed observation deviation produces a higher (or equal) worst-case value instead of a strictly lower one.
Figures
read the original abstract
Policies for Partially Observable Markov Decision Processes (POMDPs) are often designed using a nominal system model. In practice, this model can deviate from the true system during deployment due to factors such as calibration drift or sensor degradation, leading to unexpected performance degradation. This work studies policy robustness against deviations in the POMDP observation model. We introduce the Policy Observation Robustness Problem: to determine the maximum tolerable deviation in a POMDP's observation model that guarantees the policy's value remains above a specified threshold. We analyze two variants: the sticky variant, where deviations are dependent on state and actions, and the non-sticky variant, where they can be history-dependent. We show that the Policy Observation Robustness Problem can be formulated as a bi-level optimization problem in which the inner optimization is monotonic in the size of the observation deviation. This enables efficient solutions using root-finding algorithms in the outer optimization. For the non-sticky variant, we show that when policies are represented with finite-state controllers (FSCs) it is sufficient to consider observations which depend on nodes in the FSC rather than full histories. We present Robust Interval Search, an algorithm with soundness and convergence guarantees, for both the sticky and non-sticky variants. We show this algorithm has polynomial time complexity in the non-sticky variant and at most exponential time complexity in the sticky variant. We provide experimental results validating and demonstrating the scalability of implementations of Robust Interval Search to POMDP problems with tens of thousands of states. We also provide case studies from robotics and operations research which demonstrate the practical utility of the problem and algorithms.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the Policy Observation Robustness Problem: given a POMDP policy and value threshold, find the largest deviation bound on the observation model such that the policy's value remains above the threshold under any deviation of that magnitude. It formulates the problem as a bi-level optimization (outer search over deviation magnitude, inner worst-case value computation) and proves that the inner problem is monotonic in the deviation size due to nested feasible deviation sets. For the non-sticky (history-dependent) variant, when policies are finite-state controllers, it suffices to consider node-dependent observation perturbations rather than full histories. The authors present the Robust Interval Search algorithm, prove its soundness and convergence, establish polynomial time complexity for the non-sticky case and at most exponential for the sticky case, and report experiments on instances with tens of thousands of states plus robotics and operations-research case studies.
Significance. If the monotonicity property and algorithm guarantees hold, the work offers a practical, theoretically grounded method for quantifying observation-model robustness of POMDP policies, addressing a gap between nominal design and deployment under sensor drift or calibration errors. The reduction to node-dependent perturbations for FSCs is a clean application of finite-memory arguments, and the explicit polynomial/exponential complexity bounds together with scaling experiments on large instances are concrete strengths. The bi-level root-finding approach and case studies demonstrate both algorithmic efficiency and applied relevance.
minor comments (3)
- [§2] §2 (Preliminaries): The distinction between sticky and non-sticky variants is introduced via informal description; a formal definition of the deviation sets (e.g., as functions of (s,a) versus histories) should appear before the bi-level formulation to avoid ambiguity for readers.
- [§4] §4 (Robust Interval Search): While soundness and convergence are claimed, the proof sketch for the non-sticky FSC reduction (that node-dependent observations preserve the worst-case value) would benefit from an explicit lemma stating the equivalence of the history-dependent and node-dependent inner problems.
- [Experiments] Experimental section: The scaling results mention instances with tens of thousands of states but do not report wall-clock times, memory usage, or direct comparison against a naive grid search or MILP baseline; adding these would strengthen the scalability claim.
Simulated Author's Rebuttal
We thank the referee for their positive summary of our work, recognition of its significance, and recommendation for minor revision. No specific major comments were raised in the report.
Circularity Check
No significant circularity in the derivation chain
full rationale
The paper formulates the Policy Observation Robustness Problem as a bi-level optimization whose inner problem (worst-case value under bounded observation deviation) is monotonic in deviation magnitude. This monotonicity follows directly from the nested structure of feasible deviation sets: any deviation feasible for magnitude ε is feasible for ε' > ε, so the minimized value is non-increasing. This is a logical consequence of the problem definition using standard POMDP value functions, not a self-definition, fitted parameter, or self-citation. The non-sticky FSC reduction (limiting to node-dependent observations) is a standard finite-memory preservation argument. The Robust Interval Search algorithm's soundness, convergence, and complexity bounds are derived from these properties without reducing to prior self-work or redefinitions. The derivation is self-contained against external POMDP theory and benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The inner optimization over observation deviations is monotonic in the deviation magnitude.
Reference graph
Works this paper leans on
-
[1]
Turgay Ayer, Oguzhan Alagoz, and Natasha K. Stout. 2012. OR Forum—A POMDP Approach to Personalize Mammography Screening Decisions.Operations Research60, 5 (Oct. 2012), 1019–1034. https://doi.org/10.1287/opre.1110.1019 Publisher: INFORMS
-
[2]
Haoyu Bai, David Hsu, Wee Sun Lee, and Vien A. Ngo. 2011.Monte Carlo Value Iteration for Continuous-State POMDPs. Springer Berlin Heidelberg, Berlin, Heidelberg, 175–191. https://doi.org/10.1007/978-3-642-17452-0_11
-
[3]
Christel Baier, Christian Hensel, Lisa Hutschenreiter, Sebastian Junges, Joost-Pieter Katoen, and Joachim Klein. 2020. Parametric Markov chains: PCTL complexity and fraction-free Gaussian elimination.Information and Computation272 (2020), 104504. https: //doi.org/10.1016/j.ic.2019.104504 GandALF 2017
-
[4]
Somrita Banerjee, Edward Balaban, Mark Shirley, Kevin Bradner, and Marco Pavone. 2024. Contingency Planning Using Bi-level Markov Decision Processes for Space Missions. In2024 IEEE Aerospace Conference. 1–9. https://doi.org/10.1109/AERO58975.2024.10521281
-
[5]
2010.Dynamic Programming
Richard Bellman and Stuart Dreyfus. 2010.Dynamic Programming. Vol. 33. Princeton University Press. http://www.jstor.org/stable/j. ctv1nxcw0f
2010
-
[6]
Alexander Bork, Sebastian Junges, Joost-Pieter Katoen, and Tim Quatmann. 2020. Verification of Indefinite-Horizon POMDPs. In Automated Technology for Verification and Analysis, Dang Van Hung and Oleg Sokolsky (Eds.). Springer International Publishing, Cham, 288–304. https://doi.org/10.1007/978-3-030-59152-6_16
-
[7]
Bovy, Marnix Suilen, Sebastian Junges, and Nils Jansen
Eline M. Bovy, Marnix Suilen, Sebastian Junges, and Nils Jansen. 2024. Imprecise probabilities meet partial observability: game semantics for robust POMDPs. InProceedings of the Thirty-Third International Joint Conference on Artificial Intelligence(Jeju, Korea)(IJCAI ’24). Article 740, 10 pages. https://doi.org/10.24963/ijcai.2024/740
-
[8]
2001.Numerical Analysis 7th Edition
Richard L Burden and J Douglas Faires. 2001.Numerical Analysis 7th Edition. Thomson Learning, Chapter 2
2001
-
[9]
Krishnendu Chatterjee, Martin Chmelík, Raghav Gupta, and Ayush Kanodia. 2015. Qualitative analysis of POMDPs with temporal logic specifications for robotics applications. In2015 IEEE International Conference on Robotics and Automation (ICRA). 325–330. https://doi.org/10.1109/ICRA.2015.7139019 ISSN: 1050-4729
-
[10]
Min Chen, Stefanos Nikolaidis, Harold Soh, David Hsu, and Siddhartha Srinivasa. 2020. Trust-Aware Decision Making for Human-Robot Collaboration: Model Learning and Planning.J. Hum.-Robot Interact.9, 2, Article 9 (Jan. 2020), 23 pages. https://doi.org/10.1145/3359616
-
[11]
Anthony Colaprete, Daniel Andrews, William Bluethmann, Richard C Elphic, Ben Bussey, Jay Trimble, Kris Zacny, and Janine E Captain
-
[12]
InAGU fall meeting abstracts, Vol
An overview of the volatiles investigating polar exploration rover (viper) mission. InAGU fall meeting abstracts, Vol. 2019. P34B–03
2019
-
[13]
Murat Cubuktepe, Nils Jansen, Sebastian Junges, Joost-Pieter Katoen, and Ufuk Topcu. 2021. Convex optimization for parameter synthesis in MDPs.IEEE Trans. Automat. Control67, 12 (2021), 6333–6348
2021
-
[14]
Murat Cubuktepe, Nils Jansen, Sebastian Junges, Ahmadreza Marandi, Marnix Suilen, and Ufuk Topcu. 2021. Robust finite-state controllers for uncertain POMDPs. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 11792–11800
2021
-
[15]
Conrado Daws. 2004. Symbolic and parametric model checking of discrete-time Markov chains. InInternational Colloquium on Theoretical Aspects of Computing. Springer, 280–294
2004
-
[16]
Christian Dehnert, Sebastian Junges, Nils Jansen, Florian Corzilius, Matthias Volk, Harold Bruintjes, Joost-Pieter Katoen, and Erika Ábrahám. 2015. PROPhESY: A PRObabilistic ParamEter SYnthesis Tool. InComputer Aided Verification, Daniel Kroening and Corina S. Păsăreanu (Eds.). Vol. 9206. Springer International Publishing, Cham, 214–231. https://doi.org/1...
-
[17]
Christian Dehnert, Sebastian Junges, Joost-Pieter Katoen, and Matthias Volk. 2017. A Storm is Coming: A Modern Probabilistic Model Checker. InComputer Aided Verification, Rupak Majumdar and Viktor Kunčak (Eds.). Springer International Publishing, Cham, 592–600. https://doi.org/10.1007/978-3-319-63390-9_31
- [18]
-
[19]
Maxim Egorov, Zachary N Sunberg, Edward Balaban, Tim A Wheeler, Jayesh K Gupta, and Mykel J Kochenderfer. 2017. POMDPs. jl: A framework for sequential decision making under uncertainty.The Journal of Machine Learning Research18, 1 (2017), 831–835
2017
-
[20]
Mahdi Milani Fard, Joelle Pineau, and Peng Sun. 2008. A Variance Analysis for POMDP Policy Evaluation.Proceedings of the ... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence2 (Jan. 2008), 1056–1061. https://doi.org/10.1901/jaba.2008.2- 1056 33
-
[21]
Maris FL Galesloot, Marnix Suilen, Thiago D Simão, Steven Carr, Matthijs TJ Spaan, Ufuk Topcu, and Nils Jansen. 2025. Pessimistic Iterative Planning with RNNs for Robust POMDPs. InECAI 2025. IOS Press, 4823–4831
2025
-
[22]
Robert Givan, Sonia Leach, and Thomas Dean. 2000. Bounded-parameter Markov decision processes.Artificial Intelligence122, 1 (Sep 2000), 71–109. https://doi.org/10.1016/S0004-3702(00)00047-3
-
[23]
Himanshu Gupta, Bradley Hayes, and Zachary Sunberg. 2022. Intention-Aware Navigation in Crowds with Extended-Space POMDP Planning. InProceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems(Virtual Event, New Zealand) (AAMAS ’22). International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, 562–570
2022
-
[24]
1997.Planning and control in stochastic domains with imperfect information
Milos Hauskrecht. 1997.Planning and control in stochastic domains with imperfect information. Ph. D. Dissertation. Massachusetts Institute of Technology
1997
-
[25]
Heldmann, Anthony Colaprete, Richard C
Jennifer L. Heldmann, Anthony Colaprete, Richard C. Elphic, Ben Bussey, Andrew McGovern, Ross Beyer, David Lees, and Matt Deans
-
[26]
https://doi.org/10.1016/j.actaastro.2016.06.014
Site selection and traverse planning to support a lunar polar rover mission: A case study at Haworth Crater.Acta Astronautica127 (2016), 308–320. https://doi.org/10.1016/j.actaastro.2016.06.014
-
[27]
Lisa Hutschenreiter, Christel Baier, and Joachim Klein. 2017. Parametric Markov Chains: PCTL Complexity and Fraction-free Gaussian Elimination.Electronic Proceedings in Theoretical Computer Science256 (Sept. 2017), 16–30. https://doi.org/10.4204/EPTCS.256.2 arXiv:1709.02093 [cs]
-
[28]
Garud N Iyengar. 2005. Robust dynamic programming.Mathematics of Operations Research30, 2 (2005), 257–280
2005
-
[29]
B. Jonsson and K.G. Larsen. 1991. Specification and refinement of probabilistic processes. In[1991] Proceedings Sixth Annual IEEE Symposium on Logic in Computer Science. 266–277. https://doi.org/10.1109/LICS.1991.151651
-
[30]
Sebastian Junges, Nils Jansen, Ralf Wimmer, Tim Quatmann, Leonore Winterer, Joost-Pieter Katoen, and Bernd Becker. 2018. Finite-state Controllers of POMDPs via Parameter Synthesis.Conference on Uncertainty in Artifical Intelligence(2018)
2018
-
[31]
Sebastian Junges, Erika Ábrahám, Christian Hensel, Nils Jansen, Joost-Pieter Katoen, Tim Quatmann, and Matthias Volk. 2024. Parameter synthesis for Markov models: covering the parameter space.Formal Methods in System Design62, 1 (June 2024), 181–259. https: //doi.org/10.1007/s10703-023-00442-x
-
[32]
Littman, and Anthony R
Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra. 1998. Planning and acting in partially observable stochastic domains.Artificial Intelligence101, 1 (1998), 99–134
1998
-
[33]
Ali Devran Kara and Serdar Yüksel. 2021. Robustness to Approximations and Model Learning in MDPs and POMDPs. InModern Trends in Controlled Stochastic Processes:, Alexey Piunovskiy and Yi Zhang (Eds.). Springer International Publishing, Cham, 166–191. https://doi.org/10.1007/978-3-030-76928-4_9
-
[34]
2015.Decision making under uncertainty: theory and application
Mykel J Kochenderfer. 2015.Decision making under uncertainty: theory and application. MIT press, Chapter 6
2015
- [35]
-
[36]
Benjamin D. Kraske, Anshu Saksena, Anna L. Buczak, and Zachary N. Sunberg. 2023. Explanation Through Reward Model Reconciliation using POMDP Tree Search. In2023 IEEE International Conference on Assured Autonomy (ICAA). 137–140. https://doi.org/10.1109/ ICAA58325.2023.00027
-
[37]
Hanna Kurniawati, David Hsu, and Wee Sun Lee. 2008. SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. InProc. Robotics: Science and Systems
2008
-
[38]
Mikko Lauri, David Hsu, and Joni Pajarinen. 2023. Partially Observable Markov Decision Processes in Robotics: A Survey.IEEE Transactions on Robotics39, 1 (2023), 21–40. https://doi.org/10.1109/TRO.2022.3200138
-
[39]
Omid Madani, Steve Hanks, and Anne Condon. 2003. On the undecidability of probabilistic planning and related stochastic optimization problems.Artificial Intelligence147, 1 (2003), 5–34. https://doi.org/10.1016/S0004-3702(02)00378-8 Planning with Uncertainty and Incomplete Information
-
[40]
Andrew Mastin and Patrick Jaillet. 2012. Loss bounds for uncertain transition probabilities in Markov decision processes. In2012 IEEE 51st IEEE Conference on Decision and Control (CDC). 6708–6715. https://doi.org/10.1109/CDC.2012.6426504 ISSN: 0743-1546
-
[41]
Frederik Baymler Mathiesen, Morteza Lahijanian, and Luca Laurenti. 2024. IntervalMDP.jl: Accelerated Value Iteration for Interval Markov Decision Processes.IFAC-PapersOnLine58, 11 (2024), 1–6. https://doi.org/10.1016/j.ifacol.2024.07.416 8th IFAC Conference on Analysis and Design of Hybrid Systems ADHS 2024
- [42]
-
[43]
Cassandra
Nicolas Meuleau, Kee-Eung Kim, Leslie Pack Kaelbling, and Anthony R. Cassandra. 1999. Solving POMDPs by searching the space of finite policies. InProceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence(Stockholm, Sweden)(UAI’99). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 417–426
1999
-
[44]
Arnab Nilim and Laurent El Ghaoui. 2005. Robust control of Markov decision processes with uncertain transition matrices.Operations Research53, 5 (2005), 780–798
2005
-
[45]
Gethin Norman, David Parker, and Xueyi Zou. 2017. Verification and control of partially observable probabilistic systems.Real-Time Systems53, 3 (2017), 354–402. 34
2017
-
[46]
Wenfan Ou and Sheng Bi. 2025. Sequential decision-making under uncertainty: a robust MDPs review.Annals of Operations Research 353, 3 (2025), 1239–1285
2025
-
[47]
Michael P. Owen, Adam Panken, Robert Moss, Luis Alvarez, and Charles Leeper. 2019. ACAS Xu: Integrated Collision Avoidance and Detect and Avoid Capability for UAS. In2019 IEEE/AIAA 38th Digital A vionics Systems Conference (DASC). 1–10. https://doi.org/10.1109/ DASC43569.2019.9081758
-
[48]
Christos H. Papadimitriou and John N. Tsitsiklis. 1987. The Complexity of Markov Decision Processes.Mathematics of Operations Research12, 3 (1987), 441–450. http://www.jstor.org/stable/3689975
-
[49]
Joelle Pineau, Geoff Gordon, Sebastian Thrun, et al. 2003. Point-based value iteration: An anytime algorithm for POMDPs. InIjcai, Vol. 3. 1025–1032
2003
-
[50]
2014.Markov decision processes: discrete stochastic dynamic programming
Martin L Puterman. 2014.Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons
2014
-
[51]
Tim Quatmann, Christian Dehnert, Nils Jansen, Sebastian Junges, and Joost-Pieter Katoen. 2016. Parameter Synthesis for Markov Models: Faster Than Ever. InAutomated Technology for Verification and Analysis, Cyrille Artho, Axel Legay, and Doron Peled (Eds.). Springer International Publishing, Cham, 50–67
2016
-
[52]
Alex Schutz, Yang You, Matías Mattamala, Ipek Caliskanelli, Bruno Lacerda, and Nick Hawes. 2025. A Finite-State Controller Based Offline Solver for Deterministic POMDPs. InProceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, IJCAI-25, James Kwok (Ed.). International Joint Conferences on Artificial Intelligence Organi...
2025
-
[53]
Prashin Sharma, Benjamin Kraske, Joseph Kim, Zakariya Laouar, Zachary Sunberg, and Ella Atkins. 2024. Risk-Aware markov decision process contingency management autonomy for uncrewed aircraft systems.Journal of aerospace information systems21, 3 (2024), 234–248
2024
-
[54]
David Silver and Joel Veness. 2010. Monte-Carlo Planning in Large POMDPs. InAdvances in Neural Information Processing Systems, Vol. 23. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2010/hash/edfbe1afcf9246bb0d40eb4d8027d90f-Abstract.html
2010
-
[55]
Trey Smith and Reid Simmons. 2004. Heuristic search value iteration for POMDPs. InProceedings of the 20th conference on Uncertainty in artificial intelligence. 520–527
2004
-
[56]
Edward J. Sondik. 1978. The Optimal Control of Partially Observable Markov Processes Over the Infinite Horizon: Discounted Costs. Operations Research26, 2 (1978), 282–304. http://www.jstor.org/stable/169635
1978
-
[57]
Jip Spel, Sebastian Junges, and Joost-Pieter Katoen. 2021. Finding Provably Optimal Markov Chains. InTools and Algorithms for the Construction and Analysis of Systems, Jan Friso Groote and Kim Guldstrand Larsen (Eds.). Springer International Publishing, Cham, 173–190. https://doi.org/10.1007/978-3-030-72016-2_10
-
[58]
Marnix Suilen, Nils Jansen, Murat Cubuktepe, and Ufuk Topcu. 2020. Robust Policy Synthesis for Uncertain POMDPs via Convex Optimization. InProceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, Christian Bessiere (Ed.). International Joint Conferences on Artificial Intelligence Organization, 4113–4120. https:/...
- [59]
-
[60]
White and Hany K
Chelsea C. White and Hany K. El-Deib. 1986. Parameter Imprecision in Finite State, Finite Action Dynamic Programs.Operations Research34, 1 (1986), 120–129. https://www.jstor.org/stable/170676 Publisher: INFORMS
1986
-
[61]
Nan Ye, Adhiraj Somani, David Hsu, and Wee Sun Lee. 2017. DESPOT: Online POMDP Planning with Regularization.Journal of Artificial Intelligence Research58 (Jan. 2017), 231–266. https://doi.org/10.1613/jair.5328
-
[62]
Serdar Yüksel and Tamás Linder. 2012. Optimization and Convergence of Observation Channels in Stochastic Control.SIAM Journal on Control and Optimization50, 2 (2012), 864–887. https://doi.org/10.1137/100808976 arXiv:https://doi.org/10.1137/100808976 35 A Proof of Theorem 3 Theorem 3.For a history-dependent policy represented as an FSC 𝜋, the history-depen...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.