pith. sign in

arxiv: 2604.21256 · v1 · submitted 2026-04-23 · 💻 cs.AI

Robustness Analysis of POMDP Policies to Observation Perturbations

Pith reviewed 2026-05-09 22:00 UTC · model grok-4.3

classification 💻 cs.AI
keywords POMDPpolicy robustnessobservation perturbationsbi-level optimizationfinite-state controllerssensor degradationrobustness analysispartially observable Markov decision process
0
0 comments X

The pith

The largest observation-model deviation a POMDP policy can tolerate without falling below a value threshold is located by root-finding on a monotonic bi-level optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines the Policy Observation Robustness Problem as the task of finding the biggest allowable change to a POMDP's observation probabilities such that a fixed policy still achieves at least a chosen expected reward. It reformulates the problem as an outer search over deviation magnitudes whose inner step computes the worst-case value under that magnitude. Because the inner value is monotonic in the deviation size, ordinary root-finding methods converge to the desired tolerance. The authors distinguish a sticky variant, where deviations depend only on state and action, from a non-sticky variant that can depend on full histories, and show that finite-state controller policies reduce the non-sticky case to node-dependent observations. This matters for real deployments where sensors drift or degrade, because a policy verified only on the nominal model can otherwise produce unexpectedly poor performance.

Core claim

The Policy Observation Robustness Problem can be formulated as a bi-level optimization problem in which the inner optimization is monotonic in the size of the observation deviation. This enables efficient solutions using root-finding algorithms in the outer optimization. For the non-sticky variant, when policies are represented with finite-state controllers it is sufficient to consider observations which depend on nodes in the FSC rather than full histories. The resulting Robust Interval Search algorithm is sound and convergent, with polynomial time complexity for the non-sticky case and at most exponential for the sticky case, and scales to POMDPs with tens of thousands of states.

What carries the argument

Bi-level optimization of the Policy Observation Robustness Problem, where the inner level computes worst-case policy value for a fixed deviation magnitude and is monotonic in that magnitude.

If this is right

  • Robust Interval Search supplies soundness and convergence guarantees for computing the maximum tolerable deviation in both sticky and non-sticky variants.
  • The non-sticky variant admits a polynomial-time algorithm when policies are finite-state controllers.
  • The sticky variant requires at most exponential time but remains practical for large state spaces.
  • Experimental implementations scale to POMDP models containing tens of thousands of states.
  • Case studies in robotics and operations research confirm the problem formulation yields actionable robustness margins.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same monotonicity structure could be used to certify robustness when both transition and observation models are allowed to deviate simultaneously.
  • Designers could use the computed tolerance to set explicit sensor-calibration budgets or maintenance schedules before deployment.
  • Similar reductions from history-dependent to compact-memory observations may apply to other policy classes such as recurrent neural networks.
  • If approximate monotonicity holds for learned or black-box policies, root-finding could still provide practical robustness certificates.

Load-bearing premise

The worst-case value achieved by the policy under observation deviations decreases monotonically as the allowed size of those deviations grows.

What would settle it

A concrete counter-example in which, for some fixed policy and POMDP, increasing the maximum allowed observation deviation produces a higher (or equal) worst-case value instead of a strictly lower one.

Figures

Figures reproduced from arXiv: 2604.21256 by Benjamin Kraske, Federico Rossi, Morteza Lahijanian, Qi Heng Ho, Zachary Sunberg.

Figure 1
Figure 1. Figure 1: Solution approach to the Policy Observation Robustness Problem. [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Rover environment and FSC [PITH_FULL_IMAGE:figures/full_fig_p010_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Markov chain representations or 𝑍˜ at a given step. We then show inductively for the finite-horizon case that, given any history ℎ and state 𝑠, both the 𝑍ˆ and 𝑍˜ cases have the same value at horizon 𝑑 = 1 and minimize over the same domain, therefore achieving the same value. In the infinite-horizon case, we show both Bellman updates converge to fixed points. Given the finite-horizon equivalence of the two… view at source ↗
Figure 4
Figure 4. Figure 4: Differences in path frequencies between the nominal and worst-case (minimizing) two-step Markov chains for a case [PITH_FULL_IMAGE:figures/full_fig_p029_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Policy and results for the cancer case study. [PITH_FULL_IMAGE:figures/full_fig_p030_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: FSCs for the Part Checking POMDP. Note that the initial measure node is returned to after accepting or rejecting a [PITH_FULL_IMAGE:figures/full_fig_p031_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Comparison of Policy 1 and Policy 2 for the Part Checking POMDP. [PITH_FULL_IMAGE:figures/full_fig_p032_7.png] view at source ↗
read the original abstract

Policies for Partially Observable Markov Decision Processes (POMDPs) are often designed using a nominal system model. In practice, this model can deviate from the true system during deployment due to factors such as calibration drift or sensor degradation, leading to unexpected performance degradation. This work studies policy robustness against deviations in the POMDP observation model. We introduce the Policy Observation Robustness Problem: to determine the maximum tolerable deviation in a POMDP's observation model that guarantees the policy's value remains above a specified threshold. We analyze two variants: the sticky variant, where deviations are dependent on state and actions, and the non-sticky variant, where they can be history-dependent. We show that the Policy Observation Robustness Problem can be formulated as a bi-level optimization problem in which the inner optimization is monotonic in the size of the observation deviation. This enables efficient solutions using root-finding algorithms in the outer optimization. For the non-sticky variant, we show that when policies are represented with finite-state controllers (FSCs) it is sufficient to consider observations which depend on nodes in the FSC rather than full histories. We present Robust Interval Search, an algorithm with soundness and convergence guarantees, for both the sticky and non-sticky variants. We show this algorithm has polynomial time complexity in the non-sticky variant and at most exponential time complexity in the sticky variant. We provide experimental results validating and demonstrating the scalability of implementations of Robust Interval Search to POMDP problems with tens of thousands of states. We also provide case studies from robotics and operations research which demonstrate the practical utility of the problem and algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper introduces the Policy Observation Robustness Problem: given a POMDP policy and value threshold, find the largest deviation bound on the observation model such that the policy's value remains above the threshold under any deviation of that magnitude. It formulates the problem as a bi-level optimization (outer search over deviation magnitude, inner worst-case value computation) and proves that the inner problem is monotonic in the deviation size due to nested feasible deviation sets. For the non-sticky (history-dependent) variant, when policies are finite-state controllers, it suffices to consider node-dependent observation perturbations rather than full histories. The authors present the Robust Interval Search algorithm, prove its soundness and convergence, establish polynomial time complexity for the non-sticky case and at most exponential for the sticky case, and report experiments on instances with tens of thousands of states plus robotics and operations-research case studies.

Significance. If the monotonicity property and algorithm guarantees hold, the work offers a practical, theoretically grounded method for quantifying observation-model robustness of POMDP policies, addressing a gap between nominal design and deployment under sensor drift or calibration errors. The reduction to node-dependent perturbations for FSCs is a clean application of finite-memory arguments, and the explicit polynomial/exponential complexity bounds together with scaling experiments on large instances are concrete strengths. The bi-level root-finding approach and case studies demonstrate both algorithmic efficiency and applied relevance.

minor comments (3)
  1. [§2] §2 (Preliminaries): The distinction between sticky and non-sticky variants is introduced via informal description; a formal definition of the deviation sets (e.g., as functions of (s,a) versus histories) should appear before the bi-level formulation to avoid ambiguity for readers.
  2. [§4] §4 (Robust Interval Search): While soundness and convergence are claimed, the proof sketch for the non-sticky FSC reduction (that node-dependent observations preserve the worst-case value) would benefit from an explicit lemma stating the equivalence of the history-dependent and node-dependent inner problems.
  3. [Experiments] Experimental section: The scaling results mention instances with tens of thousands of states but do not report wall-clock times, memory usage, or direct comparison against a naive grid search or MILP baseline; adding these would strengthen the scalability claim.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our work, recognition of its significance, and recommendation for minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

full rationale

The paper formulates the Policy Observation Robustness Problem as a bi-level optimization whose inner problem (worst-case value under bounded observation deviation) is monotonic in deviation magnitude. This monotonicity follows directly from the nested structure of feasible deviation sets: any deviation feasible for magnitude ε is feasible for ε' > ε, so the minimized value is non-increasing. This is a logical consequence of the problem definition using standard POMDP value functions, not a self-definition, fitted parameter, or self-citation. The non-sticky FSC reduction (limiting to node-dependent observations) is a standard finite-memory preservation argument. The Robust Interval Search algorithm's soundness, convergence, and complexity bounds are derived from these properties without reducing to prior self-work or redefinitions. The derivation is self-contained against external POMDP theory and benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on the domain assumption of monotonicity in the inner optimization and standard POMDP definitions; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption The inner optimization over observation deviations is monotonic in the deviation magnitude.
    This property is invoked to justify root-finding in the outer loop for both sticky and non-sticky variants.

pith-pipeline@v0.9.0 · 5591 in / 1298 out tokens · 29306 ms · 2026-05-09T22:00:14.367753+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

62 extracted references · 32 canonical work pages

  1. [1]

    Turgay Ayer, Oguzhan Alagoz, and Natasha K. Stout. 2012. OR Forum—A POMDP Approach to Personalize Mammography Screening Decisions.Operations Research60, 5 (Oct. 2012), 1019–1034. https://doi.org/10.1287/opre.1110.1019 Publisher: INFORMS

  2. [2]

    Haoyu Bai, David Hsu, Wee Sun Lee, and Vien A. Ngo. 2011.Monte Carlo Value Iteration for Continuous-State POMDPs. Springer Berlin Heidelberg, Berlin, Heidelberg, 175–191. https://doi.org/10.1007/978-3-642-17452-0_11

  3. [3]

    Christel Baier, Christian Hensel, Lisa Hutschenreiter, Sebastian Junges, Joost-Pieter Katoen, and Joachim Klein. 2020. Parametric Markov chains: PCTL complexity and fraction-free Gaussian elimination.Information and Computation272 (2020), 104504. https: //doi.org/10.1016/j.ic.2019.104504 GandALF 2017

  4. [4]

    Somrita Banerjee, Edward Balaban, Mark Shirley, Kevin Bradner, and Marco Pavone. 2024. Contingency Planning Using Bi-level Markov Decision Processes for Space Missions. In2024 IEEE Aerospace Conference. 1–9. https://doi.org/10.1109/AERO58975.2024.10521281

  5. [5]

    2010.Dynamic Programming

    Richard Bellman and Stuart Dreyfus. 2010.Dynamic Programming. Vol. 33. Princeton University Press. http://www.jstor.org/stable/j. ctv1nxcw0f

  6. [6]

    Alexander Bork, Sebastian Junges, Joost-Pieter Katoen, and Tim Quatmann. 2020. Verification of Indefinite-Horizon POMDPs. In Automated Technology for Verification and Analysis, Dang Van Hung and Oleg Sokolsky (Eds.). Springer International Publishing, Cham, 288–304. https://doi.org/10.1007/978-3-030-59152-6_16

  7. [7]

    Bovy, Marnix Suilen, Sebastian Junges, and Nils Jansen

    Eline M. Bovy, Marnix Suilen, Sebastian Junges, and Nils Jansen. 2024. Imprecise probabilities meet partial observability: game semantics for robust POMDPs. InProceedings of the Thirty-Third International Joint Conference on Artificial Intelligence(Jeju, Korea)(IJCAI ’24). Article 740, 10 pages. https://doi.org/10.24963/ijcai.2024/740

  8. [8]

    2001.Numerical Analysis 7th Edition

    Richard L Burden and J Douglas Faires. 2001.Numerical Analysis 7th Edition. Thomson Learning, Chapter 2

  9. [9]

    Krishnendu Chatterjee, Martin Chmelík, Raghav Gupta, and Ayush Kanodia. 2015. Qualitative analysis of POMDPs with temporal logic specifications for robotics applications. In2015 IEEE International Conference on Robotics and Automation (ICRA). 325–330. https://doi.org/10.1109/ICRA.2015.7139019 ISSN: 1050-4729

  10. [10]

    Min Chen, Stefanos Nikolaidis, Harold Soh, David Hsu, and Siddhartha Srinivasa. 2020. Trust-Aware Decision Making for Human-Robot Collaboration: Model Learning and Planning.J. Hum.-Robot Interact.9, 2, Article 9 (Jan. 2020), 23 pages. https://doi.org/10.1145/3359616

  11. [11]

    Anthony Colaprete, Daniel Andrews, William Bluethmann, Richard C Elphic, Ben Bussey, Jay Trimble, Kris Zacny, and Janine E Captain

  12. [12]

    InAGU fall meeting abstracts, Vol

    An overview of the volatiles investigating polar exploration rover (viper) mission. InAGU fall meeting abstracts, Vol. 2019. P34B–03

  13. [13]

    Murat Cubuktepe, Nils Jansen, Sebastian Junges, Joost-Pieter Katoen, and Ufuk Topcu. 2021. Convex optimization for parameter synthesis in MDPs.IEEE Trans. Automat. Control67, 12 (2021), 6333–6348

  14. [14]

    Murat Cubuktepe, Nils Jansen, Sebastian Junges, Ahmadreza Marandi, Marnix Suilen, and Ufuk Topcu. 2021. Robust finite-state controllers for uncertain POMDPs. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 11792–11800

  15. [15]

    Conrado Daws. 2004. Symbolic and parametric model checking of discrete-time Markov chains. InInternational Colloquium on Theoretical Aspects of Computing. Springer, 280–294

  16. [16]

    Christian Dehnert, Sebastian Junges, Nils Jansen, Florian Corzilius, Matthias Volk, Harold Bruintjes, Joost-Pieter Katoen, and Erika Ábrahám. 2015. PROPhESY: A PRObabilistic ParamEter SYnthesis Tool. InComputer Aided Verification, Daniel Kroening and Corina S. Păsăreanu (Eds.). Vol. 9206. Springer International Publishing, Cham, 214–231. https://doi.org/1...

  17. [17]

    Christian Dehnert, Sebastian Junges, Joost-Pieter Katoen, and Matthias Volk. 2017. A Storm is Coming: A Modern Probabilistic Model Checker. InComputer Aided Verification, Rupak Majumdar and Viktor Kunčak (Eds.). Springer International Publishing, Cham, 592–600. https://doi.org/10.1007/978-3-319-63390-9_31

  18. [18]

    Yunus Emre Demirci, Ali Devran Kara, and Serdar Yüksel. 2025. Sensitivity of Filter Kernels and Robustness Bounds to Transition and Measurement Kernel Perturbations in Partially Observable Stochastic Control.arXiv preprint arXiv:2508.10658(2025)

  19. [19]

    Maxim Egorov, Zachary N Sunberg, Edward Balaban, Tim A Wheeler, Jayesh K Gupta, and Mykel J Kochenderfer. 2017. POMDPs. jl: A framework for sequential decision making under uncertainty.The Journal of Machine Learning Research18, 1 (2017), 831–835

  20. [20]

    Mahdi Milani Fard, Joelle Pineau, and Peng Sun. 2008. A Variance Analysis for POMDP Policy Evaluation.Proceedings of the ... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence2 (Jan. 2008), 1056–1061. https://doi.org/10.1901/jaba.2008.2- 1056 33

  21. [21]

    Maris FL Galesloot, Marnix Suilen, Thiago D Simão, Steven Carr, Matthijs TJ Spaan, Ufuk Topcu, and Nils Jansen. 2025. Pessimistic Iterative Planning with RNNs for Robust POMDPs. InECAI 2025. IOS Press, 4823–4831

  22. [22]

    Robert Givan, Sonia Leach, and Thomas Dean. 2000. Bounded-parameter Markov decision processes.Artificial Intelligence122, 1 (Sep 2000), 71–109. https://doi.org/10.1016/S0004-3702(00)00047-3

  23. [23]

    Himanshu Gupta, Bradley Hayes, and Zachary Sunberg. 2022. Intention-Aware Navigation in Crowds with Extended-Space POMDP Planning. InProceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems(Virtual Event, New Zealand) (AAMAS ’22). International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, 562–570

  24. [24]

    1997.Planning and control in stochastic domains with imperfect information

    Milos Hauskrecht. 1997.Planning and control in stochastic domains with imperfect information. Ph. D. Dissertation. Massachusetts Institute of Technology

  25. [25]

    Heldmann, Anthony Colaprete, Richard C

    Jennifer L. Heldmann, Anthony Colaprete, Richard C. Elphic, Ben Bussey, Andrew McGovern, Ross Beyer, David Lees, and Matt Deans

  26. [26]

    https://doi.org/10.1016/j.actaastro.2016.06.014

    Site selection and traverse planning to support a lunar polar rover mission: A case study at Haworth Crater.Acta Astronautica127 (2016), 308–320. https://doi.org/10.1016/j.actaastro.2016.06.014

  27. [27]

    Lisa Hutschenreiter, Christel Baier, and Joachim Klein. 2017. Parametric Markov Chains: PCTL Complexity and Fraction-free Gaussian Elimination.Electronic Proceedings in Theoretical Computer Science256 (Sept. 2017), 16–30. https://doi.org/10.4204/EPTCS.256.2 arXiv:1709.02093 [cs]

  28. [28]

    Garud N Iyengar. 2005. Robust dynamic programming.Mathematics of Operations Research30, 2 (2005), 257–280

  29. [29]

    Jonsson and K.G

    B. Jonsson and K.G. Larsen. 1991. Specification and refinement of probabilistic processes. In[1991] Proceedings Sixth Annual IEEE Symposium on Logic in Computer Science. 266–277. https://doi.org/10.1109/LICS.1991.151651

  30. [30]

    Sebastian Junges, Nils Jansen, Ralf Wimmer, Tim Quatmann, Leonore Winterer, Joost-Pieter Katoen, and Bernd Becker. 2018. Finite-state Controllers of POMDPs via Parameter Synthesis.Conference on Uncertainty in Artifical Intelligence(2018)

  31. [31]

    Sebastian Junges, Erika Ábrahám, Christian Hensel, Nils Jansen, Joost-Pieter Katoen, Tim Quatmann, and Matthias Volk. 2024. Parameter synthesis for Markov models: covering the parameter space.Formal Methods in System Design62, 1 (June 2024), 181–259. https: //doi.org/10.1007/s10703-023-00442-x

  32. [32]

    Littman, and Anthony R

    Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra. 1998. Planning and acting in partially observable stochastic domains.Artificial Intelligence101, 1 (1998), 99–134

  33. [33]

    Ali Devran Kara and Serdar Yüksel. 2021. Robustness to Approximations and Model Learning in MDPs and POMDPs. InModern Trends in Controlled Stochastic Processes:, Alexey Piunovskiy and Yi Zhang (Eds.). Springer International Publishing, Cham, 166–191. https://doi.org/10.1007/978-3-030-76928-4_9

  34. [34]

    2015.Decision making under uncertainty: theory and application

    Mykel J Kochenderfer. 2015.Decision making under uncertainty: theory and application. MIT press, Chapter 6

  35. [35]

    Alyzia-Maria Konsta, Alberto Lluch Lafuente, and Christoph Matheja. 2024. What should be observed for optimal reward in POMDPs? http://arxiv.org/abs/2405.10768 arXiv:2405.10768 [cs]

  36. [36]

    Akritas, M

    Benjamin D. Kraske, Anshu Saksena, Anna L. Buczak, and Zachary N. Sunberg. 2023. Explanation Through Reward Model Reconciliation using POMDP Tree Search. In2023 IEEE International Conference on Assured Autonomy (ICAA). 137–140. https://doi.org/10.1109/ ICAA58325.2023.00027

  37. [37]

    Hanna Kurniawati, David Hsu, and Wee Sun Lee. 2008. SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. InProc. Robotics: Science and Systems

  38. [38]

    Mikko Lauri, David Hsu, and Joni Pajarinen. 2023. Partially Observable Markov Decision Processes in Robotics: A Survey.IEEE Transactions on Robotics39, 1 (2023), 21–40. https://doi.org/10.1109/TRO.2022.3200138

  39. [39]

    Omid Madani, Steve Hanks, and Anne Condon. 2003. On the undecidability of probabilistic planning and related stochastic optimization problems.Artificial Intelligence147, 1 (2003), 5–34. https://doi.org/10.1016/S0004-3702(02)00378-8 Planning with Uncertainty and Incomplete Information

  40. [40]

    Andrew Mastin and Patrick Jaillet. 2012. Loss bounds for uncertain transition probabilities in Markov decision processes. In2012 IEEE 51st IEEE Conference on Decision and Control (CDC). 6708–6715. https://doi.org/10.1109/CDC.2012.6426504 ISSN: 0743-1546

  41. [41]

    Frederik Baymler Mathiesen, Morteza Lahijanian, and Luca Laurenti. 2024. IntervalMDP.jl: Accelerated Value Iteration for Interval Markov Decision Processes.IFAC-PapersOnLine58, 11 (2024), 1–6. https://doi.org/10.1016/j.ifacol.2024.07.416 8th IFAC Conference on Analysis and Design of Hybrid Systems ADHS 2024

  42. [42]

    Rayan Mazouz, Frederik Baymler Mathiesen, Luca Laurenti, and Morteza Lahijanian. 2024. Piecewise stochastic barrier functions.arXiv preprint arXiv:2404.16986(2024)

  43. [43]

    Cassandra

    Nicolas Meuleau, Kee-Eung Kim, Leslie Pack Kaelbling, and Anthony R. Cassandra. 1999. Solving POMDPs by searching the space of finite policies. InProceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence(Stockholm, Sweden)(UAI’99). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 417–426

  44. [44]

    Arnab Nilim and Laurent El Ghaoui. 2005. Robust control of Markov decision processes with uncertain transition matrices.Operations Research53, 5 (2005), 780–798

  45. [45]

    Gethin Norman, David Parker, and Xueyi Zou. 2017. Verification and control of partially observable probabilistic systems.Real-Time Systems53, 3 (2017), 354–402. 34

  46. [46]

    Wenfan Ou and Sheng Bi. 2025. Sequential decision-making under uncertainty: a robust MDPs review.Annals of Operations Research 353, 3 (2025), 1239–1285

  47. [47]

    Flight demonstration of unmanned aircraft system (UAS) traffic management (UTM) at technical capability level 3,

    Michael P. Owen, Adam Panken, Robert Moss, Luis Alvarez, and Charles Leeper. 2019. ACAS Xu: Integrated Collision Avoidance and Detect and Avoid Capability for UAS. In2019 IEEE/AIAA 38th Digital A vionics Systems Conference (DASC). 1–10. https://doi.org/10.1109/ DASC43569.2019.9081758

  48. [48]

    Papadimitriou and John N

    Christos H. Papadimitriou and John N. Tsitsiklis. 1987. The Complexity of Markov Decision Processes.Mathematics of Operations Research12, 3 (1987), 441–450. http://www.jstor.org/stable/3689975

  49. [49]

    Joelle Pineau, Geoff Gordon, Sebastian Thrun, et al. 2003. Point-based value iteration: An anytime algorithm for POMDPs. InIjcai, Vol. 3. 1025–1032

  50. [50]

    2014.Markov decision processes: discrete stochastic dynamic programming

    Martin L Puterman. 2014.Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons

  51. [51]

    Tim Quatmann, Christian Dehnert, Nils Jansen, Sebastian Junges, and Joost-Pieter Katoen. 2016. Parameter Synthesis for Markov Models: Faster Than Ever. InAutomated Technology for Verification and Analysis, Cyrille Artho, Axel Legay, and Doron Peled (Eds.). Springer International Publishing, Cham, 50–67

  52. [52]

    Alex Schutz, Yang You, Matías Mattamala, Ipek Caliskanelli, Bruno Lacerda, and Nick Hawes. 2025. A Finite-State Controller Based Offline Solver for Deterministic POMDPs. InProceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, IJCAI-25, James Kwok (Ed.). International Joint Conferences on Artificial Intelligence Organi...

  53. [53]

    Prashin Sharma, Benjamin Kraske, Joseph Kim, Zakariya Laouar, Zachary Sunberg, and Ella Atkins. 2024. Risk-Aware markov decision process contingency management autonomy for uncrewed aircraft systems.Journal of aerospace information systems21, 3 (2024), 234–248

  54. [54]

    David Silver and Joel Veness. 2010. Monte-Carlo Planning in Large POMDPs. InAdvances in Neural Information Processing Systems, Vol. 23. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2010/hash/edfbe1afcf9246bb0d40eb4d8027d90f-Abstract.html

  55. [55]

    Trey Smith and Reid Simmons. 2004. Heuristic search value iteration for POMDPs. InProceedings of the 20th conference on Uncertainty in artificial intelligence. 520–527

  56. [56]

    Edward J. Sondik. 1978. The Optimal Control of Partially Observable Markov Processes Over the Infinite Horizon: Discounted Costs. Operations Research26, 2 (1978), 282–304. http://www.jstor.org/stable/169635

  57. [57]

    Jip Spel, Sebastian Junges, and Joost-Pieter Katoen. 2021. Finding Provably Optimal Markov Chains. InTools and Algorithms for the Construction and Analysis of Systems, Jan Friso Groote and Kim Guldstrand Larsen (Eds.). Springer International Publishing, Cham, 173–190. https://doi.org/10.1007/978-3-030-72016-2_10

  58. [58]

    Marnix Suilen, Nils Jansen, Murat Cubuktepe, and Ufuk Topcu. 2020. Robust Policy Synthesis for Uncertain POMDPs via Convex Optimization. InProceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, Christian Bessiere (Ed.). International Joint Conferences on Artificial Intelligence Organization, 4113–4120. https:/...

  59. [59]

    Chin Hon Tan and Joseph C. Hartman. 2011. Sensitivity Analysis in Markov Decision Processes with Uncertain Reward Parameters. Journal of Applied Probability48, 4 (2011), 954–967. https://www.jstor.org/stable/23066436 Publisher: Applied Probability Trust

  60. [60]

    White and Hany K

    Chelsea C. White and Hany K. El-Deib. 1986. Parameter Imprecision in Finite State, Finite Action Dynamic Programs.Operations Research34, 1 (1986), 120–129. https://www.jstor.org/stable/170676 Publisher: INFORMS

  61. [61]

    Nan Ye, Adhiraj Somani, David Hsu, and Wee Sun Lee. 2017. DESPOT: Online POMDP Planning with Regularization.Journal of Artificial Intelligence Research58 (Jan. 2017), 231–266. https://doi.org/10.1613/jair.5328

  62. [62]

    Serdar Yüksel and Tamás Linder. 2012. Optimization and Convergence of Observation Channels in Stochastic Control.SIAM Journal on Control and Optimization50, 2 (2012), 864–887. https://doi.org/10.1137/100808976 arXiv:https://doi.org/10.1137/100808976 35 A Proof of Theorem 3 Theorem 3.For a history-dependent policy represented as an FSC 𝜋, the history-depen...