Robustness Analysis of POMDP Policies to Observation Perturbations

Benjamin Kraske; Federico Rossi; Morteza Lahijanian; Qi Heng Ho; Zachary Sunberg

arxiv: 2604.21256 · v1 · submitted 2026-04-23 · 💻 cs.AI

Robustness Analysis of POMDP Policies to Observation Perturbations

Benjamin Kraske , Qi Heng Ho , Federico Rossi , Morteza Lahijanian , Zachary Sunberg This is my paper

Pith reviewed 2026-05-09 22:00 UTC · model grok-4.3

classification 💻 cs.AI

keywords POMDPpolicy robustnessobservation perturbationsbi-level optimizationfinite-state controllerssensor degradationrobustness analysispartially observable Markov decision process

0 comments

The pith

The largest observation-model deviation a POMDP policy can tolerate without falling below a value threshold is located by root-finding on a monotonic bi-level optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines the Policy Observation Robustness Problem as the task of finding the biggest allowable change to a POMDP's observation probabilities such that a fixed policy still achieves at least a chosen expected reward. It reformulates the problem as an outer search over deviation magnitudes whose inner step computes the worst-case value under that magnitude. Because the inner value is monotonic in the deviation size, ordinary root-finding methods converge to the desired tolerance. The authors distinguish a sticky variant, where deviations depend only on state and action, from a non-sticky variant that can depend on full histories, and show that finite-state controller policies reduce the non-sticky case to node-dependent observations. This matters for real deployments where sensors drift or degrade, because a policy verified only on the nominal model can otherwise produce unexpectedly poor performance.

Core claim

The Policy Observation Robustness Problem can be formulated as a bi-level optimization problem in which the inner optimization is monotonic in the size of the observation deviation. This enables efficient solutions using root-finding algorithms in the outer optimization. For the non-sticky variant, when policies are represented with finite-state controllers it is sufficient to consider observations which depend on nodes in the FSC rather than full histories. The resulting Robust Interval Search algorithm is sound and convergent, with polynomial time complexity for the non-sticky case and at most exponential for the sticky case, and scales to POMDPs with tens of thousands of states.

What carries the argument

Bi-level optimization of the Policy Observation Robustness Problem, where the inner level computes worst-case policy value for a fixed deviation magnitude and is monotonic in that magnitude.

If this is right

Robust Interval Search supplies soundness and convergence guarantees for computing the maximum tolerable deviation in both sticky and non-sticky variants.
The non-sticky variant admits a polynomial-time algorithm when policies are finite-state controllers.
The sticky variant requires at most exponential time but remains practical for large state spaces.
Experimental implementations scale to POMDP models containing tens of thousands of states.
Case studies in robotics and operations research confirm the problem formulation yields actionable robustness margins.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same monotonicity structure could be used to certify robustness when both transition and observation models are allowed to deviate simultaneously.
Designers could use the computed tolerance to set explicit sensor-calibration budgets or maintenance schedules before deployment.
Similar reductions from history-dependent to compact-memory observations may apply to other policy classes such as recurrent neural networks.
If approximate monotonicity holds for learned or black-box policies, root-finding could still provide practical robustness certificates.

Load-bearing premise

The worst-case value achieved by the policy under observation deviations decreases monotonically as the allowed size of those deviations grows.

What would settle it

A concrete counter-example in which, for some fixed policy and POMDP, increasing the maximum allowed observation deviation produces a higher (or equal) worst-case value instead of a strictly lower one.

Figures

Figures reproduced from arXiv: 2604.21256 by Benjamin Kraske, Federico Rossi, Morteza Lahijanian, Qi Heng Ho, Zachary Sunberg.

**Figure 2.** Figure 2: Rover environment and FSC [PITH_FULL_IMAGE:figures/full_fig_p010_2.png] view at source ↗

**Figure 3.** Figure 3: Markov chain representations or 𝑍˜ at a given step. We then show inductively for the finite-horizon case that, given any history ℎ and state 𝑠, both the 𝑍ˆ and 𝑍˜ cases have the same value at horizon 𝑑 = 1 and minimize over the same domain, therefore achieving the same value. In the infinite-horizon case, we show both Bellman updates converge to fixed points. Given the finite-horizon equivalence of the two… view at source ↗

**Figure 4.** Figure 4: Differences in path frequencies between the nominal and worst-case (minimizing) two-step Markov chains for a case [PITH_FULL_IMAGE:figures/full_fig_p029_4.png] view at source ↗

**Figure 5.** Figure 5: Policy and results for the cancer case study. [PITH_FULL_IMAGE:figures/full_fig_p030_5.png] view at source ↗

**Figure 6.** Figure 6: FSCs for the Part Checking POMDP. Note that the initial measure node is returned to after accepting or rejecting a [PITH_FULL_IMAGE:figures/full_fig_p031_6.png] view at source ↗

**Figure 7.** Figure 7: Comparison of Policy 1 and Policy 2 for the Part Checking POMDP. [PITH_FULL_IMAGE:figures/full_fig_p032_7.png] view at source ↗

read the original abstract

Policies for Partially Observable Markov Decision Processes (POMDPs) are often designed using a nominal system model. In practice, this model can deviate from the true system during deployment due to factors such as calibration drift or sensor degradation, leading to unexpected performance degradation. This work studies policy robustness against deviations in the POMDP observation model. We introduce the Policy Observation Robustness Problem: to determine the maximum tolerable deviation in a POMDP's observation model that guarantees the policy's value remains above a specified threshold. We analyze two variants: the sticky variant, where deviations are dependent on state and actions, and the non-sticky variant, where they can be history-dependent. We show that the Policy Observation Robustness Problem can be formulated as a bi-level optimization problem in which the inner optimization is monotonic in the size of the observation deviation. This enables efficient solutions using root-finding algorithms in the outer optimization. For the non-sticky variant, we show that when policies are represented with finite-state controllers (FSCs) it is sufficient to consider observations which depend on nodes in the FSC rather than full histories. We present Robust Interval Search, an algorithm with soundness and convergence guarantees, for both the sticky and non-sticky variants. We show this algorithm has polynomial time complexity in the non-sticky variant and at most exponential time complexity in the sticky variant. We provide experimental results validating and demonstrating the scalability of implementations of Robust Interval Search to POMDP problems with tens of thousands of states. We also provide case studies from robotics and operations research which demonstrate the practical utility of the problem and algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper defines the Policy Observation Robustness Problem for POMDPs and solves it with Robust Interval Search by exploiting monotonicity in deviation size, with a useful FSC reduction for the non-sticky case.

read the letter

The core contribution is turning the question of how much observation model deviation a policy can tolerate while staying above a value threshold into a bi-level optimization. The inner problem is monotonic in the deviation magnitude because larger deviation sets contain the smaller ones, so the worst-case value can only get worse or stay the same. That lets them apply root-finding or interval search on the outer loop. For the non-sticky variant they also show that finite-state controller policies only need node-dependent observations rather than full histories, which is a clean reduction that keeps the worst-case value intact. They give soundness and convergence guarantees for Robust Interval Search, polynomial complexity for the non-sticky version, and at most exponential for sticky, plus experiments that reach POMDPs with tens of thousands of states and some robotics and operations research case studies. The monotonicity argument is straightforward from the nested sets and the FSC reduction follows standard finite-memory reasoning, so those pieces look reliable. The experiments demonstrate practical scaling, which is the strongest evidence they provide for usability. One limitation is that the sticky variant remains exponential, so it may still hit walls on very large or dense problems even if the reported instances work. The guarantees also rest on the value function behaving monotonically under the chosen perturbation model, which holds in the abstract but would need careful checking on edge cases with approximate value functions or non-convex observation spaces. This work is aimed at POMDP researchers and practitioners in robotics or operations research who need to certify policies against sensor drift or calibration issues without re-solving from scratch. It fills a clear deployment gap with a new problem statement and an algorithm that is both analyzable and implementable. I would bring it to a reading group and send it for peer review; the formulation and complexity results are concrete enough to deserve referee scrutiny even if some proofs need tightening.

Referee Report

0 major / 3 minor

Summary. The paper introduces the Policy Observation Robustness Problem: given a POMDP policy and value threshold, find the largest deviation bound on the observation model such that the policy's value remains above the threshold under any deviation of that magnitude. It formulates the problem as a bi-level optimization (outer search over deviation magnitude, inner worst-case value computation) and proves that the inner problem is monotonic in the deviation size due to nested feasible deviation sets. For the non-sticky (history-dependent) variant, when policies are finite-state controllers, it suffices to consider node-dependent observation perturbations rather than full histories. The authors present the Robust Interval Search algorithm, prove its soundness and convergence, establish polynomial time complexity for the non-sticky case and at most exponential for the sticky case, and report experiments on instances with tens of thousands of states plus robotics and operations-research case studies.

Significance. If the monotonicity property and algorithm guarantees hold, the work offers a practical, theoretically grounded method for quantifying observation-model robustness of POMDP policies, addressing a gap between nominal design and deployment under sensor drift or calibration errors. The reduction to node-dependent perturbations for FSCs is a clean application of finite-memory arguments, and the explicit polynomial/exponential complexity bounds together with scaling experiments on large instances are concrete strengths. The bi-level root-finding approach and case studies demonstrate both algorithmic efficiency and applied relevance.

minor comments (3)

[§2] §2 (Preliminaries): The distinction between sticky and non-sticky variants is introduced via informal description; a formal definition of the deviation sets (e.g., as functions of (s,a) versus histories) should appear before the bi-level formulation to avoid ambiguity for readers.
[§4] §4 (Robust Interval Search): While soundness and convergence are claimed, the proof sketch for the non-sticky FSC reduction (that node-dependent observations preserve the worst-case value) would benefit from an explicit lemma stating the equivalence of the history-dependent and node-dependent inner problems.
[Experiments] Experimental section: The scaling results mention instances with tens of thousands of states but do not report wall-clock times, memory usage, or direct comparison against a naive grid search or MILP baseline; adding these would strengthen the scalability claim.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our work, recognition of its significance, and recommendation for minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

full rationale

The paper formulates the Policy Observation Robustness Problem as a bi-level optimization whose inner problem (worst-case value under bounded observation deviation) is monotonic in deviation magnitude. This monotonicity follows directly from the nested structure of feasible deviation sets: any deviation feasible for magnitude ε is feasible for ε' > ε, so the minimized value is non-increasing. This is a logical consequence of the problem definition using standard POMDP value functions, not a self-definition, fitted parameter, or self-citation. The non-sticky FSC reduction (limiting to node-dependent observations) is a standard finite-memory preservation argument. The Robust Interval Search algorithm's soundness, convergence, and complexity bounds are derived from these properties without reducing to prior self-work or redefinitions. The derivation is self-contained against external POMDP theory and benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on the domain assumption of monotonicity in the inner optimization and standard POMDP definitions; no free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption The inner optimization over observation deviations is monotonic in the deviation magnitude.
This property is invoked to justify root-finding in the outer loop for both sticky and non-sticky variants.

pith-pipeline@v0.9.0 · 5591 in / 1298 out tokens · 29306 ms · 2026-05-09T22:00:14.367753+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

62 extracted references · 32 canonical work pages

[1]

Turgay Ayer, Oguzhan Alagoz, and Natasha K. Stout. 2012. OR Forum—A POMDP Approach to Personalize Mammography Screening Decisions.Operations Research60, 5 (Oct. 2012), 1019–1034. https://doi.org/10.1287/opre.1110.1019 Publisher: INFORMS

work page doi:10.1287/opre.1110.1019 2012
[2]

Haoyu Bai, David Hsu, Wee Sun Lee, and Vien A. Ngo. 2011.Monte Carlo Value Iteration for Continuous-State POMDPs. Springer Berlin Heidelberg, Berlin, Heidelberg, 175–191. https://doi.org/10.1007/978-3-642-17452-0_11

work page doi:10.1007/978-3-642-17452-0_11 2011
[3]

Christel Baier, Christian Hensel, Lisa Hutschenreiter, Sebastian Junges, Joost-Pieter Katoen, and Joachim Klein. 2020. Parametric Markov chains: PCTL complexity and fraction-free Gaussian elimination.Information and Computation272 (2020), 104504. https: //doi.org/10.1016/j.ic.2019.104504 GandALF 2017

work page doi:10.1016/j.ic.2019.104504 2020
[4]

Somrita Banerjee, Edward Balaban, Mark Shirley, Kevin Bradner, and Marco Pavone. 2024. Contingency Planning Using Bi-level Markov Decision Processes for Space Missions. In2024 IEEE Aerospace Conference. 1–9. https://doi.org/10.1109/AERO58975.2024.10521281

work page doi:10.1109/aero58975.2024.10521281 2024
[5]

2010.Dynamic Programming

Richard Bellman and Stuart Dreyfus. 2010.Dynamic Programming. Vol. 33. Princeton University Press. http://www.jstor.org/stable/j. ctv1nxcw0f

2010
[6]

Alexander Bork, Sebastian Junges, Joost-Pieter Katoen, and Tim Quatmann. 2020. Verification of Indefinite-Horizon POMDPs. In Automated Technology for Verification and Analysis, Dang Van Hung and Oleg Sokolsky (Eds.). Springer International Publishing, Cham, 288–304. https://doi.org/10.1007/978-3-030-59152-6_16

work page doi:10.1007/978-3-030-59152-6_16 2020
[7]

Bovy, Marnix Suilen, Sebastian Junges, and Nils Jansen

Eline M. Bovy, Marnix Suilen, Sebastian Junges, and Nils Jansen. 2024. Imprecise probabilities meet partial observability: game semantics for robust POMDPs. InProceedings of the Thirty-Third International Joint Conference on Artificial Intelligence(Jeju, Korea)(IJCAI ’24). Article 740, 10 pages. https://doi.org/10.24963/ijcai.2024/740

work page doi:10.24963/ijcai.2024/740 2024
[8]

2001.Numerical Analysis 7th Edition

Richard L Burden and J Douglas Faires. 2001.Numerical Analysis 7th Edition. Thomson Learning, Chapter 2

2001
[9]

Krishnendu Chatterjee, Martin Chmelík, Raghav Gupta, and Ayush Kanodia. 2015. Qualitative analysis of POMDPs with temporal logic specifications for robotics applications. In2015 IEEE International Conference on Robotics and Automation (ICRA). 325–330. https://doi.org/10.1109/ICRA.2015.7139019 ISSN: 1050-4729

work page doi:10.1109/icra.2015.7139019 2015
[10]

Min Chen, Stefanos Nikolaidis, Harold Soh, David Hsu, and Siddhartha Srinivasa. 2020. Trust-Aware Decision Making for Human-Robot Collaboration: Model Learning and Planning.J. Hum.-Robot Interact.9, 2, Article 9 (Jan. 2020), 23 pages. https://doi.org/10.1145/3359616

work page doi:10.1145/3359616 2020
[11]

Anthony Colaprete, Daniel Andrews, William Bluethmann, Richard C Elphic, Ben Bussey, Jay Trimble, Kris Zacny, and Janine E Captain
[12]

InAGU fall meeting abstracts, Vol

An overview of the volatiles investigating polar exploration rover (viper) mission. InAGU fall meeting abstracts, Vol. 2019. P34B–03

2019
[13]

Murat Cubuktepe, Nils Jansen, Sebastian Junges, Joost-Pieter Katoen, and Ufuk Topcu. 2021. Convex optimization for parameter synthesis in MDPs.IEEE Trans. Automat. Control67, 12 (2021), 6333–6348

2021
[14]

Murat Cubuktepe, Nils Jansen, Sebastian Junges, Ahmadreza Marandi, Marnix Suilen, and Ufuk Topcu. 2021. Robust finite-state controllers for uncertain POMDPs. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 11792–11800

2021
[15]

Conrado Daws. 2004. Symbolic and parametric model checking of discrete-time Markov chains. InInternational Colloquium on Theoretical Aspects of Computing. Springer, 280–294

2004
[16]

Christian Dehnert, Sebastian Junges, Nils Jansen, Florian Corzilius, Matthias Volk, Harold Bruintjes, Joost-Pieter Katoen, and Erika Ábrahám. 2015. PROPhESY: A PRObabilistic ParamEter SYnthesis Tool. InComputer Aided Verification, Daniel Kroening and Corina S. Păsăreanu (Eds.). Vol. 9206. Springer International Publishing, Cham, 214–231. https://doi.org/1...

work page doi:10.1007/978-3-319-21690-4_13 2015
[17]

Christian Dehnert, Sebastian Junges, Joost-Pieter Katoen, and Matthias Volk. 2017. A Storm is Coming: A Modern Probabilistic Model Checker. InComputer Aided Verification, Rupak Majumdar and Viktor Kunčak (Eds.). Springer International Publishing, Cham, 592–600. https://doi.org/10.1007/978-3-319-63390-9_31

work page doi:10.1007/978-3-319-63390-9_31 2017
[18]

Yunus Emre Demirci, Ali Devran Kara, and Serdar Yüksel. 2025. Sensitivity of Filter Kernels and Robustness Bounds to Transition and Measurement Kernel Perturbations in Partially Observable Stochastic Control.arXiv preprint arXiv:2508.10658(2025)

work page arXiv 2025
[19]

Maxim Egorov, Zachary N Sunberg, Edward Balaban, Tim A Wheeler, Jayesh K Gupta, and Mykel J Kochenderfer. 2017. POMDPs. jl: A framework for sequential decision making under uncertainty.The Journal of Machine Learning Research18, 1 (2017), 831–835

2017
[20]

Mahdi Milani Fard, Joelle Pineau, and Peng Sun. 2008. A Variance Analysis for POMDP Policy Evaluation.Proceedings of the ... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence2 (Jan. 2008), 1056–1061. https://doi.org/10.1901/jaba.2008.2- 1056 33

work page doi:10.1901/jaba.2008.2- 2008
[21]

Maris FL Galesloot, Marnix Suilen, Thiago D Simão, Steven Carr, Matthijs TJ Spaan, Ufuk Topcu, and Nils Jansen. 2025. Pessimistic Iterative Planning with RNNs for Robust POMDPs. InECAI 2025. IOS Press, 4823–4831

2025
[22]

Robert Givan, Sonia Leach, and Thomas Dean. 2000. Bounded-parameter Markov decision processes.Artificial Intelligence122, 1 (Sep 2000), 71–109. https://doi.org/10.1016/S0004-3702(00)00047-3

work page doi:10.1016/s0004-3702(00)00047-3 2000
[23]

Himanshu Gupta, Bradley Hayes, and Zachary Sunberg. 2022. Intention-Aware Navigation in Crowds with Extended-Space POMDP Planning. InProceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems(Virtual Event, New Zealand) (AAMAS ’22). International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, 562–570

2022
[24]

1997.Planning and control in stochastic domains with imperfect information

Milos Hauskrecht. 1997.Planning and control in stochastic domains with imperfect information. Ph. D. Dissertation. Massachusetts Institute of Technology

1997
[25]

Heldmann, Anthony Colaprete, Richard C

Jennifer L. Heldmann, Anthony Colaprete, Richard C. Elphic, Ben Bussey, Andrew McGovern, Ross Beyer, David Lees, and Matt Deans
[26]

https://doi.org/10.1016/j.actaastro.2016.06.014

Site selection and traverse planning to support a lunar polar rover mission: A case study at Haworth Crater.Acta Astronautica127 (2016), 308–320. https://doi.org/10.1016/j.actaastro.2016.06.014

work page doi:10.1016/j.actaastro.2016.06.014 2016
[27]

Lisa Hutschenreiter, Christel Baier, and Joachim Klein. 2017. Parametric Markov Chains: PCTL Complexity and Fraction-free Gaussian Elimination.Electronic Proceedings in Theoretical Computer Science256 (Sept. 2017), 16–30. https://doi.org/10.4204/EPTCS.256.2 arXiv:1709.02093 [cs]

work page doi:10.4204/eptcs.256.2 2017
[28]

Garud N Iyengar. 2005. Robust dynamic programming.Mathematics of Operations Research30, 2 (2005), 257–280

2005
[29]

Jonsson and K.G

B. Jonsson and K.G. Larsen. 1991. Specification and refinement of probabilistic processes. In[1991] Proceedings Sixth Annual IEEE Symposium on Logic in Computer Science. 266–277. https://doi.org/10.1109/LICS.1991.151651

work page doi:10.1109/lics.1991.151651 1991
[30]

Sebastian Junges, Nils Jansen, Ralf Wimmer, Tim Quatmann, Leonore Winterer, Joost-Pieter Katoen, and Bernd Becker. 2018. Finite-state Controllers of POMDPs via Parameter Synthesis.Conference on Uncertainty in Artifical Intelligence(2018)

2018
[31]

Sebastian Junges, Erika Ábrahám, Christian Hensel, Nils Jansen, Joost-Pieter Katoen, Tim Quatmann, and Matthias Volk. 2024. Parameter synthesis for Markov models: covering the parameter space.Formal Methods in System Design62, 1 (June 2024), 181–259. https: //doi.org/10.1007/s10703-023-00442-x

work page doi:10.1007/s10703-023-00442-x 2024
[32]

Littman, and Anthony R

Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra. 1998. Planning and acting in partially observable stochastic domains.Artificial Intelligence101, 1 (1998), 99–134

1998
[33]

Ali Devran Kara and Serdar Yüksel. 2021. Robustness to Approximations and Model Learning in MDPs and POMDPs. InModern Trends in Controlled Stochastic Processes:, Alexey Piunovskiy and Yi Zhang (Eds.). Springer International Publishing, Cham, 166–191. https://doi.org/10.1007/978-3-030-76928-4_9

work page doi:10.1007/978-3-030-76928-4_9 2021
[34]

2015.Decision making under uncertainty: theory and application

Mykel J Kochenderfer. 2015.Decision making under uncertainty: theory and application. MIT press, Chapter 6

2015
[35]

Alyzia-Maria Konsta, Alberto Lluch Lafuente, and Christoph Matheja. 2024. What should be observed for optimal reward in POMDPs? http://arxiv.org/abs/2405.10768 arXiv:2405.10768 [cs]

work page arXiv 2024
[36]

Akritas, M

Benjamin D. Kraske, Anshu Saksena, Anna L. Buczak, and Zachary N. Sunberg. 2023. Explanation Through Reward Model Reconciliation using POMDP Tree Search. In2023 IEEE International Conference on Assured Autonomy (ICAA). 137–140. https://doi.org/10.1109/ ICAA58325.2023.00027

work page arXiv 2023
[37]

Hanna Kurniawati, David Hsu, and Wee Sun Lee. 2008. SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. InProc. Robotics: Science and Systems

2008
[38]

Mikko Lauri, David Hsu, and Joni Pajarinen. 2023. Partially Observable Markov Decision Processes in Robotics: A Survey.IEEE Transactions on Robotics39, 1 (2023), 21–40. https://doi.org/10.1109/TRO.2022.3200138

work page doi:10.1109/tro.2022.3200138 2023
[39]

Omid Madani, Steve Hanks, and Anne Condon. 2003. On the undecidability of probabilistic planning and related stochastic optimization problems.Artificial Intelligence147, 1 (2003), 5–34. https://doi.org/10.1016/S0004-3702(02)00378-8 Planning with Uncertainty and Incomplete Information

work page doi:10.1016/s0004-3702(02)00378-8 2003
[40]

Andrew Mastin and Patrick Jaillet. 2012. Loss bounds for uncertain transition probabilities in Markov decision processes. In2012 IEEE 51st IEEE Conference on Decision and Control (CDC). 6708–6715. https://doi.org/10.1109/CDC.2012.6426504 ISSN: 0743-1546

work page doi:10.1109/cdc.2012.6426504 2012
[41]

Frederik Baymler Mathiesen, Morteza Lahijanian, and Luca Laurenti. 2024. IntervalMDP.jl: Accelerated Value Iteration for Interval Markov Decision Processes.IFAC-PapersOnLine58, 11 (2024), 1–6. https://doi.org/10.1016/j.ifacol.2024.07.416 8th IFAC Conference on Analysis and Design of Hybrid Systems ADHS 2024

work page doi:10.1016/j.ifacol.2024.07.416 2024
[42]

Rayan Mazouz, Frederik Baymler Mathiesen, Luca Laurenti, and Morteza Lahijanian. 2024. Piecewise stochastic barrier functions.arXiv preprint arXiv:2404.16986(2024)

work page arXiv 2024
[43]

Cassandra

Nicolas Meuleau, Kee-Eung Kim, Leslie Pack Kaelbling, and Anthony R. Cassandra. 1999. Solving POMDPs by searching the space of finite policies. InProceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence(Stockholm, Sweden)(UAI’99). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 417–426

1999
[44]

Arnab Nilim and Laurent El Ghaoui. 2005. Robust control of Markov decision processes with uncertain transition matrices.Operations Research53, 5 (2005), 780–798

2005
[45]

Gethin Norman, David Parker, and Xueyi Zou. 2017. Verification and control of partially observable probabilistic systems.Real-Time Systems53, 3 (2017), 354–402. 34

2017
[46]

Wenfan Ou and Sheng Bi. 2025. Sequential decision-making under uncertainty: a robust MDPs review.Annals of Operations Research 353, 3 (2025), 1239–1285

2025
[47]

Flight demonstration of unmanned aircraft system (UAS) traffic management (UTM) at technical capability level 3,

Michael P. Owen, Adam Panken, Robert Moss, Luis Alvarez, and Charles Leeper. 2019. ACAS Xu: Integrated Collision Avoidance and Detect and Avoid Capability for UAS. In2019 IEEE/AIAA 38th Digital A vionics Systems Conference (DASC). 1–10. https://doi.org/10.1109/ DASC43569.2019.9081758

work page arXiv 2019
[48]

Papadimitriou and John N

Christos H. Papadimitriou and John N. Tsitsiklis. 1987. The Complexity of Markov Decision Processes.Mathematics of Operations Research12, 3 (1987), 441–450. http://www.jstor.org/stable/3689975

work page arXiv 1987
[49]

Joelle Pineau, Geoff Gordon, Sebastian Thrun, et al. 2003. Point-based value iteration: An anytime algorithm for POMDPs. InIjcai, Vol. 3. 1025–1032

2003
[50]

2014.Markov decision processes: discrete stochastic dynamic programming

Martin L Puterman. 2014.Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons

2014
[51]

Tim Quatmann, Christian Dehnert, Nils Jansen, Sebastian Junges, and Joost-Pieter Katoen. 2016. Parameter Synthesis for Markov Models: Faster Than Ever. InAutomated Technology for Verification and Analysis, Cyrille Artho, Axel Legay, and Doron Peled (Eds.). Springer International Publishing, Cham, 50–67

2016
[52]

Alex Schutz, Yang You, Matías Mattamala, Ipek Caliskanelli, Bruno Lacerda, and Nick Hawes. 2025. A Finite-State Controller Based Offline Solver for Deterministic POMDPs. InProceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, IJCAI-25, James Kwok (Ed.). International Joint Conferences on Artificial Intelligence Organi...

2025
[53]

Prashin Sharma, Benjamin Kraske, Joseph Kim, Zakariya Laouar, Zachary Sunberg, and Ella Atkins. 2024. Risk-Aware markov decision process contingency management autonomy for uncrewed aircraft systems.Journal of aerospace information systems21, 3 (2024), 234–248

2024
[54]

David Silver and Joel Veness. 2010. Monte-Carlo Planning in Large POMDPs. InAdvances in Neural Information Processing Systems, Vol. 23. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2010/hash/edfbe1afcf9246bb0d40eb4d8027d90f-Abstract.html

2010
[55]

Trey Smith and Reid Simmons. 2004. Heuristic search value iteration for POMDPs. InProceedings of the 20th conference on Uncertainty in artificial intelligence. 520–527

2004
[56]

Edward J. Sondik. 1978. The Optimal Control of Partially Observable Markov Processes Over the Infinite Horizon: Discounted Costs. Operations Research26, 2 (1978), 282–304. http://www.jstor.org/stable/169635

1978
[57]

Jip Spel, Sebastian Junges, and Joost-Pieter Katoen. 2021. Finding Provably Optimal Markov Chains. InTools and Algorithms for the Construction and Analysis of Systems, Jan Friso Groote and Kim Guldstrand Larsen (Eds.). Springer International Publishing, Cham, 173–190. https://doi.org/10.1007/978-3-030-72016-2_10

work page doi:10.1007/978-3-030-72016-2_10 2021
[58]

Marnix Suilen, Nils Jansen, Murat Cubuktepe, and Ufuk Topcu. 2020. Robust Policy Synthesis for Uncertain POMDPs via Convex Optimization. InProceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, Christian Bessiere (Ed.). International Joint Conferences on Artificial Intelligence Organization, 4113–4120. https:/...

work page doi:10.24963/ijcai.2020/569 2020
[59]

Chin Hon Tan and Joseph C. Hartman. 2011. Sensitivity Analysis in Markov Decision Processes with Uncertain Reward Parameters. Journal of Applied Probability48, 4 (2011), 954–967. https://www.jstor.org/stable/23066436 Publisher: Applied Probability Trust

work page arXiv 2011
[60]

White and Hany K

Chelsea C. White and Hany K. El-Deib. 1986. Parameter Imprecision in Finite State, Finite Action Dynamic Programs.Operations Research34, 1 (1986), 120–129. https://www.jstor.org/stable/170676 Publisher: INFORMS

1986
[61]

Nan Ye, Adhiraj Somani, David Hsu, and Wee Sun Lee. 2017. DESPOT: Online POMDP Planning with Regularization.Journal of Artificial Intelligence Research58 (Jan. 2017), 231–266. https://doi.org/10.1613/jair.5328

work page doi:10.1613/jair.5328 2017
[62]

Serdar Yüksel and Tamás Linder. 2012. Optimization and Convergence of Observation Channels in Stochastic Control.SIAM Journal on Control and Optimization50, 2 (2012), 864–887. https://doi.org/10.1137/100808976 arXiv:https://doi.org/10.1137/100808976 35 A Proof of Theorem 3 Theorem 3.For a history-dependent policy represented as an FSC 𝜋, the history-depen...

work page doi:10.1137/100808976 2012

[1] [1]

Turgay Ayer, Oguzhan Alagoz, and Natasha K. Stout. 2012. OR Forum—A POMDP Approach to Personalize Mammography Screening Decisions.Operations Research60, 5 (Oct. 2012), 1019–1034. https://doi.org/10.1287/opre.1110.1019 Publisher: INFORMS

work page doi:10.1287/opre.1110.1019 2012

[2] [2]

Haoyu Bai, David Hsu, Wee Sun Lee, and Vien A. Ngo. 2011.Monte Carlo Value Iteration for Continuous-State POMDPs. Springer Berlin Heidelberg, Berlin, Heidelberg, 175–191. https://doi.org/10.1007/978-3-642-17452-0_11

work page doi:10.1007/978-3-642-17452-0_11 2011

[3] [3]

Christel Baier, Christian Hensel, Lisa Hutschenreiter, Sebastian Junges, Joost-Pieter Katoen, and Joachim Klein. 2020. Parametric Markov chains: PCTL complexity and fraction-free Gaussian elimination.Information and Computation272 (2020), 104504. https: //doi.org/10.1016/j.ic.2019.104504 GandALF 2017

work page doi:10.1016/j.ic.2019.104504 2020

[4] [4]

Somrita Banerjee, Edward Balaban, Mark Shirley, Kevin Bradner, and Marco Pavone. 2024. Contingency Planning Using Bi-level Markov Decision Processes for Space Missions. In2024 IEEE Aerospace Conference. 1–9. https://doi.org/10.1109/AERO58975.2024.10521281

work page doi:10.1109/aero58975.2024.10521281 2024

[5] [5]

2010.Dynamic Programming

Richard Bellman and Stuart Dreyfus. 2010.Dynamic Programming. Vol. 33. Princeton University Press. http://www.jstor.org/stable/j. ctv1nxcw0f

2010

[6] [6]

Alexander Bork, Sebastian Junges, Joost-Pieter Katoen, and Tim Quatmann. 2020. Verification of Indefinite-Horizon POMDPs. In Automated Technology for Verification and Analysis, Dang Van Hung and Oleg Sokolsky (Eds.). Springer International Publishing, Cham, 288–304. https://doi.org/10.1007/978-3-030-59152-6_16

work page doi:10.1007/978-3-030-59152-6_16 2020

[7] [7]

Bovy, Marnix Suilen, Sebastian Junges, and Nils Jansen

Eline M. Bovy, Marnix Suilen, Sebastian Junges, and Nils Jansen. 2024. Imprecise probabilities meet partial observability: game semantics for robust POMDPs. InProceedings of the Thirty-Third International Joint Conference on Artificial Intelligence(Jeju, Korea)(IJCAI ’24). Article 740, 10 pages. https://doi.org/10.24963/ijcai.2024/740

work page doi:10.24963/ijcai.2024/740 2024

[8] [8]

2001.Numerical Analysis 7th Edition

Richard L Burden and J Douglas Faires. 2001.Numerical Analysis 7th Edition. Thomson Learning, Chapter 2

2001

[9] [9]

Krishnendu Chatterjee, Martin Chmelík, Raghav Gupta, and Ayush Kanodia. 2015. Qualitative analysis of POMDPs with temporal logic specifications for robotics applications. In2015 IEEE International Conference on Robotics and Automation (ICRA). 325–330. https://doi.org/10.1109/ICRA.2015.7139019 ISSN: 1050-4729

work page doi:10.1109/icra.2015.7139019 2015

[10] [10]

Min Chen, Stefanos Nikolaidis, Harold Soh, David Hsu, and Siddhartha Srinivasa. 2020. Trust-Aware Decision Making for Human-Robot Collaboration: Model Learning and Planning.J. Hum.-Robot Interact.9, 2, Article 9 (Jan. 2020), 23 pages. https://doi.org/10.1145/3359616

work page doi:10.1145/3359616 2020

[11] [11]

Anthony Colaprete, Daniel Andrews, William Bluethmann, Richard C Elphic, Ben Bussey, Jay Trimble, Kris Zacny, and Janine E Captain

[12] [12]

InAGU fall meeting abstracts, Vol

An overview of the volatiles investigating polar exploration rover (viper) mission. InAGU fall meeting abstracts, Vol. 2019. P34B–03

2019

[13] [13]

Murat Cubuktepe, Nils Jansen, Sebastian Junges, Joost-Pieter Katoen, and Ufuk Topcu. 2021. Convex optimization for parameter synthesis in MDPs.IEEE Trans. Automat. Control67, 12 (2021), 6333–6348

2021

[14] [14]

Murat Cubuktepe, Nils Jansen, Sebastian Junges, Ahmadreza Marandi, Marnix Suilen, and Ufuk Topcu. 2021. Robust finite-state controllers for uncertain POMDPs. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 11792–11800

2021

[15] [15]

Conrado Daws. 2004. Symbolic and parametric model checking of discrete-time Markov chains. InInternational Colloquium on Theoretical Aspects of Computing. Springer, 280–294

2004

[16] [16]

Christian Dehnert, Sebastian Junges, Nils Jansen, Florian Corzilius, Matthias Volk, Harold Bruintjes, Joost-Pieter Katoen, and Erika Ábrahám. 2015. PROPhESY: A PRObabilistic ParamEter SYnthesis Tool. InComputer Aided Verification, Daniel Kroening and Corina S. Păsăreanu (Eds.). Vol. 9206. Springer International Publishing, Cham, 214–231. https://doi.org/1...

work page doi:10.1007/978-3-319-21690-4_13 2015

[17] [17]

Christian Dehnert, Sebastian Junges, Joost-Pieter Katoen, and Matthias Volk. 2017. A Storm is Coming: A Modern Probabilistic Model Checker. InComputer Aided Verification, Rupak Majumdar and Viktor Kunčak (Eds.). Springer International Publishing, Cham, 592–600. https://doi.org/10.1007/978-3-319-63390-9_31

work page doi:10.1007/978-3-319-63390-9_31 2017

[18] [18]

Yunus Emre Demirci, Ali Devran Kara, and Serdar Yüksel. 2025. Sensitivity of Filter Kernels and Robustness Bounds to Transition and Measurement Kernel Perturbations in Partially Observable Stochastic Control.arXiv preprint arXiv:2508.10658(2025)

work page arXiv 2025

[19] [19]

Maxim Egorov, Zachary N Sunberg, Edward Balaban, Tim A Wheeler, Jayesh K Gupta, and Mykel J Kochenderfer. 2017. POMDPs. jl: A framework for sequential decision making under uncertainty.The Journal of Machine Learning Research18, 1 (2017), 831–835

2017

[20] [20]

Mahdi Milani Fard, Joelle Pineau, and Peng Sun. 2008. A Variance Analysis for POMDP Policy Evaluation.Proceedings of the ... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence2 (Jan. 2008), 1056–1061. https://doi.org/10.1901/jaba.2008.2- 1056 33

work page doi:10.1901/jaba.2008.2- 2008

[21] [21]

Maris FL Galesloot, Marnix Suilen, Thiago D Simão, Steven Carr, Matthijs TJ Spaan, Ufuk Topcu, and Nils Jansen. 2025. Pessimistic Iterative Planning with RNNs for Robust POMDPs. InECAI 2025. IOS Press, 4823–4831

2025

[22] [22]

Robert Givan, Sonia Leach, and Thomas Dean. 2000. Bounded-parameter Markov decision processes.Artificial Intelligence122, 1 (Sep 2000), 71–109. https://doi.org/10.1016/S0004-3702(00)00047-3

work page doi:10.1016/s0004-3702(00)00047-3 2000

[23] [23]

Himanshu Gupta, Bradley Hayes, and Zachary Sunberg. 2022. Intention-Aware Navigation in Crowds with Extended-Space POMDP Planning. InProceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems(Virtual Event, New Zealand) (AAMAS ’22). International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, 562–570

2022

[24] [24]

1997.Planning and control in stochastic domains with imperfect information

Milos Hauskrecht. 1997.Planning and control in stochastic domains with imperfect information. Ph. D. Dissertation. Massachusetts Institute of Technology

1997

[25] [25]

Heldmann, Anthony Colaprete, Richard C

Jennifer L. Heldmann, Anthony Colaprete, Richard C. Elphic, Ben Bussey, Andrew McGovern, Ross Beyer, David Lees, and Matt Deans

[26] [26]

https://doi.org/10.1016/j.actaastro.2016.06.014

Site selection and traverse planning to support a lunar polar rover mission: A case study at Haworth Crater.Acta Astronautica127 (2016), 308–320. https://doi.org/10.1016/j.actaastro.2016.06.014

work page doi:10.1016/j.actaastro.2016.06.014 2016

[27] [27]

Lisa Hutschenreiter, Christel Baier, and Joachim Klein. 2017. Parametric Markov Chains: PCTL Complexity and Fraction-free Gaussian Elimination.Electronic Proceedings in Theoretical Computer Science256 (Sept. 2017), 16–30. https://doi.org/10.4204/EPTCS.256.2 arXiv:1709.02093 [cs]

work page doi:10.4204/eptcs.256.2 2017

[28] [28]

Garud N Iyengar. 2005. Robust dynamic programming.Mathematics of Operations Research30, 2 (2005), 257–280

2005

[29] [29]

Jonsson and K.G

B. Jonsson and K.G. Larsen. 1991. Specification and refinement of probabilistic processes. In[1991] Proceedings Sixth Annual IEEE Symposium on Logic in Computer Science. 266–277. https://doi.org/10.1109/LICS.1991.151651

work page doi:10.1109/lics.1991.151651 1991

[30] [30]

Sebastian Junges, Nils Jansen, Ralf Wimmer, Tim Quatmann, Leonore Winterer, Joost-Pieter Katoen, and Bernd Becker. 2018. Finite-state Controllers of POMDPs via Parameter Synthesis.Conference on Uncertainty in Artifical Intelligence(2018)

2018

[31] [31]

Sebastian Junges, Erika Ábrahám, Christian Hensel, Nils Jansen, Joost-Pieter Katoen, Tim Quatmann, and Matthias Volk. 2024. Parameter synthesis for Markov models: covering the parameter space.Formal Methods in System Design62, 1 (June 2024), 181–259. https: //doi.org/10.1007/s10703-023-00442-x

work page doi:10.1007/s10703-023-00442-x 2024

[32] [32]

Littman, and Anthony R

Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra. 1998. Planning and acting in partially observable stochastic domains.Artificial Intelligence101, 1 (1998), 99–134

1998

[33] [33]

Ali Devran Kara and Serdar Yüksel. 2021. Robustness to Approximations and Model Learning in MDPs and POMDPs. InModern Trends in Controlled Stochastic Processes:, Alexey Piunovskiy and Yi Zhang (Eds.). Springer International Publishing, Cham, 166–191. https://doi.org/10.1007/978-3-030-76928-4_9

work page doi:10.1007/978-3-030-76928-4_9 2021

[34] [34]

2015.Decision making under uncertainty: theory and application

Mykel J Kochenderfer. 2015.Decision making under uncertainty: theory and application. MIT press, Chapter 6

2015

[35] [35]

Alyzia-Maria Konsta, Alberto Lluch Lafuente, and Christoph Matheja. 2024. What should be observed for optimal reward in POMDPs? http://arxiv.org/abs/2405.10768 arXiv:2405.10768 [cs]

work page arXiv 2024

[36] [36]

Akritas, M

Benjamin D. Kraske, Anshu Saksena, Anna L. Buczak, and Zachary N. Sunberg. 2023. Explanation Through Reward Model Reconciliation using POMDP Tree Search. In2023 IEEE International Conference on Assured Autonomy (ICAA). 137–140. https://doi.org/10.1109/ ICAA58325.2023.00027

work page arXiv 2023

[37] [37]

Hanna Kurniawati, David Hsu, and Wee Sun Lee. 2008. SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. InProc. Robotics: Science and Systems

2008

[38] [38]

Mikko Lauri, David Hsu, and Joni Pajarinen. 2023. Partially Observable Markov Decision Processes in Robotics: A Survey.IEEE Transactions on Robotics39, 1 (2023), 21–40. https://doi.org/10.1109/TRO.2022.3200138

work page doi:10.1109/tro.2022.3200138 2023

[39] [39]

Omid Madani, Steve Hanks, and Anne Condon. 2003. On the undecidability of probabilistic planning and related stochastic optimization problems.Artificial Intelligence147, 1 (2003), 5–34. https://doi.org/10.1016/S0004-3702(02)00378-8 Planning with Uncertainty and Incomplete Information

work page doi:10.1016/s0004-3702(02)00378-8 2003

[40] [40]

Andrew Mastin and Patrick Jaillet. 2012. Loss bounds for uncertain transition probabilities in Markov decision processes. In2012 IEEE 51st IEEE Conference on Decision and Control (CDC). 6708–6715. https://doi.org/10.1109/CDC.2012.6426504 ISSN: 0743-1546

work page doi:10.1109/cdc.2012.6426504 2012

[41] [41]

Frederik Baymler Mathiesen, Morteza Lahijanian, and Luca Laurenti. 2024. IntervalMDP.jl: Accelerated Value Iteration for Interval Markov Decision Processes.IFAC-PapersOnLine58, 11 (2024), 1–6. https://doi.org/10.1016/j.ifacol.2024.07.416 8th IFAC Conference on Analysis and Design of Hybrid Systems ADHS 2024

work page doi:10.1016/j.ifacol.2024.07.416 2024

[42] [42]

Rayan Mazouz, Frederik Baymler Mathiesen, Luca Laurenti, and Morteza Lahijanian. 2024. Piecewise stochastic barrier functions.arXiv preprint arXiv:2404.16986(2024)

work page arXiv 2024

[43] [43]

Cassandra

Nicolas Meuleau, Kee-Eung Kim, Leslie Pack Kaelbling, and Anthony R. Cassandra. 1999. Solving POMDPs by searching the space of finite policies. InProceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence(Stockholm, Sweden)(UAI’99). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 417–426

1999

[44] [44]

Arnab Nilim and Laurent El Ghaoui. 2005. Robust control of Markov decision processes with uncertain transition matrices.Operations Research53, 5 (2005), 780–798

2005

[45] [45]

Gethin Norman, David Parker, and Xueyi Zou. 2017. Verification and control of partially observable probabilistic systems.Real-Time Systems53, 3 (2017), 354–402. 34

2017

[46] [46]

Wenfan Ou and Sheng Bi. 2025. Sequential decision-making under uncertainty: a robust MDPs review.Annals of Operations Research 353, 3 (2025), 1239–1285

2025

[47] [47]

Flight demonstration of unmanned aircraft system (UAS) traffic management (UTM) at technical capability level 3,

Michael P. Owen, Adam Panken, Robert Moss, Luis Alvarez, and Charles Leeper. 2019. ACAS Xu: Integrated Collision Avoidance and Detect and Avoid Capability for UAS. In2019 IEEE/AIAA 38th Digital A vionics Systems Conference (DASC). 1–10. https://doi.org/10.1109/ DASC43569.2019.9081758

work page arXiv 2019

[48] [48]

Papadimitriou and John N

Christos H. Papadimitriou and John N. Tsitsiklis. 1987. The Complexity of Markov Decision Processes.Mathematics of Operations Research12, 3 (1987), 441–450. http://www.jstor.org/stable/3689975

work page arXiv 1987

[49] [49]

Joelle Pineau, Geoff Gordon, Sebastian Thrun, et al. 2003. Point-based value iteration: An anytime algorithm for POMDPs. InIjcai, Vol. 3. 1025–1032

2003

[50] [50]

2014.Markov decision processes: discrete stochastic dynamic programming

Martin L Puterman. 2014.Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons

2014

[51] [51]

Tim Quatmann, Christian Dehnert, Nils Jansen, Sebastian Junges, and Joost-Pieter Katoen. 2016. Parameter Synthesis for Markov Models: Faster Than Ever. InAutomated Technology for Verification and Analysis, Cyrille Artho, Axel Legay, and Doron Peled (Eds.). Springer International Publishing, Cham, 50–67

2016

[52] [52]

Alex Schutz, Yang You, Matías Mattamala, Ipek Caliskanelli, Bruno Lacerda, and Nick Hawes. 2025. A Finite-State Controller Based Offline Solver for Deterministic POMDPs. InProceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, IJCAI-25, James Kwok (Ed.). International Joint Conferences on Artificial Intelligence Organi...

2025

[53] [53]

Prashin Sharma, Benjamin Kraske, Joseph Kim, Zakariya Laouar, Zachary Sunberg, and Ella Atkins. 2024. Risk-Aware markov decision process contingency management autonomy for uncrewed aircraft systems.Journal of aerospace information systems21, 3 (2024), 234–248

2024

[54] [54]

David Silver and Joel Veness. 2010. Monte-Carlo Planning in Large POMDPs. InAdvances in Neural Information Processing Systems, Vol. 23. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2010/hash/edfbe1afcf9246bb0d40eb4d8027d90f-Abstract.html

2010

[55] [55]

Trey Smith and Reid Simmons. 2004. Heuristic search value iteration for POMDPs. InProceedings of the 20th conference on Uncertainty in artificial intelligence. 520–527

2004

[56] [56]

Edward J. Sondik. 1978. The Optimal Control of Partially Observable Markov Processes Over the Infinite Horizon: Discounted Costs. Operations Research26, 2 (1978), 282–304. http://www.jstor.org/stable/169635

1978

[57] [57]

Jip Spel, Sebastian Junges, and Joost-Pieter Katoen. 2021. Finding Provably Optimal Markov Chains. InTools and Algorithms for the Construction and Analysis of Systems, Jan Friso Groote and Kim Guldstrand Larsen (Eds.). Springer International Publishing, Cham, 173–190. https://doi.org/10.1007/978-3-030-72016-2_10

work page doi:10.1007/978-3-030-72016-2_10 2021

[58] [58]

Marnix Suilen, Nils Jansen, Murat Cubuktepe, and Ufuk Topcu. 2020. Robust Policy Synthesis for Uncertain POMDPs via Convex Optimization. InProceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, Christian Bessiere (Ed.). International Joint Conferences on Artificial Intelligence Organization, 4113–4120. https:/...

work page doi:10.24963/ijcai.2020/569 2020

[59] [59]

Chin Hon Tan and Joseph C. Hartman. 2011. Sensitivity Analysis in Markov Decision Processes with Uncertain Reward Parameters. Journal of Applied Probability48, 4 (2011), 954–967. https://www.jstor.org/stable/23066436 Publisher: Applied Probability Trust

work page arXiv 2011

[60] [60]

White and Hany K

Chelsea C. White and Hany K. El-Deib. 1986. Parameter Imprecision in Finite State, Finite Action Dynamic Programs.Operations Research34, 1 (1986), 120–129. https://www.jstor.org/stable/170676 Publisher: INFORMS

1986

[61] [61]

Nan Ye, Adhiraj Somani, David Hsu, and Wee Sun Lee. 2017. DESPOT: Online POMDP Planning with Regularization.Journal of Artificial Intelligence Research58 (Jan. 2017), 231–266. https://doi.org/10.1613/jair.5328

work page doi:10.1613/jair.5328 2017

[62] [62]

Serdar Yüksel and Tamás Linder. 2012. Optimization and Convergence of Observation Channels in Stochastic Control.SIAM Journal on Control and Optimization50, 2 (2012), 864–887. https://doi.org/10.1137/100808976 arXiv:https://doi.org/10.1137/100808976 35 A Proof of Theorem 3 Theorem 3.For a history-dependent policy represented as an FSC 𝜋, the history-depen...

work page doi:10.1137/100808976 2012